Assignment 9

The assignment 9 is to create hash tables and constructing the hash functions.

The Different parts of the program are:

We need to construct the hash tables. We need to take into account the collosions factor. If there are colloions in the table, we need handle them. As we know there are two methods for handling the collosions: the linear probing and the quadratic probing.
1. In linear probing, if there is a collosion, the hashval is incremented by one and then this position is checked for its validity. It is certain that we will find an empty position before iterating over the entire table.
2. In quadratic probing, there is a variable say 'i'. If there is a collosion, then i*i is added to the hashval and this is checked for the validity. 'i' goes from 1,2,3,....untill an empty posittion is found out. At the first collosion, the i is 1, in the next collosion i is 2 and so on.
3. We have to implement both the probing.

One important way for hashing the value is the seperate chaining method. In this method, element is stored in a data structure or a list. There is an array of lists. Each index of this array is a link to other linked lists. Now the hash value is calculated. If the hashvalue, is to collide with an existing filled position, the element is inserted at that position.As we know, the array has links to other linked lists and so if different keys happen to have the same hashval, then the element is simple inserted in the linked list of that index.

We have to create three different types of hash functions:
1. A bad-hash function which will result in many collosions. I simply added the ascii values of all the characters. Thus, different permutations of the same word will have the same hashval and thus there are many collosions.
2. A mediocre hash function which is better than the above hash function. I XOR the characters of the word. This results in fewer collosions.
3. A good hash function. I have used the prime number - 37. Since we know the prime number appear to work better when creating the hash function, I used one. This results in fewer collosions and thus it is better.

One application of hash tables is: Given a text file, calculate the 25 most used words. How to do.? Hash all the words in a text file on to a hash table. After hashing the words, I know how many times each word has occurred. But I need to calculate the 25 most commonly used words. For this, I need to use the Min Heap concept. I am using a 25-word heap. First I insert the first 25 Key-count pair into the heap. THen I insert the remaining pairs. If the new pair count is less then the root pair count, then the new pair is also smaller than the rest of them. But if the new pair is greater than the root, then the root should be discarded and the new pair should be inserted at its correct position.

A written Assignment

Comments on the class and the functions and its various uses.

The elegance of programming

Creating a webpage for the assignment. I have created one.

We have to undergo various experiments and analysis as to which type of probing is faster and on various other criteria.
I have downloaded four books in the for of text files and ran the program on them. I have tested the program on different criteria and come up with the following conclusions:
- Based on Hash Function:
  I have run the files on different hash functions: bad, good and okay and come with the following conclusions:
  1. The separate hashing performs well in all the cases irrespective of the of the hash functions used.
  2. The quadratic hashing performs worst if it is a good hash function while it performs mediocre in other two hash functions.
  3. The linear probing performs the worst in when using the bad and okay hash functions while performs mediocre in good hash function.
- Based on the collosions Factor:
  The number of collosions by each of the hash function gives an idea of how the mechanisms work:
  1. Again the separate chaining has the the most minimum number of collosions and it performs the best.
  2. The quadratic probing has less collosions than the linear probing when using the okay and bad hash function.
    However the linear probing has less collosions when the good hash functions is used.
  3. The Quadratic probing has the least collosions when a bad or okay hash function is used
  4. The linear probing has the least collosions when an okay or a good hash function is used.
  5. The separate chaining has the least number of collosions when an okay or a good hash function is used.

Are the results the same as I had expected?

The results are partially the same as I had expected but not entirely.
The seperate chaining performs the best in all three cases as I had expected.
The quadratic probing performs mediocre when using good or okay hash function. It performs the worst in the good hash function. This is surprising as I expect that the quadratic probing runs mediocre in all three cases.
The linear probing performs the worst when using the bad or okay hash function as expected. But performs much better than quadratic probing when using a good hash function.

The output of my program is:

These are the output of running a text file which is a book.

Thank You.!

Assignment 9: