The assignment 9 is to create hash tables and constructing the hash functions.
The Different parts of the program are:
-
We need to construct the hash tables. We need to take into account the collosions factor. If there are colloions in the table, we need handle
them. As we know there are two methods for handling the collosions: the linear probing and the quadratic probing.
- In linear probing, if there is a collosion, the hashval is incremented by one and then this position is checked for its
validity. It is certain that we will find an empty position before iterating over the entire table.
- In quadratic probing, there is a variable say 'i'. If there is a collosion, then i*i is added to the hashval and this
is checked for the validity. 'i' goes from 1,2,3,....untill an empty posittion is found out. At the first collosion, the
i is 1, in the next collosion i is 2 and so on.
- We have to implement both the probing.
-
One important way for hashing the value is the seperate chaining method. In this method, element is stored in a data structure or a list.
There is an array of lists. Each index of this array is a link to other linked lists. Now the hash value is calculated. If the hashvalue, is to
collide with an existing filled position, the element is inserted at that position.As we know, the array has links to other linked lists
and so if different keys happen to have the same hashval, then the element is simple inserted in the linked list of that index.
-
We have to create three different types of hash functions:
- A bad-hash function which will result in many collosions. I simply added the ascii values of all the characters. Thus, different
permutations of the same word will have the same hashval and thus there are many collosions.
- A mediocre hash function which is better than the above hash function. I XOR the characters of the word. This results in fewer
collosions.
- A good hash function. I have used the prime number - 37. Since we know the prime number appear to work better when creating the
hash function, I used one. This results in fewer collosions and thus it is better.
-
One application of hash tables is: Given a text file, calculate the 25 most used words. How to do.?
Hash all the words in a text file on to a hash table. After hashing the words, I know how many times each word has occurred. But I need to
calculate the 25 most commonly used words. For this, I need to use the Min Heap concept. I am using a 25-word heap. First I insert the
first 25 Key-count pair into the heap. THen I insert the remaining pairs. If the new pair count is less then the root pair count, then the
new pair is also smaller than the rest of them. But if the new pair is greater than the root, then the root should be discarded and the new pair
should be inserted at its correct position.
- A written Assignment
- Comments on the class and the functions and its various uses.
- The elegance of programming
- Creating a webpage for the assignment. I have created one.
- We have to undergo various experiments and analysis as to which type of probing is faster and on various other criteria.
I have downloaded four books in the for of text files and ran the program on them. I have tested the program on different
criteria and come up with the following conclusions:
Are the results the same as I had expected?
- The results are partially the same as I had expected but not entirely.
- The seperate chaining performs the best in all three cases as I had expected.
- The quadratic probing performs mediocre when using good or okay hash function. It performs the worst in the good hash function. This is
surprising as I expect that the quadratic probing runs mediocre in all three cases.
- The linear probing performs the worst when using the bad or okay hash function as expected. But performs much better than quadratic probing
when using a good hash function.
The output of my program is:
These are the output of running a text file which is a book.
Thank You.!