Question
Suppose you are asked to write a program to count the frequency of occurrence of each word in a document. Describe how you would implement your program using: 1. A hash table to store words and their frequencies. 2. An AVL tree to store words and their frequencies. For each of these implementations: 1. What would be the worst case time to populate the data structure with all the words and their frequencies? 2. What would be the worst case time to look up the frequency of a word? 3. What would be the worst case time to print all words and their frequencies, in alphabetical order of the words? Assume there are n distinct words in the document, and a total of m words, and m is much greater than n.
Explanation / Answer
1. Using a hash table
Algo- Iterate over the document word by word. For each word, compute its hash with the hash function and use it to index into the hash table. The first time a word is read, store it in the table with a count of 1; each subsequent time the same word is read, increment that count. The table is indexed by the hash value computed from the word. In Java, for example,
we can simply use a HashMap:
HashMap<String, Integer> map = new HashMap<>();
We then put entries into the map where the key is the word and the value is the count of occurrences of that word.
The counting function will be somewhat like this:
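(A minimal sketch; the class name WordCount, the method name countWords, and the simple whitespace tokenization are illustrative assumptions, not fixed by the question.)

import java.util.HashMap;
import java.util.Map;

public class WordCount {
    // Count how often each word occurs: O(1) average per insert/update.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> map = new HashMap<>();
        for (String w : words) {
            // merge() stores 1 for a new word, otherwise adds 1 to the stored count
            map.merge(w, 1, Integer::sum);
        }
        return map;
    }

    public static void main(String[] args) {
        String[] words = "the quick brown fox jumps over the lazy dog the end".split("\\s+");
        Map<String, Integer> freq = countWords(words);
        System.out.println(freq.get("the")); // prints 3
    }
}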
a.) Time to populate the data structure with all the words and their frequencies: we must iterate over all m words in the document, and each HashMap insert or update takes O(1) on average, so populating the table takes O(m) expected time. (In the pathological worst case where every word hashes to the same bucket, each operation degrades to O(n), giving O(mn); with a good hash function O(m) is the practical bound.) Note that once the table is built, the frequency of a word can be read with map.get() in O(1).
b.) Time to look up the frequency of a word: O(1) on average, since the word (the key) is hashed to its index in the table and the count is fetched with get(). (Again, with pathological collisions this can degrade to O(n), but with a good hash function the lookup is constant time.)
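For example, continuing with the map built by the sketch above (the word "fox" is just an arbitrary example):

Integer count = map.get("fox");                  // expected O(1): hash the key, read its bucket
System.out.println(count == null ? 0 : count);   // get() returns null for a word that never occurred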
c.) To print in alphabetical order we first need to sort the set of n distinct keys, which takes O(nlogn); printing then adds O(n), so the worst case is O(nlogn). (In Java we could instead use a TreeMap, which keeps its keys in sorted order on its own, but then inserts and lookups cost O(logn) rather than O(1).)
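A sketch of this printing step, continuing from the map built above (the method name printAlphabetically is an illustrative assumption; it could be added to the WordCount class sketched earlier):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

static void printAlphabetically(Map<String, Integer> freq) {
    List<String> keys = new ArrayList<>(freq.keySet());
    Collections.sort(keys);                          // O(n log n) to sort the n distinct words
    for (String w : keys) {
        System.out.println(w + " " + freq.get(w));   // O(1) lookup per word
    }
}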
2. Using an AVL tree
Algo- Create an AVL tree whose nodes are (key, value) pairs: the key is a word and the value is the frequency of that word in the input file. Iterate over the input file word by word. For each word read, first search for it in the tree. If it is not in the tree, insert it with a frequency of 1. If it is already in the tree, increment its frequency count by 1, since another occurrence of the word has been encountered. At the end, the AVL tree holds every distinct word in the file together with its computed frequency count.
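The JDK does not ship an AVL tree, so the sketch below substitutes TreeMap, a red-black tree, which is a different self-balancing BST with the same O(logn) per-operation bound; a hand-written AVL tree would follow the identical search-then-insert-or-increment logic. The class name WordCountBST and the sample input are assumptions for illustration.

import java.util.Map;
import java.util.TreeMap;

public class WordCountBST {
    public static void main(String[] args) {
        String[] words = "the quick brown fox jumps over the lazy dog the end".split("\\s+");

        // Each node stores a (word, frequency) pair, ordered by word.
        // NOTE: TreeMap is a red-black tree, standing in here for an AVL tree.
        Map<String, Integer> tree = new TreeMap<>();
        for (String w : words) {
            // search + insert-or-increment: O(log n) per word in a balanced BST
            tree.merge(w, 1, Integer::sum);
        }

        // In-order traversal visits the words in alphabetical order, O(n) total.
        for (Map.Entry<String, Integer> e : tree.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}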
a.) The worst case time to populate the tree is O(m logn): each of the m words in the input file requires one search and possibly one insert or increment, and each such operation costs O(logn) in a balanced tree holding at most n distinct words. (Since n <= m, this is also bounded by O(m logm).)
b.) To look up the frequency of a word we search for it in the tree; since an AVL tree with n nodes has height O(logn), the worst case time is O(logn).
c.) An in-order traversal of the tree visits all the words in alphabetical order, so printing every word with its frequency takes O(n) in the worst case.