Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

The following code counts how often different words occur in a text file. char t

ID: 3719469 • Letter: T

Question

The following code counts how often different words occur in a text file.

char toLC (char c) {

return (c >= 'A' && c <= 'Z') ? c-'A'+'a' : c;

}

void countWordsInFile (istream& inFile, map& counts) {

string word;

while (inFile >> word) { // strip away any non-alphabetic characters

string::size_type n = word.find_first_not_of (wordCharacters);

if (n != string::npos) {

word = word.substr (0, n);

if (word.length() > 0) {

// if there's anything left, count it

word = transform(word.begin(), word.end(), word.begin(), toLC);

/** ... increment the appropriate counter in counts ... **/

map::iterator pos = counts.find (word);

if (pos == counts.end()) { // if this is the 1st time we've seen // this word

counts.insert (map::value_type(word, 1)); }

else { // else if we've seen this word before {

++((*pos).second); }

}

}

} Let W denote the number of words in being processed. Many of these words, in a typical document, will be duplicates. Let D denote the number of distinct (non-duplicate) words, D <= W. Assume the length of any individual word is bounded by some constant (e.g., no single word is longer than "antidisestablishmentarianism"). What is the average-case complexity for countWordsInFile? (examle O(n), O(logn), O(nlogn), etc) PLEASE SHOW ALL WORK ON DETERMINING COMPLEXITY

Explanation / Answer

First, we are trying to find the number of Strings that only appear exactly once in a file.
Once we read from a file, we are storing the words in two HashSet Objects.

So, there is no need for sorting also. A better approach would be to use two HashSet One for maintaining repeating and one for non-repeating words. Since HashSet internally uses HashMap, Ideally contains, get(), put() method has o(1) complexity. Hence, the overall complexity of this approach would be o(n).