Part A Write a multithread C program to count the frequency of words in a text f
Question
Part A
Write a multithreaded C program to count the frequency of words in a text file. The main thread should get the input file name from the command-line arguments, open the file, initialize the necessary global variables, and perform the necessary error checking during these steps. The main thread then creates a child thread to read each word from the text file. If the word has not appeared before, the child thread should access a global variable to store the word and set its frequency to 1. If the word has appeared before, the child thread should access the global variable to increase its frequency by 1. After the entire file has been processed, the main thread accesses the global variable and outputs the word-frequency data to the screen. The output should have one word per line, in the form "word frequency", with the words in alphabetical order. For example, assume the compiled program is named "a.out":
Your program need not distinguish between upper and lower case, e.g., "we" and "We" are deemed the same word. Different forms of a word can be treated as different words, e.g., "cat" and "cats" are two different words, and "take", "took" and "taken" are three different words. Since both the main thread and the child thread may access the file and the global variables, you may need to use a mutex and/or other mechanisms to avoid race conditions. The entire text file may be very large and may not fit in main memory.
Part B
Extend the program from Part A to create multiple child threads. The number of child threads should be given as another command-line argument. For example, assume the compiled program is named "b.out":
Since all the child threads will access the text file to read words and access the one global variable to update word frequencies, you may need to use a mutex and/or other mechanisms to avoid race conditions.
Explanation / Answer
Program (single-threaded; it reads words from standard input):
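#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXWORDS 4000 /* maximum number of distinct words */
char *word[MAXWORDS]; /* distinct words seen so far */
int wordcount[MAXWORDS]; /* wordcount[i] is the frequency of word[i] */
#define MAXWLEN 30 /* size of the word buffer, including the '\0' */
char buff[MAXWLEN];
int nwords, totalwords;
int get_word(char *s);
/* Count one occurrence of w: bump its frequency, or store it with frequency 1. */
void add_word(const char *w)
{
    int i;
    for(i = 0; i < nwords; i++)
        if(strcmp(word[i], w) == 0)
        {
            wordcount[i]++;
            return;
        }
    if(nwords >= MAXWORDS || (word[nwords] = malloc(strlen(w) + 1)) == NULL)
    {
        fprintf(stderr, "word table full or out of memory\n");
        exit(EXIT_FAILURE);
    }
    strcpy(word[nwords], w);
    wordcount[nwords++] = 1;
}
int main(void)
{
    int i;
    /* Words are read from standard input. */
    while(get_word(buff))
        add_word(buff);
    for(i = 0; i < nwords; i++)
        totalwords += wordcount[i];
    printf("there were %d different words out of %d total words\n", nwords, totalwords);
    return 0;
}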
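#include <ctype.h>
/* Read the next word from standard input into s (at most MAXWLEN - 1
   characters, lower-cased). Returns 1 on success, 0 at end of input. */
int get_word(char *s)
{
    int c, n = 0;
    /* Skip everything that is not a letter or digit. */
    do
    {
        c = getchar();
        if(c == EOF)
            return(0);
    }
    while(!isalpha(c) && !isdigit(c));
    /* Copy the word; characters beyond the buffer size are dropped. */
    do
    {
        if(n < MAXWLEN - 1)
        {
            *s++ = tolower(c);
            n++;
        }
        c = getchar();
    }
    while(isalpha(c) || isdigit(c));
    *s = 0;
    return(1);
}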
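The listing above runs in a single thread and reads standard input, so it does not yet cover the threading that Parts A and B ask for. The following is a minimal sketch of one way the shared state could be protected with POSIX threads, not a definitive solution. It keeps the fixed limits MAXWORDS and MAXWLEN from the answer; the names struct entry, next_word, add_word, count_words and print_table, and the choice of two separate mutexes (one for the file, one for the table), are assumptions made for illustration. Each worker reads a whole word while holding the file lock, so the file is streamed rather than loaded into memory, and no word is split between threads.

```c
#include <assert.h>
#include <ctype.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXWORDS 4000 /* limit on distinct words, as in the answer above */
#define MAXWLEN 30    /* word buffer size, including the '\0' */

struct entry { char text[MAXWLEN]; int count; }; /* hypothetical record type */

static struct entry table[MAXWORDS]; /* the shared word-frequency table */
static int nwords;                   /* number of table entries in use */
static FILE *input;                  /* the shared input file */
static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Read the next lower-cased word into s while holding file_lock.
   Over-long words are truncated. Returns 0 at end of file. */
int next_word(char *s)
{
    int c, n = 0;
    pthread_mutex_lock(&file_lock);
    while ((c = getc(input)) != EOF && !isalnum(c))
        ; /* skip separators */
    while (c != EOF && isalnum(c)) {
        if (n < MAXWLEN - 1)
            s[n++] = (char)tolower(c);
        c = getc(input);
    }
    pthread_mutex_unlock(&file_lock);
    s[n] = '\0';
    return n > 0;
}

/* Add one occurrence of w to the table while holding table_lock.
   New words are silently dropped once the table is full. */
void add_word(const char *w)
{
    int i;
    pthread_mutex_lock(&table_lock);
    for (i = 0; i < nwords && strcmp(table[i].text, w) != 0; i++)
        ; /* linear search: simple, but slow for large vocabularies */
    if (i < nwords) {
        table[i].count++;
    } else if (nwords < MAXWORDS) {
        strcpy(table[i].text, w);
        table[i].count = 1;
        nwords++;
    }
    pthread_mutex_unlock(&table_lock);
}

/* Thread body: consume words until the file is exhausted. */
static void *worker(void *arg)
{
    char w[MAXWLEN];
    (void)arg;
    while (next_word(w))
        add_word(w);
    return NULL;
}

/* qsort comparator: alphabetical order by word. */
static int by_text(const void *a, const void *b)
{
    return strcmp(((const struct entry *)a)->text,
                  ((const struct entry *)b)->text);
}

/* Count the words of fp using nthreads child threads, then sort the
   table. Returns 0 on success, -1 if memory or a thread was unavailable. */
int count_words(FILE *fp, int nthreads)
{
    pthread_t *tid;
    int i, started = 0;
    assert(fp != NULL && nthreads > 0);
    tid = malloc((size_t)nthreads * sizeof *tid);
    if (tid == NULL)
        return -1;
    input = fp;
    nwords = 0;
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL); /* the main thread waits for all workers */
    free(tid);
    qsort(table, (size_t)nwords, sizeof table[0], by_text);
    return started == nthreads ? 0 : -1;
}

/* Print one "word frequency" pair per line, in table order. */
void print_table(void)
{
    int i;
    for (i = 0; i < nwords; i++)
        printf("%s %d\n", table[i].text, table[i].count);
}
```

Under these assumptions, a main function would check argc, open argv[1] with fopen, parse the thread count from argv[2] (for example with strtol), call count_words, and then call print_table. Part A corresponds to a thread count of 1. Because the main thread only touches the table after pthread_join returns, the final sort and output need no locking.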