Question
Distributed word counting program in Java
Please do not copy & paste off the internet, as the other people have done.
I’m trying to implement a multithreaded Java program that calculates a frequency table for the words appearing in a set of individual text documents. For example, given a single file, write a method to compute the frequency of each word that appears in that file. This can be done by tokenising the text with the space character as the delimiter and treating every token separated by a space as a word. Counting the word frequency distribution across all the files shall be carried out by separate worker threads (multithreading). Each thread must first get the list of the names of the files to process; the files contain random text.
It shall then proceed in phases, in each phase choosing a file at random from those not yet processed by any worker and then processing that file. The program should implement some method to guarantee that no file is processed twice (for each file, only a single thread should ever process it, without being able to tell in advance which thread that will be). Once a thread has computed the word frequency distribution of a single file, it must write the result to a central data structure shared between all threads, such as an associative array or a hash table. However, no two threads may write to this array/hash table at the same time. Implement this level of thread synchronisation and mutual exclusion using some locking mechanism.
Also, vary the number of threads from 1 to 100 and, in each repetition, measure the time taken to complete the entire task. Plot the results in an x-y scatter plot, where the x-axis represents the number of threads and the y-axis represents the time taken in milliseconds. Please explain how to run the program.
Thank you, have a pleasant day
Explanation / Answer
Answer: See the code below
1. WordFreqCounter class: WordFreqCounter.java
----------------------------------------------------------
package wordfreqcount;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;

/**
 * WordFreqCounter class: builds the word frequency table of one text file.
 */
public class WordFreqCounter {
    private String textFileName; //text file name
    private String[] wordTokens; //words as extracted from text
    private int numWordTokens; //number of word tokens
    private int numWords; //number of unique words
    private HashMap<String, Integer> wordFreqTable; //frequency table of words

    /**
     * Constructor
     * @param filename name of the text file to analyse
     */
    public WordFreqCounter(String filename) {
        textFileName = filename;
        wordTokens = null;
        numWordTokens = 0;
        numWords = 0;
        wordFreqTable = new HashMap<String, Integer>();
    }

    /**
     * extracts word tokens from text in file
     * @throws FileNotFoundException if the file does not exist
     * @throws IOException if reading the file fails
     */
    public void extractWordTokens() throws FileNotFoundException, IOException {
        File file = new File(textFileName);
        StringBuilder text = new StringBuilder(); //overall text read from file
        //try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line; //line read from file
            while ((line = reader.readLine()) != null) {
                text.append(line).append(" ");
            }
        }
        //extract word tokens, using the space character as the delimiter
        wordTokens = text.toString().split(" ");
        numWordTokens = wordTokens.length;
    }

    /**
     * creates word frequency table
     */
    public void createWordFreqTable() {
        for (int i = 0; i < numWordTokens; i++) {
            String word = wordTokens[i];
            //consecutive spaces (or an empty file) yield empty tokens; skip them
            if (word.isEmpty()) {
                continue;
            }
            //start at 1 for a new word, otherwise add 1 to the stored count
            wordFreqTable.merge(word, 1, Integer::sum);
        }
        numWords = wordFreqTable.size();
    }

    /**
     * prints word frequency table
     */
    public void printWordFreqTable() {
        System.out.println("Total number of unique words: " + numWords);
        System.out.println("Word Frequency");
        System.out.println("---- ---------");
        for (String word : wordFreqTable.keySet()) {
            System.out.println(word + " " + wordFreqTable.get(word));
        }
    }

    /**
     * @return the numWords
     */
    public int getNumWords() {
        return numWords;
    }

    /**
     * @return the wordFreqTable
     */
    public HashMap<String, Integer> getWordFreqTable() {
        return wordFreqTable;
    }
}
-------------------------------------------------
2. WordFreqCounterProcess class: WordFreqCounterProcess.java
--------------------------------------------------
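package wordfreqcount;

import java.io.IOException;

/**
 * WordFreqCounterProcess: a thread that counts the words of one file.
 */
public class WordFreqCounterProcess extends Thread {
    //lock shared by all instances, so printed tables do not interleave
    private static final Object PRINT_LOCK = new Object();
    private String textFileName; //text file name
    private WordFreqCounter freqCounter; //word frequency counter

    /**
     * @param filename name of the text file to process
     */
    public WordFreqCounterProcess(String filename) {
        super("Word Frequency Counter Process");
        textFileName = filename;
        freqCounter = new WordFreqCounter(textFileName);
    }

    //run method
    @Override
    public void run() {
        try {
            //each thread owns its own counter, so counting needs no lock
            freqCounter.extractWordTokens();
            freqCounter.createWordFreqTable();
            //the console is shared by all threads, so printing is serialized
            synchronized (PRINT_LOCK) {
                freqCounter.printWordFreqTable();
            }
        } catch (IOException e) {
            System.err.println("Could not read file " + textFileName);
            e.printStackTrace();
        }
    }
}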
------------------------------------------------
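The classes above build one frequency table per file, while the question also asks for a central table shared by all threads and guarded by a lock. A minimal sketch of such a table follows; `SharedWordTable`, `mergeFrom` and `snapshot` are hypothetical names that are not part of the classes in this answer, and `ReentrantLock` is just one of several locking mechanisms that would do.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper (not part of the answer's classes): one table shared by all threads.
class SharedWordTable {
    private final Map<String, Integer> totals = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();

    // Adds the counts of one file to the shared totals.
    // The lock guarantees that only one thread writes at a time.
    void mergeFrom(Map<String, Integer> fileCounts) {
        lock.lock();
        try {
            for (Map.Entry<String, Integer> entry : fileCounts.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        } finally {
            lock.unlock(); // always released, even if merging throws
        }
    }

    // Returns a copy, so readers never iterate the live map while a writer is active.
    Map<String, Integer> snapshot() {
        lock.lock();
        try {
            return new HashMap<>(totals);
        } finally {
            lock.unlock();
        }
    }
}
```

Under these assumptions, a worker would call `mergeFrom(freqCounter.getWordFreqTable())` once it has finished counting a file, so the lock is held only for the short merge and not while the file is being read.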
3. WordFreqCounterDemo class: WordFreqCounterDemo.java
------------------------------------------------
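package wordfreqcount;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/**
 * WordFreqCounterDemo class
 */
public class WordFreqCounterDemo {
    /**
     * @param args not used
     */
    public static void main(String[] args) {
        //for demo: replace this with the file containing the names of the files to be processed
        String filename = "files_to_process.txt";
        File file = new File(filename);
        List<WordFreqCounterProcess> processes = new ArrayList<WordFreqCounterProcess>();
        //try-with-resources closes the scanner when the block ends
        try (Scanner in = new Scanner(file)) {
            while (in.hasNextLine()) {
                String textFileName = in.nextLine().trim();
                if (textFileName.isEmpty()) {
                    continue; //ignore blank lines in the list
                }
                System.out.println("Processing file " + textFileName);
                WordFreqCounterProcess freqCounterProcess = new WordFreqCounterProcess(textFileName);
                processes.add(freqCounterProcess);
                //start without joining here, so the threads run concurrently
                freqCounterProcess.start();
            }
            //wait for all threads only after every one of them has been started
            for (WordFreqCounterProcess process : processes) {
                process.join();
            }
        } catch (FileNotFoundException e) {
            System.err.println("Could not find " + filename);
            e.printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); //restore the interrupt flag
            e.printStackTrace();
        }
    }
}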
-------------------------------
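The demo creates one thread per listed file, so no file is handled twice, but the question describes workers that repeatedly pick a random file from those not yet processed. One way to sketch that is a small synchronized pool from which each worker claims its next file. `FilePool` and `claimNext` are hypothetical names introduced here for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper (not part of the answer's classes):
// hands out every file name exactly once, in random order.
class FilePool {
    private final List<String> remaining;

    FilePool(List<String> fileNames) {
        remaining = new ArrayList<>(fileNames); // private copy of the names
    }

    // Picks a random unprocessed file and removes it from the pool.
    // Returns null when no files are left.
    // synchronized: only one thread can claim at a time, so no name is handed out twice.
    synchronized String claimNext() {
        if (remaining.isEmpty()) {
            return null;
        }
        int last = remaining.size() - 1;
        int pick = ThreadLocalRandom.current().nextInt(remaining.size());
        String chosen = remaining.get(pick);
        // Move the last element into the chosen slot, then drop the last slot (O(1) removal).
        remaining.set(pick, remaining.get(last));
        remaining.remove(last);
        return chosen;
    }
}
```

Each worker would then loop with `while ((name = pool.claimNext()) != null)`, counting the words of `name` on each pass. Which thread ends up with which file depends on scheduling and on the random pick, so it cannot be told in advance.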
4. files_to_process.txt:
---------------------------------
demo_text.txt
demo_text1.txt
--------------------------------
5. demo_text.txt:
--------------------------------------
This is demo text Today is Sunday Traditionally Sunday world over is weekly holiday
---------------------------------------
6. demo_text1.txt:
-------------------------------------
This is demo text Today is Sunday Traditionally Sunday world over is weekly holiday
-------------------------------------
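For the timing part of the question (1 to 100 threads, time in milliseconds), a small harness along the following lines could produce the numbers for the scatter plot. This is only a sketch: `ThreadTimer` and `timeRun` are hypothetical names, and the worker body is left as an empty placeholder where the real file processing would go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical timing harness (not part of the answer's classes).
class ThreadTimer {
    // Starts threadCount threads, waits for all of them, and returns the elapsed milliseconds.
    static long timeRun(int threadCount, Supplier<Runnable> workerFactory)
            throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            threads.add(new Thread(workerFactory.get()));
        }
        long start = System.nanoTime();
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return (System.nanoTime() - start) / 1_000_000; // nanoseconds to milliseconds
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("threads,millis"); // CSV header: x-axis, y-axis
        for (int n = 1; n <= 100; n++) {
            // Placeholder worker: a real run would process the files here,
            // starting each repetition from a fresh list of unprocessed files.
            long millis = timeRun(n, () -> () -> { });
            System.out.println(n + "," + millis);
        }
    }
}
```

The output is one `threads,millis` line per repetition, which can be loaded into a spreadsheet or plotting tool with the thread count on the x-axis and the time on the y-axis. Individual timings are noisy, so repeating each thread count several times and averaging is a reasonable refinement.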