Question
The code here reads in a file and converts it to a list of words. It strips out all punctuation for you. I want you to construct a frequency table for the words, and then print out the 100 most frequent words. Use a dictionary, a.k.a. java.util.Map. You can use TreeMap or HashMap, or if you want, try them both to see if one is faster. If you like, compare your result to the list of the 100 most common words.
Here is the original program we are given:
package lab.pkg7;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

public class Lab7 {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        ArrayList<String> words = new ArrayList<>();
        try
        {
            Scanner fin = new Scanner(new File("55.txt"));
            while (fin.hasNext())
            {
                String word = fin.next().toLowerCase().replaceAll("[^a-z]", "").trim();
                if (word.length()>0)
                    words.add(word);
            }
        }
        catch (FileNotFoundException e)
        {
            System.err.println(e);
        }
        for (int i=0; i<100; i++)
            System.out.println(words.get(i));
    }
}
The input file is:
The Project Gutenberg EBook of The Wonderful Wizard of Oz, by L. Frank Baum
However, any input will work for this. Thanks.
Explanation / Answer
Below is the updated Lab7 class. A small dummy input file was used to test the implementation.
Please check it and let me know if anything needs to change.
******* Lab7 Class *******
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

public class Lab7 {
    public static void main(String[] args) {
        // Frequency table: word -> number of occurrences.
        Map<String, Integer> wordCountMap = new HashMap<>();
        // try-with-resources closes the Scanner once reading is done.
        try (Scanner fin = new Scanner(new File("55.txt"))) {
            while (fin.hasNext()) {
                // Lowercase the token and drop every character that is not a-z.
                String word = fin.next().toLowerCase().replaceAll("[^a-z]", "");
                if (word.length() > 0) {
                    // Store 1 for a new word, otherwise add 1 to its current count.
                    wordCountMap.merge(word, 1, Integer::sum);
                }
            }
        } catch (FileNotFoundException e) {
            System.err.println(e);
        }
        // Copy the entries into a list and sort by count, highest first.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordCountMap.entrySet());
        entries.sort((e1, e2) -> e2.getValue().compareTo(e1.getValue()));
        // Print at most the 100 most frequent words, one per line.
        int limit = Math.min(100, entries.size());
        for (int i = 0; i < limit; i++) {
            Map.Entry<String, Integer> e = entries.get(i);
            System.out.printf("%-15s%5d%n", e.getKey(), e.getValue());
        }
    }
}
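The question also suggests trying both TreeMap and HashMap to see if one is faster. Here is a rough sketch of how that comparison could be done. The class name MapTiming, the helpers count and time, and the synthetic word list are all made up for illustration and are not part of the lab. A single pass timed with System.nanoTime is only indicative, since JIT warm-up and garbage collection can skew the numbers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of the original lab.
class MapTiming {
    // Counts how often each word occurs, using whichever Map is passed in.
    static Map<String, Integer> count(List<String> words, Map<String, Integer> counts) {
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    // Elapsed nanoseconds for one counting pass (a rough measurement, not a benchmark).
    static long time(List<String> words, Map<String, Integer> counts) {
        long start = System.nanoTime();
        count(words, counts);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Synthetic input (assumption): 200,000 tokens drawn from 5,000 distinct words.
        List<String> words = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            words.add("w" + (i % 5_000));
        }
        System.out.println("HashMap ns: " + time(words, new HashMap<>()));
        System.out.println("TreeMap ns: " + time(words, new TreeMap<>()));
    }
}
```

To time the real input instead, pass the words list built from 55.txt. HashMap offers expected constant-time get and put while TreeMap is logarithmic, so HashMap would usually be expected to win here, but measure on your own file before drawing a conclusion.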
***** Input 55.txt File ******
HashMap is a part of collection in Java since 1.2.
It provides the basic implementation of Map interface of Java.
It stores the data in (Key,Value) pairs. To access a value you must know its key, otherwise you can’t access it.
HashMap is known as HashMap because it uses a technique Hashing.
Hashing is a technique of converting a large String to small String that represents same String.
A shorter value helps in indexing and faster searches.
HashSet also uses HashMap internally. It internally uses link list to store key-value pairs.
We will know about HashSet in detail in further articles.
HashMap is kept in java.util package. As you can see in above definition of HashMap, it extends an abstract class AbstractMap which also provides an incomplete implementation of Map interface. As you can see it also implements Cloneable and Serializable interface.
K and V in above definition represents for Key and Value respectively.
HashMap don’t allow duplicate keys, but allows duplicate values.
That means A single key can’t contain more than 1 value but more than 1 key can contain a single value.
HashMap allows null key also but only once and multiple null values.
This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
It is roughly simillar to HashTable but is unsynchronized.
HashMap provides constant time complexity for basic operations, get and put, if hash function is properly written and it disperses the elements properly among the buckets.
Iteration over HashMap depends on the capacity of HashMap and number of key-value pairs. Basically it is directly proportional to the capacity + size.
Capacity is the number of buckets in HashMap. So it is not a good idea to keep high number of buckets in HashMap initially.
****** Output *******
hashmap 13
it 12
of 11
in 11
is 10
the 10
a 9
and 8
to 7
value 5
key 5
but 4
as 4
you 4
also 4
pairs 3
number 3
map 3
capacity 3
string 3
keyvalue 3
that 3
can 3
buckets 3
provides 3
interface 3
uses 3
technique 2
for 2
time 2
over 2
duplicate 2
not 2
cant 2
will 2
hashset 2
hashing 2
null 2
access 2
represents 2
single 2
internally 2
values 2
more 2
basic 2
class 2
properly 2
contain 2
than 2
order 2
allows 2
above 2
constant 2
definition 2
know 2
see 2
implementation 2
an 2
java 2
store 1
no 1
indexing 1
otherwise 1
package 1
particular 1
roughly 1
its 1
basically 1
among 1
on 1
guarantees 1
only 1
simillar 1
elements 1
once 1
disperses 1
converting 1
respectively 1
unsynchronized 1
further 1
javautil 1
dont 1
small 1
cloneable 1
shorter 1
we 1
idea 1
put 1
complexity 1
link 1
data 1
collection 1
faster 1
abstractmap 1
written 1
kept 1
articles 1
keys 1
helps 1
k 1
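In the output above, words with the same count come out in whatever order the HashMap iterated them, and HashMap does not guarantee that order. If reproducible output matters, one option is to break ties alphabetically. This is a minimal sketch of that idea. The class name TopWords and the method top are hypothetical, not part of the lab.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original lab.
class TopWords {
    // Returns up to n entries, highest count first; equal counts are ordered alphabetically.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : new String[] {"to", "be", "or", "not", "to", "be"}) {
            counts.merge(w, 1, Integer::sum);
        }
        // Prints be, to, not, or (counts 2, 2, 1, 1), one word per line.
        for (Map.Entry<String, Integer> e : top(counts, 100)) {
            System.out.printf("%-15s%5d%n", e.getKey(), e.getValue());
        }
    }
}
```

Math.min guards against files with fewer than n distinct words, so the same method works for the small dummy file and for the full book.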