Heres the file to use for assignment: public class Bigram { // TODO: add member
ID: 3841027 • Letter: H
Question
Heres the file to use for assignment:
public class Bigram {
// TODO: add member fields! You may have more than one.
// You will probably want to use some kind of Map!
/**
* Create a new bigram model based on the text given as a String argument.
* See the assignment for more details (and also check out the Wikipedia
* article on bigrams).
*
* @param s
* text
*/
public Bigram(String s) {
// TODO: implement me!
}
/**
* Check to see whether the sentence is possible according to the bigram
* model. A sentence is possible if each bigram in the sentence was seen in
* the text that was passed to the constructor.
*
* @param s
* Sentence
* @return true if possible, false if not possible (some transition does not
* exist in the model as constructed)
*/
public boolean check(String s) {
// TODO: implement me!
return false; // Fix this!
}
/**
* Generate an array of strings based on the model, start word, and count.
* You are given the start word to begin with. Each successive word should
* be generated as the most likely or common word after the preceding word
* according to the bigram model derived from the text passed to the
* constructor. If more than one word is most likely, pick the smallest one
* according to the natural String comparison order (compareTo order). Fewer
* than count words may be generated if a dead end is reached with no
* possibilities. If the start word never appears in the input text, only
* that word will be generated.
*
* @param start
* Start word
* @param count
* Number of words to generate (you may assume it's at least 1)
* @return Array of generated words which begins with the start word and
* will usually have the length of the count argument (less if there
* is a dead end)
*/
public String[] generate(String start, int count) {
// TODO: implement me!
return null; // Fix this! Your method should never return null!
}
}
Heres the assignment:
Create a class called Bigram. The class will have a constructor which takes a String. Use a Scanner with its default tokenization on the String. As long as hasNext() returns true, each call to next() will retrieve the next word. Note that some words will be capitalized differently or contain punctuation. Treat each of those differently (for example, "Dogs", "dogs", and "dogs." are all different strings). Checking a phrase will consist of looking at each adjacent pair of adjacent words. If all adjacent pairs were seen in your input text, your code will return true, otherwise false. Example: Bigram x = new Bigram("Bob likes dogs. Bill likes cats. Jane hates dogs."); x.check("Bob likes cats.") returns true: "Bob likes" and "likes cats." both appear in the input text. x.check("Jane likes cats.") returns false: "Jane likes" does not appear in the input text. Your phrase generation method will be given a start word and a count indicating the number of total words to generate (including the start word). It will generate the “most likely” or “most common” phrase based on bigram counts. It will return an array of Strings with the words generated in order. It always starts by generating the start word. As you generate each word, the next word generated should be the one that appears most often in the input (constructor) text after the previous word generated. If you reach a dead end (either the previous word was never seen or there are no words ever seen after that word), end generation early and return a shorter array. If there is more than one “most common” choice seen in the input text, pick the one with the smallest word according to the String.compareTo method (NOTE: OrderedSets and OrderedMaps such as TreeSets and TreeMaps order their set (or set of keys) according to compareTo.)
Will rate if it is accurate! Thanks!
Explanation / Answer
Implementation of the given class (explanation in comments):
import java.util.*;
public class Bigram {
private Map<String,Map<String,Integer>> bigram_map_with_counts;
public Bigram(String s) {
// key of the outer map represents the first word in the bigram
// key of the outer map represents the second word in the bigram
// and the Integer value represents frequency of the bigram
bigram_map_with_counts = new HashMap<String,Map<String,Integer>>();
//pre-process the input, no need to store the original string
Scanner scanner = new Scanner(s);
String cur, prev = null;
while(scanner.hasNext()){
cur = scanner.next();
if(prev != null) {
if(bigram_map_with_counts.get(prev) == null) {
bigram_map_with_counts.put(prev, new TreeMap<String,Integer>());
}
if(bigram_map_with_counts.get(prev).get(cur) == null) {
bigram_map_with_counts.get(prev).put(cur,1);
} else {
int count = bigram_map_with_counts.get(prev).get(cur);
bigram_map_with_counts.get(prev).put(cur,count+1);
}
}
prev = cur;
}
}
/**
* Check to see whether the sentence is possible according to the bigram
* model. A sentence is possible if each bigram in the sentence was seen in
* the text that was passed to the constructor.
*
* @param s
* Sentence
* @return true if possible, false if not possible (some transition does not
* exist in the model as constructed)
*/
public boolean check(String s) {
Scanner scanner = new Scanner(s);
boolean possible = true;
String cur, prev = null;
while(scanner.hasNext()){
cur = scanner.next();
if(prev != null) {
// no bigram starts with 'prev'
if(bigram_map_with_counts.get(prev) == null) {
possible = false;
break;
}
// the bigram 'prev next' does not exist
if(bigram_map_with_counts.get(prev).get(cur) == null) {
possible = false;
break;
}
}
prev = cur;
}
return possible;
}
/**
* Generate an array of strings based on the model, start word, and count.
* You are given the start word to begin with. Each successive word should
* be generated as the most likely or common word after the preceding word
* according to the bigram model derived from the text passed to the
* constructor. If more than one word is most likely, pick the smallest one
* according to the natural String comparison order (compareTo order). Fewer
* than count words may be generated if a dead end is reached with no
* possibilities. If the start word never appears in the input text, only
* that word will be generated.
*
* @param start
* Start word
* @param count
* Number of words to generate (you may assume it's at least 1)
* @return Array of generated words which begins with the start word and
* will usually have the length of the count argument (less if there
* is a dead end)
*/
public String[] generate(String start, int count) {
String[] generated_strings = new String[count];
// initialize the first generated string to start, as per requirement
generated_strings[0] = start;
int i = 1;
while(i < count) {
if(bigram_map_with_counts.get(generated_strings[i-1]) == null) {
// no words follow the previously generated word so stop
break;
}
// get all the set of strings that follow the previosly generated string in the text
Set<String> ordered_key_set = bigram_map_with_counts.get(generated_strings[i-1]).keySet();
// the map will return frequencies in ascending order, so we need to reverse it.
List<String> ordered_list = new ArrayList<String>(ordered_key_set);
Collections.reverse(ordered_list);
// now the list is in descending order of frequency
int max_freq = 0;
String min_word_with_max_frequency = null;
for(String str : ordered_list) {
// max frequency will appear first in the list
if(max_freq == 0) {
min_word_with_max_frequency = str;
max_freq = bigram_map_with_counts.get(generated_strings[i-1]).get(str);
} else if (bigram_map_with_counts.get(generated_strings[i-1]).get(str)==max_freq) {
// we have found another string following the previously generated string with the same frequency
if(str.compareTo(min_word_with_max_frequency) < 0) {
// and it is also smaller as compared to the minimum word we have found earlier
min_word_with_max_frequency = str;
}
} else {
// no point in going through ahead in the loop as all of them will have lower frequency than what we have earlier
break;
}
}
generated_strings[i++] = min_word_with_max_frequency;
}
return generated_strings;
}
}
----------------------------------
Here's a class that tests the above:
public class Test{
public static void main(String []args){
Bigram b = new Bigram("Bob likes cats. Bob likes dogs. Bob hates Jane. Jane hates cats. Jane likes dogs. Bill hates both. Jack likes cats.");
// must return true as 'Bob likes' and 'likes cats' both the bigrams are a part of the string
System.out.println(b.check("Bob likes dogs."));
// must return true as 'Jane likes' and 'likes cats' both the bigrams are a part of the string
System.out.println(b.check("Jane likes cats."));
// must return false as 'Bill likes'
System.out.println(b.check("Bill likes cats."));
// 'Bob' is most commonly followed by 'likes' (twice)
// 'likes' is most commonly followed by 'dogs.' and 'cats.' both (twice) but 'cats.' is a smaller string than 'dogs.' by java comparesTo() method
// hence the output of the following must be [ Bob likes cats. ]
System.out.print("[ ");
for(String str: b.generate("Bob", 3)) {
System.out.print(str + " ");
}
System.out.print("] ");
}
}
OUTPUT of the Test class:
true
true
false
[ Bob likes cats. ]