
Question

I have written some code for these tasks but would like to compare it against other code, to make sure I have done it correctly — with particular emphasis on part h), the tests.

Thank you in advance

Task 1 — Analysing n-grams in a sample text (NgramAnalyser). For this task, you will need to complete the NgramAnalyser class, and add code to the ProjectTest class. The NgramAnalyser class analyses an input string, passed to it in the constructor, and counts all the n-grams of letters that occur in the string. An n-gram is simply a contiguous sequence of n items from a piece of text; the items we will be considering for this class are characters. (One could also analyse n-grams of words, syllables, or even sentences.) For instance, a 2-gram (also called a bigram) is a pair of characters, a 3-gram is a triple of characters, and so on.
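As a sketch of the counting logic described above — the constructor signature, field, and method names here are my assumptions, not necessarily the assignment skeleton's:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of NgramAnalyser's counting logic only; the real
// skeleton's fields and methods may differ.
class NgramAnalyser {
    private final Map<String, Integer> ngram = new HashMap<>();

    NgramAnalyser(int n, String input) {
        if (n < 1 || input == null || input.length() < n) {
            throw new IllegalArgumentException("invalid n or input");
        }
        // Treat the input as circular: append the first n-1 characters so
        // n-grams that start near the end "wrap round" to the beginning.
        String wrapped = input + input.substring(0, n - 1);
        for (int i = 0; i < input.length(); i++) {
            ngram.merge(wrapped.substring(i, i + n), 1, Integer::sum);
        }
    }

    int getNgramCount(String gram) {
        return ngram.getOrDefault(gram, 0);
    }
}
```

For example, with n = 2 and input "abc", the counted bigrams are "ab", "bc", and the wrap-around "ca", each occurring once.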

Explanation / Answer

For the next task, you will need to complete the MarkovModel class, which generates a Markov model from an input string, and also write a JUnit test for your model. Markov models are probabilistic models (i.e., they model the chances of particular events occurring) and are used for a broad range of natural language processing tasks, including computer speech recognition. They are also widely used to model all sorts of dynamical processes in engineering, mathematics, finance and many other areas.

They can be used to estimate the probability that a symbol will appear in a source of text, given the symbols that have preceded it.

A zero-th order Markov model of a text source is one that estimates the probability that the next character in a sequence is, say, an “a”, based simply on how frequently it occurs in a sample. Higher-order Markov models generalize this idea. Based on a sample text, they estimate the likelihood that a particular symbol will occur in a sequence of symbols drawn from a source of text, where the probability of each symbol occurring can depend upon preceding symbols. In a first-order Markov model, the probability of a symbol occurring depends only on the previous symbol. Thus, for English text, the probability of encountering a “u” can depend on whether the previous letter was a “q”. If it was indeed a “q”, then the probability of encountering a “u” should be quite high. For a second-order Markov model, the probability of encountering a particular symbol can depend on the previous two symbols; and in general, the probabilities used by a k-th order Markov model can depend on the preceding k symbols.
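A tiny worked example may make the difference between orders concrete. The helper below is hypothetical (it is not part of the assignment's API); it compares a zero-th order estimate with a first-order one on the sample text "banana", counting with wrap-around as the assignment describes:

```java
// Hypothetical helper contrasting zero-th and first order estimates.
class OrderExample {
    // Zero-th order: P(c) = occurrences of c / length of the text.
    static double zerothOrder(String text, char c) {
        long count = text.chars().filter(ch -> ch == c).count();
        return (double) count / text.length();
    }

    // First order with wrap-around:
    // P(c | prev) = times prev is followed by c / times prev occurs.
    static double firstOrder(String text, char prev, char c) {
        int contextCount = 0, followedCount = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == prev) {
                contextCount++;
                char next = text.charAt((i + 1) % text.length()); // wrap round
                if (next == c) followedCount++;
            }
        }
        return (double) followedCount / contextCount;
    }

    public static void main(String[] args) {
        System.out.println(zerothOrder("banana", 'a'));      // 3/6 = 0.5
        System.out.println(firstOrder("banana", 'n', 'a'));  // 2/2 = 1.0
    }
}
```

So in "banana", an "a" appears with overall probability 0.5, but given that the previous letter was "n", the next letter is "a" with probability 1.0 — both occurrences of "n" are followed by "a".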

A Markov model can be used to estimate the probability of a symbol appearing, given its k predecessors, in a simple way, as follows:

For each context of characters of length k, we estimate the probability of that context being followed by each letter c in our alphabet as the number of times the context appears followed by c, divided by the number of times the context appears in total. As with our NgramAnalyser class, we consider our input string to “wrap round” when analysing contexts near its end. Call this way of estimating probabilities “simple estimation”.
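As a sketch of what simple estimation computes — the class name, method name, and signature below are my assumptions, not the assignment's API:

```java
// Hypothetical sketch of "simple estimation" for a k-th order model.
class SimpleEstimation {
    /**
     * P(c | context) = count(context followed by c) / count(context),
     * where context has length k, counting with wrap-around.
     */
    static double simpleEstimate(String text, String context, char c) {
        int k = context.length();
        // Append the first k characters so contexts near the end wrap round.
        String wrapped = text + text.substring(0, k);
        int contextCount = 0, followedCount = 0;
        for (int i = 0; i < text.length(); i++) {
            if (wrapped.regionMatches(i, context, 0, k)) {
                contextCount++;
                if (wrapped.charAt(i + k) == c) followedCount++;
            }
        }
        return contextCount == 0 ? 0.0 : (double) followedCount / contextCount;
    }
}
```

For part h), a JUnit test can pin down exactly such hand-computed values against your real MarkovModel — for instance, in "banana" the context "na" occurs twice (once wrapping round) and is followed by "b" once, so a test could assert, with a small delta, that the estimate for ("na", 'b') equals 0.5.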