
Question

Assume we start with a simple "sentence" grammar as follows:

<S> ::= <NP><V><NP>
<NP> ::= <A> <N>
<V> ::= loves | hates | eats
<A> ::= a | the
<N> ::= dog | cat | rat

Part A:

Extend the BNF above with the following additional grammar elements.
(Note: the final grammar will also require a few additional "intermediate" rules to accommodate proper sentence structure.)

Adjectives: (0 or more Adjectives separated by commas may precede any Noun. A comma may or may not be preceded by a space.)

furry

fast

lazy

sneaky

Adverbs: (0 or 1 Adverb may precede any Verb)

quickly

silently

Additional Verbs

chases

stalks

Additional Nouns

house

tree

Conjunctions: (0 or more may appear in any sentence.)

and

or

Prepositions: (for prepositional phrases)

with

around

up

Sentence terminator: (The terminator may or may not be preceded by a space.)

. (a single period)

! (a single exclamation point)

Explanation / Answer
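One possible extended grammar for Part A is sketched below. This is only a sketch: the intermediate nonterminal names (<ClauseList>, <VGroup>, <NGroup>, <AdjList>, <Sp>, and so on) are illustrative choices, and other factorings satisfy the stated requirements equally well.

<S>          ::= <ClauseList> <End>
<ClauseList> ::= <Clause> | <Clause> <Conj> <ClauseList>
<Clause>     ::= <NP> <VP>
<VP>          ::= <VGroup> <NP> | <VGroup> <NP> <PP>
<VGroup>     ::= <V> | <Adv> <V>
<NP>         ::= <A> <NGroup>
<NGroup>     ::= <N> | <AdjList> <N>
<AdjList>    ::= <Adj> | <Adj> <Comma> <AdjList>
<PP>         ::= <Prep> <NP>
<Comma>      ::= , | <Sp> ,
<End>        ::= <Term> | <Sp> <Term>
<Term>       ::= . | !
<Sp>         ::= (a single space)
<V>          ::= loves | hates | eats | chases | stalks
<Adv>        ::= quickly | silently
<A>          ::= a | the
<N>          ::= dog | cat | rat | house | tree
<Adj>        ::= furry | fast | lazy | sneaky
<Conj>       ::= and | or
<Prep>       ::= with | around | up

The recursion in <ClauseList> and <AdjList> supplies the "0 or more" behaviour for conjunctions and adjectives, <VGroup> makes the adverb optional without duplicating the verb alternatives, and <Comma> and <End> encode the optional space before a comma or terminator.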

DOP1 can be classified more formally as a Stochastic Tree Substitution Grammar (STSG). Bod and Scha (1997) describe this formalism as a five-tuple <V_N, V_T, S, R, P>, with V_N and V_T representing finite sets of nonterminal and terminal symbols respectively. S is the distinguished member of V_N that constitutes the starting point for generation and the end point for parsing. R represents the fragment corpus, i.e. a finite set of elementary trees whose top and interior nodes are marked by V_N symbols and whose leaf nodes are marked by V_N or V_T symbols. Finally, P is a function assigning to every member t of R a probability P(t) as described above. An STSG whose subtree depth is limited to one is equivalent to an SCFG; STSGs are stochastically stronger than SCFGs because of their ability to encode context (Bod, 1998).

As already mentioned, DOP1 computes the probability of a parse tree by summing, over all of its derivations, the products of the probabilities of the fragments used in each derivation. It is possible, however, that words unknown to the training (and hence fragment) corpus occur when processing new input. In such cases, even if we allow the unknown word to match any possible category freely, the probability of the resulting parse tree will be zero, since the probability of a subtree of the form C -> unknown_word is zero in the fragment corpus for all values of C. DOP1 is therefore unable to handle input with unknown lexical items. In the section that follows we describe some of the approaches that have been proposed to deal with this issue.

There are various LFG-DOP versions. The probability definitions above hold in all of them; the difference among them lies in the way the competition sets are defined. One possibility, for example, is to base competition on the category of the root node of the c-structure, just as in Tree-DOP. This would, of course, produce many invalid representations during the derivation process, which would then be ruled out during validity sampling. Under this view the competition set CS_i corresponding to the i-th derivation step consists of all the fragments whose root node matches the leftmost substitution site (LSS) of d_{i-1}. This can be expressed more formally as:

CS_i = { f : root(f) = LSS(d_{i-1}) }
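As a concrete illustration of these definitions, here is a toy Python sketch. The fragment corpus, the counts, and the (root, leaves, count) encoding are invented for illustration; real DOP fragments are full elementary trees, not flat tuples.

from collections import defaultdict
from math import prod

# Hypothetical toy fragment corpus: (root category, leaves, observed count).
fragments = [
    ("S",  ("NP", "VP"),  10),
    ("NP", ("the", "dog"), 4),
    ("NP", ("a", "cat"),   6),
    ("VP", ("barks",),     5),
]

# P(f) is estimated as count(f) divided by the total count of
# fragments sharing f's root category.
root_totals = defaultdict(int)
for root, _, count in fragments:
    root_totals[root] += count

def p(fragment):
    root, _, count = fragment
    return count / root_totals[root]

def competition_set(lss):
    # CS_i = { f : root(f) = LSS(d_{i-1}) }: all fragments whose root
    # matches the leftmost substitution site of the partial derivation.
    return [f for f in fragments if f[0] == lss]

def derivation_probability(derivation):
    # A derivation's probability is the product of its fragments' probabilities.
    return prod(p(f) for f in derivation)

def parse_probability(derivations):
    # DOP1: sum the probabilities of all derivations of the same parse tree.
    return sum(derivation_probability(d) for d in derivations)

For example, competition_set("NP") returns the two NP fragments, whose probabilities 0.4 and 0.6 sum to one, as the competition-set definition requires.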

Another possibility is to impose a second condition that ensures uniqueness during the derivation process. Competition sets then consist of the fragments whose c-structure root nodes are of the required category and whose f-structures can be consistently unified with the f-structure corresponding to the c-structure they will combine with:

CS_i = { f : root(f) = LSS(d_{i-1}) ∧ unique(d_{i-1} ∘ f) }
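The "consistently unified" requirement can be illustrated with a minimal feature-structure unifier, assuming f-structures are modelled as nested Python dicts; this is a simplification of LFG f-structures, and the function name unify is my own.

def unify(f1, f2):
    # Return the unification of two feature structures, or None on conflict.
    if isinstance(f1, dict) and isinstance(f2, dict):
        merged = dict(f1)
        for key, value in f2.items():
            if key in merged:
                sub = unify(merged[key], value)
                if sub is None:
                    return None  # conflicting values for the same feature
                merged[key] = sub
            else:
                merged[key] = value
        return merged
    return f1 if f1 == f2 else None  # atomic values must match exactly

# Compatible features unify; clashing ones make the candidate fragment
# ineligible for the competition set.
a = {"SUBJ": {"NUM": "sg"}, "TENSE": "pres"}
b = {"SUBJ": {"NUM": "sg", "PERS": 3}}
c = {"SUBJ": {"NUM": "pl"}}
print(unify(a, b))  # {'SUBJ': {'NUM': 'sg', 'PERS': 3}, 'TENSE': 'pres'}
print(unify(a, c))  # None: the NUM values clash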