

Question

Implement a scanner for the provided lexical definitions.

The scanner is embedded, so it returns one token each time it is called. Since the parser is not available yet, a tester program will be used to call the scanner.

The scanner could be implemented as:

1.) Plain string reader - reads strings separated by spaces - (70 points, or 75 if lines are counted), assuming:

* all tokens are separated by spaces

* lines may not be counted

* comments may appear without surrounding spaces

2.) Module generated by lex

3.) FSA table + driver
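As an illustration of option 3.), the core of a table-driven scanner might look like the sketch below. The states, character classes, and transition table are assumptions covering only identifiers and unsigned integers, not the assignment's full token set.

```c
#include <ctype.h>

/* Illustrative FSA-table scanner core: classify() runs one lexeme
   through a transition table. The states and classes are assumptions
   covering only identifiers (lowercase letter, then letters or
   underscores) and unsigned integers. */

enum { CL_LOWER, CL_UPPER, CL_UNDER, CL_DIGIT, CL_END, CL_OTHER, NCLASS };
enum { S_START, S_ID, S_INT, S_DONE_ID, S_DONE_INT, S_ERR };

static int char_class(int c) {
    if (c == '\0')  return CL_END;
    if (islower(c)) return CL_LOWER;
    if (isupper(c)) return CL_UPPER;
    if (c == '_')   return CL_UNDER;
    if (isdigit(c)) return CL_DIGIT;
    return CL_OTHER;
}

/* table[state][class] = next state; states >= S_DONE_ID are exits */
static const int table[3][NCLASS] = {
    /*            LOWER   UPPER  UNDER  DIGIT  END         OTHER */
    /* S_START */ { S_ID,  S_ERR, S_ERR, S_INT, S_ERR,      S_ERR },
    /* S_ID    */ { S_ID,  S_ID,  S_ID,  S_ERR, S_DONE_ID,  S_ERR },
    /* S_INT   */ { S_ERR, S_ERR, S_ERR, S_INT, S_DONE_INT, S_ERR },
};

/* Driver: returns S_DONE_ID, S_DONE_INT, or S_ERR for one lexeme. */
int classify(const char *s) {
    int state = S_START;
    for (;; s++) {
        int next = table[state][char_class(*s)];
        if (next >= S_DONE_ID) return next;
        state = next;
    }
}
```

Adding a token class here means adding a state row and, if needed, a character class column; the driver loop itself never changes, which is the main appeal of this option.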

Requirements:

* Implement a token as a triplet {tokenID, tokenInstance, line#} (if an option with line numbers is chosen)

* Don't forget the EOFtk token

* Implement the scanner in a separate file with basename "scanner"

* For testing purposes, the scanner will be tested using a testing driver implemented in a file with basename "testScanner". You need to implement your own tester and include it as part of the project. The tester will ask for one token at a time and display each token to the screen, one per line, with descriptive information: the token class, the token instance, and the line number, if applicable.

* Invocation:

the program reads from stdin or from the file file.fs18

*Arguments are the same as P0

*Wrong invocations may not be graded

* Don't confuse the executable name with the file name or with the function name
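Putting the requirements above together, the token triplet and the tester loop could be sketched as follows. The enum names, the Token layout, and the canned next_token() stub are assumptions that stand in for the real scanner so the fragment runs on its own.

```c
#include <stdio.h>

/* Hypothetical token triplet {tokenID, tokenInstance, line#}; the enum
   names and the 8+1 buffer follow the "no token longer than 8
   characters" hints and are not the assignment's required names. */
typedef enum { IDtk, INTtk, EOFtk } TokenID;
typedef struct { TokenID id; char instance[9]; int line; } Token;

static const char *token_names[] = { "Identifier", "Integer", "EOF" };

/* Stub standing in for the real scanner: yields a fixed token stream
   ending in EOFtk, one token per call. */
static Token next_token(void) {
    static int i = 0;
    static const Token stream[] = {
        { IDtk,  "abc", 1 },
        { INTtk, "42",  1 },
        { EOFtk, "",    2 },
    };
    return stream[i < 2 ? i++ : 2];
}

/* The tester loop: one descriptive line per token, stop at EOFtk. */
void testScanner(void) {
    Token t;
    do {
        t = next_token();
        printf("%s \"%s\" line %d\n", token_names[t.id], t.instance, t.line);
    } while (t.id != EOFtk);
}
```

In the real project the Token type would live in token.h, next_token() would be replaced by the scanner() function from scanner.c, and main.c would call testScanner() after processing the arguments.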

You must have (C++ can be accordingly different):

* types, including the token type, in token.h

*implement scanner in scanner.c and scanner.h

*implement the tester in another file testScanner.c and testScanner.h

* main.c processes the arguments (as in P0), then calls the testScanner() function, with interface and preparation as needed.

Lexical Definitions:

*All case sensitive

* Each scanner error should display "Scanner Error:" followed by details, including the line number if available

*Alphabet

1. All English letters (upper and lower), digits, plus the extra characters as seen below, plus WS

2. No other characters are allowed; they should generate errors

*Identifiers

1. begin with a lower case letter and continue with any number of letters or underscores

2. you may assume no identifier is longer than 8 characters

*Keywords (reserved, suggested individual tokens)

*Operators and delimiters group.

*Integers

Any sequence of decimal digits, with no sign and no decimal point. You may assume no number is longer than 8 characters.

*Comments

Comments start with $ and end with $.
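Because comments run from $ to $ and may appear without surrounding spaces, a string-reader scanner has to strip them before splitting the input on spaces. A minimal sketch, assuming a line-at-a-time interface; the strip_comments() name and signature are hypothetical:

```c
#include <stdio.h>

/* Copies src to dst, dropping everything between a $ and the next $.
   Reports an unterminated comment in the required "Scanner Error:"
   format. Returns 0 on success, -1 on error. */
int strip_comments(const char *src, char *dst, int line) {
    int in_comment = 0;
    for (; *src; src++) {
        if (*src == '$') { in_comment = !in_comment; continue; }
        if (!in_comment) *dst++ = *src;
    }
    *dst = '\0';
    if (in_comment) {
        fprintf(stderr, "Scanner Error: unterminated comment, line %d\n", line);
        return -1;
    }
    return 0;
}
```

With comments removed first, the "plain string reader" option reduces to splitting the cleaned buffer on whitespace and classifying each piece.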

Explanation / Answer

The yacc (yet another compiler compiler) utility provides a general tool for imposing structure on the input to a computer program. Before using yacc, you prepare a specification that includes:

A set of rules to describe the elements of the input

Code to be invoked when a rule is recognized

Either a definition or declaration of a low-level scanner to examine the input

yacc then turns the specification into a C-language function that examines the input stream. This function, called a parser, works by calling the low-level scanner.

The scanner, called a lexical analyzer, picks up items from the input stream. The selected items are known as tokens. Tokens are compared to the input construct rules, called grammar rules.

When one of the rules is recognized, the code you have supplied for the rule is invoked. This code is called an action. Actions are fragments of C-language code. They can return values and use values returned by other actions.

The heart of the yacc specification is the collection of grammar rules. Each rule describes a construct and gives it a name. For example, one grammar rule might be:

  date: month_name day ',' year ;

where date, month_name, day, and year represent constructs of interest; presumably, month_name, day, and year are defined in greater detail elsewhere.

In the example, the comma is enclosed in single quotes. This means that the comma is to appear literally in the input. The colon and semicolon are punctuation in the rule and have no significance in evaluating the input. With proper definitions, the input:

  July 4, 1776

might be matched by the rule.

The lexical analyzer is an important part of the parsing function. This user-supplied routine reads the input stream, recognizes the lower-level constructs, and communicates these as tokens to the parser. The lexical analyzer recognizes constructs of the input stream as terminal symbols; the parser recognizes constructs as nonterminal symbols. To avoid confusion, refer to terminal symbols as tokens.

There is considerable leeway in deciding whether to recognize constructs using the lexical analyzer or grammar rules. For example, the rules:

  month_name: 'J' 'a' 'n' ;
  month_name: 'F' 'e' 'b' ;
  . . .
  month_name: 'D' 'e' 'c' ;

might be used in the above example. While the lexical analyzer would then only need to recognize individual letters, such low-level rules tend to waste time and space and can complicate the specification beyond the ability of yacc to deal with it.

Usually, the lexical analyzer recognizes the month names and returns an indication that a month_name is seen. In this case, month_name is a token and the detailed rules are not needed.

Literal characters, such as a comma, must also be passed through the lexical analyzer and are also considered tokens.
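A hand-written lexical analyzer for the date example might look like the following sketch, which handles only digit sequences and literal characters. The DIGIT code and the string-based input are assumptions; a real yacc build takes its token codes from the generated y.tab.h.

```c
#include <ctype.h>

#define DIGIT 258   /* hypothetical token code; yacc assigns these */

static const char *input;   /* set by the caller before scanning */
static int lastval;         /* value of the last DIGIT token */

/* One token per call, as yacc's generated parser expects:
   0 signals end of input, named tokens use codes > 256, and literal
   characters (such as the comma) are returned as their own value. */
int yylex(void) {
    while (*input == ' ') input++;            /* skip blanks */
    if (*input == '\0') return 0;             /* end of input */
    if (isdigit((unsigned char)*input)) {
        lastval = 0;
        while (isdigit((unsigned char)*input))
            lastval = lastval * 10 + (*input++ - '0');
        return DIGIT;
    }
    return *input++;   /* literal character token */
}
```

The comma in the date rule thus reaches the parser as the token code ',' with no special handling in the grammar beyond the quoted literal.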

Specification files are very flexible. It is relatively easy to add to the previous example the rule:

  date: month '/' day '/' year ;

allowing:

  7 / 4 / 1776

as a synonym for:

  July 4, 1776