Question
Description: As described in class, the basic idea of a compiler is to take a high-level language and convert it to a low-level language that can be executed on a computer. In this project, you will design and develop an interpreter/compiler that translates my version of a Markdown language (see http://en.wikipedia.org/wiki/Markdown, http://daringfireball.net/projects/markdown/syntax) into HTML5. Briefly, a Markdown language allows easy-to-read annotations within text that are then automatically converted to valid, well-formed HTML5; such languages have become quite popular, with use in GitHub, Wikipedia, etc. As an extension to typical Markdown languages, our language will additionally provide statically scoped variables that can be defined and used throughout the Markdown document.
Specifically, our Markdown language will support the following commands (bold is used to emphasize the syntax and differentiate it from the text):
#BEGIN … #END
The document annotations denote the beginning and ending of a valid source file in our Markdown language. All valid source files must start with #BEGIN and end with #END (i.e., there cannot be any text before or after). Between these annotations, any other annotations (or none at all) may occur, except for a repetition of the #BEGIN and #END document annotations themselves. In HTML5, these annotations correspond with the <html> and </html> tags, respectively.
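A minimal Python sketch of the #BEGIN … #END rule, assuming surrounding whitespace is tolerated and that the body is translated by a caller-supplied function. The names `translate_document` and `translate_body` are hypothetical, not part of the assignment:

```python
import re
from typing import Callable

# The whole source must be "#BEGIN ... #END"; surrounding whitespace is
# tolerated here (an assumption). re.IGNORECASE makes "#begin" legal too.
DOCUMENT_RE = re.compile(r"\s*#BEGIN(.*)#END\s*", re.IGNORECASE | re.DOTALL)


def translate_document(source: str, translate_body: Callable[[str], str]) -> str:
    match = DOCUMENT_RE.fullmatch(source)
    if match is None:
        raise SyntaxError("source must start with #BEGIN and end with #END")
    body = match.group(1)
    # The document annotations may not be repeated inside the body.
    if re.search(r"#(BEGIN|END)", body, re.IGNORECASE):
        raise SyntaxError("#BEGIN/#END may not appear inside the document")
    return "<html>" + translate_body(body) + "</html>"
```

For example, `translate_document("#BEGIN hi #END", str.strip)` yields `<html>hi</html>`.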
^ … ^
The head annotation in our language contains only the title annotation within the Markdown source file. The head annotation is not required in a Markdown file, but if it is present it must immediately follow the #BEGIN annotation. In HTML5, these annotations correspond with the <head> and </head> tags, respectively.
< ... >
The title annotation denotes the title of the resulting HTML page, which shows up in the browser toolbar. Within these annotations, only plain text is possible (i.e., no other annotations). Title annotations must occur within the ^ … ^ head annotation. In HTML5, these annotations correspond with the <title> and </title> tags, respectively.
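Because the head is optional and may only appear right after #BEGIN, it can be recognized at the very start of the body. This sketch assumes the title uses only the plain-text characters allowed later in this description, plus spaces; `translate_head` is a hypothetical helper that returns the head HTML and the remaining body:

```python
import re

# "^ < title text > ^" at the start of the body; the title is plain text only.
HEAD_RE = re.compile(r'\s*\^\s*<([A-Za-z0-9,.":?_!/\s]*)>\s*\^')


def translate_head(body: str) -> tuple[str, str]:
    """Return (html_for_head, remaining_body)."""
    match = HEAD_RE.match(body)
    if match is None:
        return "", body  # no head: it is optional
    title = match.group(1).strip()
    return f"<head><title>{title}</title></head>", body[match.end():]
```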
{ … }
The {} paragraph annotations denote the beginning and ending of a paragraph within the Markdown source file. Within these annotations, the bold, italics, list item and link annotations are allowed but not required (note that you cannot have a {} paragraph annotation within another paragraph annotation). In HTML5, these annotations correspond with the <p> and </p> tags, respectively.
** text **
The bold annotation, signaled by two asterisks before/after plain text, denotes the beginning and ending of text within the Markdown source file that is in a bold font. Within these annotations, only plain text is possible (i.e., no other annotations). Bold annotations do not have to occur within paragraph annotations; they may occur on their own. In HTML5, these annotations correspond with the <b> and </b> tags, respectively.
* text *
The italics annotation, signaled by a single asterisk before/after plain text, denotes the beginning and ending of text within the Markdown source file that is in an italic font. Within these annotations, only plain text is possible (i.e., no other annotations). Italics annotations do not have to occur within paragraph annotations; they may occur on their own. In HTML5, these annotations correspond with the <i> and </i> tags, respectively.
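Since `**` and `*` share a character, bold has to be recognized before italics. A regex-based sketch with the hypothetical name `translate_emphasis` (a real compiler would make this decision in its lexer/parser rather than with substitutions):

```python
import re

# Bold first: otherwise the inner "*x*" of "**x**" would be taken as italics,
# leaving stray asterisks around it.
BOLD_RE = re.compile(r"\*\*([^*]+)\*\*")
ITALIC_RE = re.compile(r"\*([^*]+)\*")


def translate_emphasis(text: str) -> str:
    """Translate **bold** and *italic* spans; their contents are plain text."""
    text = BOLD_RE.sub(lambda m: f"<b>{m.group(1).strip()}</b>", text)
    return ITALIC_RE.sub(lambda m: f"<i>{m.group(1).strip()}</i>", text)
```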
+ list item ;
The + … ; annotation denotes a bulleted list item within the Markdown source file. A list item, which must start with a “+” and end with a “;”, may contain only bold, italics and link annotations, none of which are required (i.e., it can just be plain text). In HTML5, these annotations correspond with the <li> and </li> tags, respectively.
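A sketch of the list-item rule, assuming any bold, italics or link annotations inside the item are translated by a separate pass; `translate_list_items` is a hypothetical name:

```python
import re

# "+ text ;" -> <li>text</li>; the item body may span whitespace/newlines.
LIST_ITEM_RE = re.compile(r"\+([^+;]*);")


def translate_list_items(text: str) -> str:
    return LIST_ITEM_RE.sub(lambda m: f"<li>{m.group(1).strip()}</li>", text)
```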
~
The ~ annotation, within the Markdown source file, may appear anywhere within a document outside of the head and title annotations. In HTML5, this annotation corresponds with the <br> tag.
[linked phrase](address)
The link annotation denotes a link element (see http://www.w3schools.com/html/html_links.asp) within the Markdown source file. The [] and () annotations must each contain some text: the [] part gives the linked phrase that is displayed, and the () part gives the address of the page to link to. For example, the following Markdown annotation:
[The Simpsons]( http://www.simpsonsworld.com/)
would correspond in HTML5 to:
<a href="http://www.simpsonsworld.com/"> The Simpsons </a>
For this annotation, you do not need to validate the address – you may assume whatever address provided is valid.
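One possible translation of the link annotation. `translate_links` is hypothetical, and trimming spaces around the address is an assumption (browsers ignore them in `href` anyway):

```python
import re

# "[linked phrase](address)" -> <a href="address">linked phrase</a>.
# Both parts must be non-empty; the address itself is not validated.
LINK_RE = re.compile(r"\[([^\[\]]+)\]\(\s*([^()\s]+)\s*\)")


def translate_links(text: str) -> str:
    return LINK_RE.sub(
        lambda m: f'<a href="{m.group(2)}">{m.group(1).strip()}</a>', text
    )
```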
@(audio file address)
The audio annotations denote an audio element (see http://www.w3schools.com/html/html5_audio.asp) within the Markdown source file. The @() annotation must contain some text (denoted by address above) giving the address of the MP3 file to link to. For example, the following Markdown annotation:
@(http://www.televisiontunes.com/themesongs/The%20Simpsons.mp3)
would correspond in HTML5 to:
<audio controls>
<source src="http://www.televisiontunes.com/themesongs/The%20Simpsons.mp3">
</audio>
For this annotation, you do not need to validate the address – you may assume whatever address provided is valid. For simplicity, we will only use MP3 encoded files.
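The audio annotation maps onto a fixed template. A sketch with the hypothetical name `translate_audio`, assuming addresses contain no spaces or parentheses:

```python
import re

AUDIO_RE = re.compile(r"@\(\s*([^()\s]+)\s*\)")


def translate_audio(text: str) -> str:
    # Only MP3 files are used, so no type attribute is emitted here (an
    # assumption; type="audio/mpeg" could be added to <source>).
    return AUDIO_RE.sub(
        lambda m: f'<audio controls>\n<source src="{m.group(1)}">\n</audio>', text
    )
```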
%(address)
The video annotations denote a YouTube video element (captured in an iframe tag) within the Markdown source file. The %() video annotations must contain some text (denoted by address above) giving the address of the YouTube file to link to. For example, the following Markdown annotation:
%(http://www.youtube.com/embed/zoO0s1ukcqQ)
would correspond in HTML5 to:
<iframe src="http://www.youtube.com/embed/zoO0s1ukcqQ"></iframe>
For this annotation, you do not need to validate the address – you may assume whatever address provided is valid. For simplicity, we will only use YouTube links.
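The video annotation can be handled the same way. Note that `<iframe>` is not a void element in HTML5, so this sketch (hypothetical name `translate_video`) emits an explicit end tag:

```python
import re

VIDEO_RE = re.compile(r"%\(\s*([^()\s]+)\s*\)")


def translate_video(text: str) -> str:
    # <iframe> needs a closing tag in HTML5; "<iframe .../>" is not self-closing.
    return VIDEO_RE.sub(lambda m: f'<iframe src="{m.group(1)}"></iframe>', text)
```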
In addition to these Markdown annotations, our Markdown language will include the capability to define and use statically-scoped variables, defined as follows:
$DEF variable name = value $END
The define annotations denote the beginning and ending of a variable definition within the Markdown source file. The $DEF … $END annotations must contain some text (denoted by variable name above) giving the name of the variable, followed by an “=” annotation and then some text (denoted by value above) giving the value of the variable. The $DEF annotation may occur within any other annotation block but, if it occurs, it must be the very first annotation within that block (i.e., immediately following the start of the enclosing annotation). The scope of the variable definition starts after the $DEF … $END annotation and continues to the end of its immediately enclosing block.
$USE variable name $END
The use annotations denote the beginning and ending of the use of a variable within the Markdown source file. The $USE … $END annotations must contain only text (denoted by variable name above) naming the variable whose value is to be used. The $USE annotation may occur within any other annotation block.
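Static scoping for $DEF/$USE can be modeled with a stack of dictionaries, one per open annotation block, searched from the innermost block outward. A sketch using the hypothetical class `ScopeStack`, assuming variable names are case-sensitive (the description only says the annotations are case-insensitive):

```python
class ScopeStack:
    """One dictionary of definitions per open annotation block."""

    def __init__(self) -> None:
        self.frames: list[dict[str, str]] = [{}]  # outermost = the #BEGIN block

    def enter_block(self) -> None:
        self.frames.append({})

    def exit_block(self) -> None:
        self.frames.pop()  # the block's definitions go out of scope

    def define(self, name: str, value: str) -> None:
        self.frames[-1][name] = value

    def use(self, name: str) -> str:
        # Innermost block first, so an inner $DEF shadows an outer one.
        for frame in reversed(self.frames):
            if name in frame:
                return frame[name]
        raise NameError(f"variable '{name}' used before it is defined")
```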
Note that annotations are not case-sensitive (i.e., both #BEGIN and #begin are legal).
Finally, in our Markdown documents, you may assume that wherever text can occur (both ordinary text and the addresses given in annotations), the following are the only allowed characters:
Upper and lower-case letters: A .. Z; a .. z
Numbers: 0 .. 9
Punctuation: commas (i.e., ‘,’), periods (i.e., ‘.’), quotes (i.e., ‘”’), colons (i.e., ‘:’), question marks (i.e., ‘?’), underscore (i.e., ‘_’), exclamation points (i.e., ‘!’) and slashes (i.e., ‘/’)
Special characters: newline, tabs
Except for these characters, you may assume that no other character is possible in the text and your grammar does not need to account for them (i.e., the “#”, “@”, “{”, etc. characters will only be used to denote one of our Markdown annotations and will not be found in the text).
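Because annotation characters never appear inside text, every character can be classified unambiguously by a lexer. One possible tokenizer is sketched below; the token names are hypothetical, spaces are treated as separators (the list above does not mention them, but the examples use them), and keywords are matched case-insensitively:

```python
import re

# Order matters: keywords before symbols, "**" before "*".
TOKEN_SPEC = [
    ("DOC_BEGIN", r"#BEGIN"),
    ("DOC_END", r"#END"),
    ("DEF", r"\$DEF"),
    ("USE", r"\$USE"),
    ("VAR_END", r"\$END"),
    ("BOLD", r"\*\*"),
    ("SYMBOL", r"[\^<>{}*+;~\[\]()@%=]"),
    ("TEXT", r'[A-Za-z0-9,.":?_!/]+'),  # the allowed text characters
    ("SPACE", r"[ \t\n]+"),
]
MASTER_RE = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)


def tokenize(source: str) -> list[tuple[str, str]]:
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SPACE":  # whitespace only separates tokens
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```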
Examples: This section presents some basic examples of our Markdown language and their “compiled” HTML code (indented for readability).
The Markdown source code
#BEGIN
^ < The Simpsons > ^
{
The Simpsons!
@(http://www.televisiontunes.com/themesongs/TheSimpsons.mp3) ~
The members of the [The Simpsons](https://en.wikipedia.org/wiki/The_Simpsons) are:
+ Homer Simpson ;
+ Marge Simpson ;
+ Bart Simpson ;
+ Lisa Simpson ;
+ Maggie Simpson ;
~
Lets watch now: ~
%(http://www.youtube.com/embed/zoO0s1ukcqQ )
}
#END
would compile to the HTML5 code
<html>
<head>
<title> The Simpsons </title>
</head>
<p> The Simpsons!
<audio controls>
<source src="http://www.televisiontunes.com/themesongs/TheSimpsons.mp3">
</audio> <br>
The members of the <a href="https://en.wikipedia.org/wiki/The_Simpsons"> The Simpsons</a> are:
<li> Homer Simpson</li>
<li> Marge Simpson</li>
<li> Bart Simpson</li>
<li> Lisa Simpson</li>
<li> Maggie Simpson</li>
<br>
Lets watch now: <br>
<iframe src="http://www.youtube.com/embed/zoO0s1ukcqQ"></iframe>
</p>
</html>
Note that your code does not need to preserve the spacing and tabs as shown above.
Using a definition of a variable in our Markdown language, we could provide the source code:
#BEGIN
$DEF lastname = Simpson $END
{
The members of the $USE lastname $END family are:
+ Homer $USE lastname $END ;
+ Marge $USE lastname $END ;
+ Bart $USE lastname $END ;
+ Lisa $USE lastname $END ;
+ Maggie $USE lastname $END ;
}
#END
would also compile to the HTML5 code
<html>
<p> The members of the Simpson family are:
<li> Homer Simpson</li>
<li> Marge Simpson</li>
<li> Bart Simpson</li>
<li> Lisa Simpson</li>
<li> Maggie Simpson</li>
</p>
</html>
As you can see, the compiler should take the value of the lastname variable and substitute it in the compiled code wherever the variable is used within its statically determined scope. That is, the definition of the lastname variable in this example is essentially global, since it is defined immediately following the #BEGIN annotation. To fully illustrate the scoping desired in our Markdown language, consider the following example in our Markdown language with two definitions of the variable myname:
#BEGIN
$DEF myname = Josh $END
Hi, my name is $USE myname $END .
{
$DEF myname = Jon $END
Inside the paragraph block, my name is $USE myname $END .
}
Now my name is $USE myname $END again.
#END
should correctly compile into HTML5 as
<html>
Hi, my name is Josh.
<p> Inside the paragraph block, my name is Jon. </p>
Now my name is Josh again.
</html>
The scoping used here is the same as you are used to in most programming languages. If a variable is used without first being defined, this should be an error.
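The scoping example can be reproduced with a small evaluator that pushes a scope at `{` and pops it at `}`. This sketch handles only paragraph blocks and the $DEF/$USE forms, does not enforce the rule that $DEF must come first in its block, and uses the hypothetical name `expand_variables`:

```python
import re

# Only what the scoping demo needs: $DEF/$USE forms, paragraph braces,
# and runs of other text (passed through unchanged).
TOKEN_RE = re.compile(
    r"\$DEF\s+(?P<def_name>\w+)\s*=\s*(?P<def_value>[^$]*?)\s*\$END"
    r"|\$USE\s+(?P<use_name>\w+)\s*\$END"
    r"|(?P<open>\{)|(?P<close>\})"
    r"|(?P<text>[^${}]+)",
    re.IGNORECASE,
)


def expand_variables(body: str) -> str:
    """Substitute each $USE with the value from the nearest enclosing $DEF."""
    scopes: list[dict[str, str]] = [{}]  # outermost scope = the #BEGIN block
    out: list[str] = []
    pos = 0
    for m in TOKEN_RE.finditer(body):
        if m.start() != pos:  # finditer skipped something, e.g. a stray "$"
            raise SyntaxError(f"unexpected input at offset {pos}")
        pos = m.end()
        if m.group("def_name"):
            scopes[-1][m.group("def_name")] = m.group("def_value")
        elif m.group("use_name"):
            name = m.group("use_name")
            for scope in reversed(scopes):  # innermost block first
                if name in scope:
                    out.append(scope[name])
                    break
            else:
                raise NameError(f"variable '{name}' used before it is defined")
        elif m.group("open"):
            scopes.append({})
            out.append("<p>")
        elif m.group("close"):
            if len(scopes) == 1:
                raise SyntaxError("unmatched }")
            scopes.pop()  # the block's definitions go out of scope
            out.append("</p>")
        else:
            out.append(m.group("text"))
    if pos != len(body):
        raise SyntaxError(f"unexpected input at offset {pos}")
    return "".join(out)
```

Using `myname` after the paragraph closes finds the outer "Josh" again, and a $USE of a name defined only inside a closed block raises `NameError`, matching the rule above.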
Details and Deliverables: I strongly suggest using the standard software engineering approach for developing the compiler for our Markdown language. You will be required to meet two milestones during this project.
Phase 1: Grammar Design (15 points) Deadline: October 11, 2015, 11:59pm (Blackboard)
Write and submit a BNF grammar for our Markdown language. Your grammar should be parsable using a recursive-descent parser (as described in class and illustrated in Labs #1-3) with one token of lookahead. Note that this grammar is strictly to be written in BNF, not EBNF. Additionally, develop and submit the ANTLR-based grammar definition for this language. Your submission for this phase should be a single zip file containing your BNF in a text document (e.g., .odt, .doc, .docx, .text) as well as the ANTLR grammar file (e.g., .g).
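To illustrate what one-token lookahead means for recursive descent, here is a Python sketch for a small hypothetical subset of the language (document, paragraphs, bold, italics, ~, plain text). It is not the required BNF or ANTLR deliverable. Each method mirrors one BNF rule, shown in its comment, and every choice between alternatives is made by inspecting only the next token:

```python
import re
from typing import Optional

# Tokens: #BEGIN, #END, "**", "*", "{", "}", "~", or a run of text characters.
TOKEN_RE = re.compile(r'\s*(#BEGIN|#END|\*\*|[*{}~]|[A-Za-z0-9,.":?_!/]+)', re.IGNORECASE)
KEYWORDS = {"#BEGIN", "#END", "**", "*", "{", "}", "~"}
INLINE_FIRST = {"**", "*", "~", "TEXT"}  # FIRST set of <inline>


def tokenize(src: str) -> list[str]:
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens: list[str]) -> None:
        self.tokens, self.pos = tokens, 0

    def lookahead(self) -> Optional[str]:
        """Kind of the next token: a keyword, "TEXT", or None at end of input."""
        if self.pos == len(self.tokens):
            return None
        tok = self.tokens[self.pos].upper()
        return tok if tok in KEYWORDS else "TEXT"

    def advance(self, expected: str) -> str:
        if self.lookahead() != expected:
            raise SyntaxError(f"expected {expected}, found {self.lookahead()}")
        self.pos += 1
        return self.tokens[self.pos - 1]

    def document(self) -> str:
        # <document> ::= "#BEGIN" <body> "#END"
        self.advance("#BEGIN")
        html = " ".join(self.body())
        self.advance("#END")
        if self.lookahead() is not None:
            raise SyntaxError("text after #END")
        return f"<html>{html}</html>"

    def body(self) -> list[str]:
        # <body> ::= <paragraph> <body> | <inline> <body> | <empty>
        if self.lookahead() == "{":
            return [self.paragraph()] + self.body()
        if self.lookahead() in INLINE_FIRST:
            return [self.inline()] + self.body()
        return []  # <empty>: the caller then expects "#END"

    def paragraph(self) -> str:
        # <paragraph> ::= "{" <inner> "}"   (no "{" inside, so no nesting)
        self.advance("{")
        html = " ".join(self.inner())
        self.advance("}")
        return f"<p>{html}</p>"

    def inner(self) -> list[str]:
        # <inner> ::= <inline> <inner> | <empty>
        if self.lookahead() in INLINE_FIRST:
            return [self.inline()] + self.inner()
        return []

    def inline(self) -> str:
        # <inline> ::= "**" <words> "**" | "*" <words> "*" | "~" | TEXT
        la = self.lookahead()
        if la in ("**", "*"):
            self.advance(la)
            words = self.words()
            self.advance(la)
            tag = "b" if la == "**" else "i"
            return f"<{tag}>{words}</{tag}>"
        if la == "~":
            self.advance("~")
            return "<br>"
        return self.advance("TEXT")

    def words(self) -> str:
        # <words> ::= TEXT <more_words> ; <more_words> ::= <words> | <empty>
        first = self.advance("TEXT")
        return first + " " + self.words() if self.lookahead() == "TEXT" else first


def translate(src: str) -> str:
    return Parser(tokenize(src)).document()
```

The right recursion in `body`, `inner` and `words` mirrors BNF's lack of a repetition operator; a production parser would usually turn these into loops.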
Explanation / Answer
#BEGIN
^ < Welcome to the world of language design > ^
{ Beginning of a paragraph
**I will be displayed bold**
* I will be displayed in italics *
List of values: ~
Parts of a computer are listed below: ~
+ Key board ;
+ Mouse ;
+ Monitor ;
+ Hard Disk ;
+ DVD Drive;
@( audio file)
%(Video File)
paragraph ends here}
{
$DEF i = 1 $END
** I am a bold text line as well **
* I am an italics text line *
}
#END
type
  str1 = packed array[1..10] of char;
  ptrToHeading = ^heading;     (* ^ ... ^ *)
  ptrToParagraph = ^paragraph; (* { ... } *)
  ptrToTitle = ^title;         (* < ... > *)
  ptrToBold = ^bold;           (* ** ... ** *)
  ptrToItalic = ^italic;       (* * ... * *)
  ptrToList = ^list;           (* + ... ; *)
characters ::= [A-Za-z]
numerals ::= [0-9]