Question
A Scanner in C++
The first phase of compilation is called scanning or lexical analysis. This phase interprets the input program as a sequence of characters and produces a sequence of tokens that will be used by the parser.
Write a C++ program that implements a scanner for a language whose tokens are defined below:
<Keyword> -> if | then | else | begin | end | program
<Identifier> -> <Char> | <Char> <Identifier>
<Integer> -> <Digit> | <Digit> <Integer>
<Special> -> ( | ) | [ | ] | + | - | = | , | ;
<Digit> -> 0|1|2|3|4|5|6|7|8|9
<Char> -> a|b|c|…|z|A|B|…|Z
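The rules above can be sketched as a single classification function. This is only a minimal sketch: the name `classify` and the extra "Unknown" result are assumptions made for illustration, and the lexeme is assumed to be lowercased already. Note that the sample report lists `integer` as a Keyword although the <Keyword> rule does not; the sketch follows the rule as written, so `integer` would have to be added to the list if the sample is authoritative.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: classifies one lowercase lexeme according to the
// grammar. Returns "Keyword", "Identifier", "Integer", "Special", or
// "Unknown" (the last one is an assumption, not part of the assignment).
std::string classify(const std::string& lexeme) {
    assert(!lexeme.empty());  // an empty string is never a token

    // Keywords exactly as listed in the <Keyword> rule.
    static const char* const keywords[] = {"if", "then", "else",
                                           "begin", "end", "program"};
    for (const char* kw : keywords)
        if (lexeme == kw) return "Keyword";

    auto isLetter = [](unsigned char c) { return std::isalpha(c) != 0; };
    auto isDigit = [](unsigned char c) { return std::isdigit(c) != 0; };

    // <Identifier>: one or more letters. <Integer>: one or more digits.
    if (std::all_of(lexeme.begin(), lexeme.end(), isLetter)) return "Identifier";
    if (std::all_of(lexeme.begin(), lexeme.end(), isDigit)) return "Integer";

    // <Special>: exactly one of the nine listed characters.
    if (lexeme.size() == 1 &&
        std::string("()[]+-=,;").find(lexeme[0]) != std::string::npos)
        return "Special";

    return "Unknown";
}
```

Keywords are tested before identifiers because every keyword also matches the <Identifier> rule.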
The token classes that will be recognized are Keyword, Identifier, Integer, and Special. Tokens are separated by whitespace (blanks, newlines, and tabs) and/or special characters. The language is NOT case sensitive, so you could, and probably should, convert all non-numeric tokens to lowercase before storing them.
You may assume that:
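One possible way to apply these separation rules is sketched below. The function name `tokenize` is hypothetical, and the sketch additionally assumes that a change from letters to digits (or back) starts a new lexeme, since identifiers contain only letters and integers only digits.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: splits source text into lowercase lexemes.
// Whitespace and special characters end the lexeme being built;
// special characters are also emitted as one-character lexemes.
std::vector<std::string> tokenize(const std::string& text) {
    const std::string specials = "()[]+-=,;";
    std::vector<std::string> lexemes;
    std::string current;

    // Stores the lexeme collected so far, if there is one.
    auto flush = [&]() {
        if (!current.empty()) {
            lexemes.push_back(current);
            current.clear();
        }
    };

    for (char raw : text) {
        const unsigned char c = static_cast<unsigned char>(raw);
        if (std::isalpha(c) || std::isdigit(c)) {
            // Assumption: a switch between letters and digits starts a new
            // lexeme (identifiers are letters only, integers digits only).
            const bool currentIsDigits =
                !current.empty() &&
                std::isdigit(static_cast<unsigned char>(current[0])) != 0;
            if (!current.empty() && currentIsDigits != (std::isdigit(c) != 0))
                flush();
            current += static_cast<char>(std::tolower(c));
        } else {
            flush();  // whitespace, specials, and anything else end a lexeme
            if (specials.find(raw) != std::string::npos)
                lexemes.push_back(std::string(1, raw));
        }
    }
    flush();  // the text may end in the middle of a lexeme
    return lexemes;
}
```

For example, `tokenize("ab=BB")` yields the three lexemes `ab`, `=`, and `bb`, even though the input contains no blanks.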
· The input program is syntactically correct.
· There are fewer than 1000 distinct tokens.
· Each identifier has up to 15 characters.
· The long int data type of C++ is sufficient to represent any of the integers.
Your program should read the input from a file named “scan.in” and build a symbol table that contains an entry for each token found in the input. You may use any data structure for the symbol table (e.g., an array of structs), although compilers often use a hash table. After all the input has been read, your program should produce a summary report in a file named “scan.out” that lists the tokens that appeared in the input, the number of times each token appeared, and the classification of each token. The last line of the output file should print the sum of all integers in the table. (This is just to ensure that the integers are read and stored as integers in your program.)
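A symbol table along these lines might look like the sketch below. All names (`Entry`, `SymbolTable`, `record`, `sumOfIntegers`, `report`) are hypothetical, and the token class is passed in as a plain string. The sketch keeps entries in first-appearance order, as the sample report does, and counts every occurrence of an integer in the sum, because that is what reproduces the sample total: 0·2 + 10 + 123456 + 111111 + 1·3 + 2·3 + 3 = 234589.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// One symbol-table row: the lexeme, its class, and how often it was seen.
struct Entry {
    std::string lexeme;
    std::string tokenClass;  // "Keyword", "Identifier", "Integer" or "Special"
    int count;
};

// Hypothetical symbol table: rows stay in first-appearance order while a
// hash map gives fast lookup from lexeme to row index.
class SymbolTable {
public:
    // Adds a new row for an unseen lexeme, otherwise bumps its count.
    void record(const std::string& lexeme, const std::string& tokenClass) {
        assert(!lexeme.empty());
        auto found = index_.find(lexeme);
        if (found == index_.end()) {
            index_[lexeme] = entries_.size();
            entries_.push_back({lexeme, tokenClass, 1});
        } else {
            ++entries_[found->second].count;
        }
    }

    // Assumption: each occurrence of an integer contributes to the sum.
    long sumOfIntegers() const {
        long sum = 0;
        for (const Entry& e : entries_)
            if (e.tokenClass == "Integer") sum += std::stol(e.lexeme) * e.count;
        return sum;
    }

    // Builds the report text in the layout of the scan.out example.
    std::string report() const {
        std::ostringstream out;
        out << "Token Class count\n";
        for (const Entry& e : entries_)
            out << e.lexeme << ' ' << e.tokenClass << ' ' << e.count << '\n';
        out << "The sum of all integers is " << sumOfIntegers() << ".\n";
        return out.str();
    }

    const std::vector<Entry>& entries() const { return entries_; }

private:
    std::vector<Entry> entries_;
    std::unordered_map<std::string, std::size_t> index_;
};
```

The string returned by `report()` can then be written to “scan.out” with a `std::ofstream`. With fewer than 1000 distinct tokens a linear search over an array would also be fast enough; the hash map only mirrors the remark that compilers often use one.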
Sample I/O
scan.in example
program scan
begin
integer ab, bb, ba=0, AA[10];
ab = 123456;
bb = 111111;
if ab=bb then
AA[1] = (AA[1]+1);
else if ab=ba then
AA[2] = AA[2]-2;
else
AA[3] = 0;
end else
end
scan.out example
Token Class count
program Keyword 1
scan Identifier 1
begin Keyword 1
integer Keyword 1
ab Identifier 4
, Special 3
bb Identifier 3
ba Identifier 2
= Special 8
0 Integer 2
aa Identifier 6
[ Special 6
10 Integer 1
] Special 6
; Special 6
123456 Integer 1
111111 Integer 1
if Keyword 2
then Keyword 2
1 Integer 3
( Special 1
+ Special 1
) Special 1
else Keyword 3
2 Integer 3
- Special 1
3 Integer 1
end Keyword 2
The sum of all integers is 234589.
Explanation / Answer
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Returns a non-zero code if ch is an operator, 0 otherwise.
int isoperator(char ch)
{
    if (ch == '+')
        return 1;
    else if (ch == '-')
        return 2;
    else if (ch == '*')
        return 3;
    else if (ch == '/')
        return 4;
    else if (ch == '=')
        return 5;
    else
        return 0;
}

/* Symbol lookup, currently disabled. Re-enabling it also needs
   char symbol[10][10]; and int symcounter = 0; in main().
int issymbol(int symcounter, char symbol[][10], char currstr[])
{
    for (int i = 0; i < symcounter; i++)
    {
        if (strcmp(symbol[i], currstr) == 0)
            return i;
    }
    return -1;
}*/

int main()
{
    char instr[50];
    char currstr[50];
    // Transition table: rows are states, columns are character classes
    // (0 operator, 1 letter, 2 digit, 3 underscore, 4 dot); -1 rejects.
    // States: 0 start, 1 operator, 2 identifier, 3 integer,
    // 4 integer followed by '.', 5 real.
    int scantab[6][5] = {
        { 1,  2,  3, -1, -1},
        {-1, -1, -1, -1, -1},
        {-1,  2,  2,  2, -1},
        {-1, -1,  3, -1,  4},
        {-1, -1,  5, -1, -1},
        {-1, -1,  5, -1, -1}};
    int counter = 0, j;
    int currstate = 0;
    int count = 0;

    printf("ENTER THE STRING : ");
    // Read one whole line (at most 49 characters), blanks included.
    if (scanf("%49[^\n]", instr) != 1)
        return 0;

    while (instr[counter] != '\0')
    {
        count = 0;
        // Skip blanks, then copy the next blank-delimited word.
        while (instr[counter] == ' ')
            counter++;
        while (instr[counter] != ' ' && instr[counter] != '\0')
            currstr[count++] = instr[counter++];
        currstr[count] = '\0';
        if (count == 0)
            break; // only trailing blanks were left

        currstate = 0;
        size_t i = 0;
        // Run the word through the automaton; stop at the first reject so
        // that -1 is never used as a row index.
        while (i < strlen(currstr) && currstate != -1)
        {
            unsigned char ch = (unsigned char)currstr[i];
            if (isoperator(currstr[i]) > 0)
                j = 0;
            else if (isalpha(ch))
                j = 1;
            else if (isdigit(ch))
                j = 2;
            else if (ch == '_')
                j = 3;
            else if (ch == '.')
                j = 4;
            else
            {
                printf("ERROR IN INPUT ! PLEASE TRY AGAIN WITH SUITABLE STRING :");
                exit(0);
            }
            currstate = scantab[currstate][j];
            i++;
        }

        if (currstate == 1)
            printf(" <op> ");
        else if (currstate == 2)
        {
            /* With the symbol lookup enabled, identifiers can be numbered:
            int temp = issymbol(symcounter, symbol, currstr);
            if (temp >= 0)
                printf(" <id#%d>", temp + 1);
            else
            {
                strcpy(symbol[symcounter], currstr);
                symcounter++;
                printf(" <id#%d>", symcounter);
            }*/
            printf(" <id> ");
        }
        else if (currstate == 3)
            printf(" <int>");
        else if (currstate == 5)
            printf(" <real>");
        else
            printf(" error .."); // rejected, or incomplete such as "12."
    }
    printf("\n");
    return 0;
}
token types all
***Tokens and Types***
Types: integers(0), operators(1), illegal(2),
end of text(4), and oversize integers(5)
Type input text, EOF to quit
123 * 723456 + 12
Token = 123 Type = 0
Token = * Type = 1
Token = 72345 Type = 4
Token = + Type = 1
Token = 12 Type = 0
45+23*7
Token = 45 Type = 0
Token = + Type = 1
Token = 23 Type = 0
Token = * Type = 1
Token = 7 Type = 0
x = 8;
Token = x Type = 2
Token = = Type = 2
Token = 8 Type = 0
Token = ; Type = 2