

Question

The object of this project is to create a whole program that:


1.       Obtains a file name from the user


2.       Reads the named file, token-by-token


3.       Scrubs the individual tokens of certain punctuation characters (but leaves others)


a. Remove the following characters:
     i.   End-of-line characters: . ? !
     ii.  Separators: , : ; / |
     iii. Double quotes
     iv.  Special characters: ^ * + = _
     v.   Grouping characters: ( ) { } [ ] < >
     vi.  Single quotes (only if the LAST character of a token)

b. Do NOT remove these characters:
     i.   Numbers
     ii.  Apostrophes (for contractions or possession, like isn’t or Dave’s)
     iii. Dashes (for word conjunction, like side-by-side)
     iv.  Special characters: @ # $ %


4.       After scrubbing, checks that some characters still remain. If the token is now empty, discard it and move to the next token; otherwise, convert the token to all lower case


5.       Places the scrubbed tokens (words) in an ArrayList (in sorted order)


6.       Each ArrayList entry will have two parts. The first part is the scrubbed token (word) and the second part is the count of how many times this word appears in the file.
For example, if the token ‘hello’ has been seen 4 times, the ArrayList entry would be:
        hello *4
If the token ‘help’ is being newly inserted into the ArrayList, the entry should be:
        help *1


a.       This means that only the first part of each ArrayList entry will be used to determine the sort ordering


b.      When deciding where to insert a new token in the ArrayList, walk the ArrayList from the front until an insert point is reached. If the same token is found in the ArrayList, don’t insert a new element; just increment the second part. For example, if the current token is ‘help’ and ‘help *5’ is already in the ArrayList, ‘help *5’ should be updated to ‘help *6’


7.       After all tokens are read from the original file, a new file should be written with each ArrayList entry on a single line. The new file should be named:   _sorted.txt


If the original file contained:


     This is my
           file, yes my file


        My file.



The output file should contain:

        file *3
        is *1
        my *3
        this *1
        yes *1

Explanation / Answer

You can use flex and bison to write this easily and efficiently.

For reference, here are ANSI C grammars written for each tool:

Lex: http://www.lysator.liu.se/c/ANSI-C-grammar-l.html

Bison: http://www.lysator.liu.se/c/ANSI-C-grammar-y.html
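
Since the question asks for an ArrayList, which suggests Java, the steps can also be written directly without a lexer generator. The following is only a minimal sketch under some assumptions, not a definitive solution. The class name WordCounter and the method names scrub, insertSorted and countWords are made up for illustration. Only ASCII quote characters are handled, and at most one trailing single quote is dropped after the other characters are removed. Entries are stored as plain strings of the form "word *count". The output file is named exactly _sorted.txt because that is all the question states; if a prefix is intended, it would have to be added.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Scanner;

class WordCounter {
    // Characters removed anywhere in a token (step 3a, items i to v).
    private static final String REMOVE = ".?!,:;/|\"^*+=_(){}[]<>";

    // Scrubs one token (step 3) and lower-cases it (step 4).
    // Returns an empty string when nothing is left.
    static String scrub(String token) {
        StringBuilder sb = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (REMOVE.indexOf(c) < 0) {
                sb.append(c);
            }
        }
        // Step 3a.vi: drop a single quote only if it is the last character.
        // Assumption: checked after the other removals, and only once.
        if (sb.length() > 0 && sb.charAt(sb.length() - 1) == '\'') {
            sb.setLength(sb.length() - 1);
        }
        return sb.toString().toLowerCase();
    }

    // Walks the list from the front (step 6b). Increments the count if the
    // word is already present, otherwise inserts "word *1" in sorted position.
    static void insertSorted(ArrayList<String> list, String word) {
        for (int i = 0; i < list.size(); i++) {
            String entry = list.get(i);
            int sep = entry.lastIndexOf(" *");   // scrubbed words never contain '*'
            String existing = entry.substring(0, sep);
            int cmp = word.compareTo(existing);
            if (cmp == 0) {
                int count = Integer.parseInt(entry.substring(sep + 2));
                list.set(i, existing + " *" + (count + 1));
                return;
            }
            if (cmp < 0) {
                list.add(i, word + " *1");
                return;
            }
        }
        list.add(word + " *1");                  // larger than every entry
    }

    // Reads whitespace-separated tokens (step 2) and builds the sorted list.
    static ArrayList<String> countWords(Scanner in) {
        ArrayList<String> list = new ArrayList<>();
        while (in.hasNext()) {
            String word = scrub(in.next());
            if (!word.isEmpty()) {               // step 4: skip empty tokens
                insertSorted(list, word);
            }
        }
        return list;
    }

    public static void main(String[] args) throws IOException {
        Scanner console = new Scanner(System.in);
        System.out.print("File name: ");         // step 1
        String fileName = console.nextLine().trim();

        ArrayList<String> entries;
        try (Scanner file = new Scanner(new File(fileName))) {
            entries = countWords(file);
        }
        // Step 7: one entry per line. The name is taken literally from the question.
        try (PrintWriter out = new PrintWriter("_sorted.txt")) {
            for (String entry : entries) {
                out.println(entry);
            }
        }
    }
}
```

Passing a Scanner to countWords keeps the counting logic independent of the file, so it can be tried on a string first, for example countWords(new Scanner("My file. my FILE")). The linear walk in insertSorted follows step 6b as written; a binary search or a TreeMap would be faster but is not what the question asks for.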