Question
Regular Expressions and Grep (part I)
For much of this and the next two labs, you will use files that are already in your VM. However, for some parts, you will need the following files. Use the wget statements below to download the files. Put them into ~Student/FILES for convenience (cd to ~/FILES before doing the wget commands below).
• wget www.nku.edu/~foxr/equals.txt
• wget www.nku.edu/~foxr/names.txt
• wget www.nku.edu/~foxr/sentences.txt
1. You will start by experimenting with grep (egrep). [egrep means grep -E. You can write either grep -E or just egrep.] To learn more about grep, egrep and their options, run the man command. I encourage you to do so. Type cd /etc for the next set of commands. Enter each of the following commands. Look at the output and see if you can figure out what each regular expression represents. Your instructor may discuss these in your class. There are no questions for this step.
a. egrep '[0-9]+' *
b. egrep '[A-Z]+{12,}' *
c. egrep '[A-Z]+=[0-9]+' *
d. egrep '[A-Z]+=' bashrc
e. egrep 'if [' bashrc
f. egrep '() {' bashrc
g. egrep '[0-255].[0-255].[0-255].[0-255]' *
h. In the previous example, we wanted to list any lines in the files that contain IP addresses. You might notice that much of the output consists of non-IP addresses. Can you figure out what is wrong with the above regex? Use this regex instead:
[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}
Can you figure out why this regex is more appropriate?
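A minimal sketch of why step 1g misbehaves, using a few made-up sample lines (not taken from the lab files). Inside brackets, [0-255] is not a numeric range: it is the character set 0-2 plus 5 plus 5, that is, the four characters 0, 1, 2 and 5. The last command escapes the dots, which the lab's regex does not do; that is an extra refinement shown here only for comparison.

```shell
# Hypothetical sample data, invented for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
host 192.168.10.1
build 2a5b1c2d
no digits here
EOF

# [0-255] means "one of 0, 1, 2, 5", so the real address is missed
# while the hex-like string 2a5b1c2 is matched.
loose=$(grep -E -c '[0-255].[0-255].[0-255].[0-255]' "$sample")

# The lab's replacement: groups of 1 to 3 digits. The unescaped dots
# still match any character, so the build string matches as well.
lab=$(grep -E -c '[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}' "$sample")

# Escaping the dots makes them literal; only the address line remains.
escaped=$(grep -E -c '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' "$sample")

echo "loose=$loose lab=$lab escaped=$escaped"
rm -f "$sample"
```

Even the escaped form accepts out-of-range groups such as 999, so it finds strings shaped like an IP address rather than validating one.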
2. You are responsible for answering the questions from this point of the lab forward. Change directory to ~Student/FILES. Look at the content of the three files sales.txt, computers.txt and addresses.txt to familiarize yourself with them. We will use egrep to search for patterns in the sales.txt file.
a. In sales.txt, we will search for all lines that contain the month Feb. The command is
egrep 'Feb' sales.txt
What was the response? Now how would you search for all lines that contain an entry for Smith?
b. Remember that the -c option for egrep counts the number of occurrences. Write egrep commands to count the number of entries in the file that contain Cameron. What command did you enter? How many were found?
c. Repeat b for entries that include KY.
d. Let’s find all lines that contain a commission rate of .15. Enter the command
egrep '.15' sales.txt
Look at the response. What entries appeared that shouldn’t? Why did they appear? How will you fix this regex? Do so and try again to make sure you have the correct answer. [Hint: look into man page for egrep to find exact match]
e. Let’s assume we want to find all records whose Sales value is over 9999. Type the command
egrep '[1-9][0-9]{4}' sales.txt
Why did we use [1-9] instead of just having [0-9]{5}?
f. Following up on part e, assume we instead want to find everyone who had less than 10000 for Sales. The command you might think of would be
egrep '[1-9][0-9]{3}' sales.txt
Enter this command. You will see that it responded with all entries. Why didn’t it work? See if you can figure out. One solution to solve this problem is to repeat the command from part e but add the -v option to your egrep command. Try it out to see if it worked.
g. To find all entries of sales in either OH or PA, you can use egrep 'OH|PA' sales.txt. Try it. Write an egrep command to find all entries that contain either Barber or Cameron. What command did you enter?
h. Following up on part g, how could you search for entries that contain both OH and PA? Hint: you cannot do this with a single egrep statement, instead use a pipe between two egrep statements. Include your command in your answers.
3. For this step, write egrep commands using the computers.txt file. Each step requires that you come up with the egrep command. The answer to place in your answer file is the egrep command you come up with which successfully accomplishes the step.
a. Find all entries of faculty whose name starts with the letter F, G, H, I, J or K.
b. Find all entries of faculty whose name ends with an n.
c. Find all entries that are on the 4th floor of their building. This means that the room number is 4xx (so for instance, it should not include 314 just because there is a 4)
Explanation / Answer
2)
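a) egrep "Feb" sales.txt
Entries for Smith: egrep "Smith" sales.txt
b) egrep -c "Cameron" sales.txt
c) egrep -c "KY" sales.txt
d) egrep "\.15" sales.txt
The original command, egrep ".15" sales.txt, also returns lines that merely contain some character followed by 15 (for example a Sales value such as 2150), because an unescaped . matches any single character. Escaping the dot makes it literal. If the file also holds longer rates such as .155, add the -w option so that only a whole-word match counts; this assumes rates are written without a leading 0.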
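For part d, the effect of escaping the dot can be sketched on a few invented records. The layout below is hypothetical, since the real format of sales.txt is not shown in the problem; grep -E is used in place of egrep, which the lab says is equivalent.

```shell
# Hypothetical records (name, month, state, sales, rate); invented data.
sales=$(mktemp)
cat > "$sales" <<'EOF'
Smith Feb KY 12150 .10
Barber Mar OH 8000 .15
Cameron Feb PA 915 .20
EOF

# Unescaped dot: "any character, then 15". This also hits 12150 and 915.
any=$(grep -E -c '.15' "$sales")

# Escaped dot: a literal ".15" only.
literal=$(grep -E -c '\.15' "$sales")

echo "unescaped=$any escaped=$literal"
rm -f "$sales"
```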
e) egrep "[1-9][0-9]{4}" sales.txt
Every value over 9999 has at least five digits. Because the pattern is not anchored, it matches any run of five digits that begins with a nonzero digit, so it also finds values with six or more digits. The longer form egrep "[1-9][0-9][0-9][0-9][0-9]+" sales.txt selects the same lines.
We use [1-9] rather than [0-9]{5} so that the first digit cannot be 0; otherwise a zero-padded value such as 00420 would also match.
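For part e, a short sketch on invented values shows two things: an unanchored [1-9][0-9]{4} still finds six-digit numbers, and [0-9]{5} would wrongly accept a zero-padded value. The sample lines are hypothetical and do not come from sales.txt.

```shell
# Hypothetical name/sales pairs; invented for illustration.
vals=$(mktemp)
cat > "$vals" <<'EOF'
Smith 9999
Barber 10000
Cameron 250000
Jones 00420
EOF

# Nonzero digit followed by four digits: matches 10000 and,
# as a substring, 25000 inside 250000. 9999 and 00420 do not match.
nonzero_first=$(grep -E -c '[1-9][0-9]{4}' "$vals")

# Any five digits: additionally matches the zero-padded 00420.
any_five=$(grep -E -c '[0-9]{5}' "$vals")

echo "nonzero_first=$nonzero_first any_five=$any_five"
rm -f "$vals"
```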
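f) egrep -v "[1-9][0-9]{4}" sales.txt
Do not use egrep '[1-9][0-9]{3}' sales.txt: the pattern is not anchored, so it also matches four digits inside a longer number such as 10000, and it misses values that have only one to three digits. Inverting the match from part e with -v prints every line that does not contain a value of 10000 or more.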
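The -v approach suggested in part f of the question can be sketched as follows, again on invented values. It relies on the assumption that no other field on a line (a ZIP code, for instance) contains five or more digits.

```shell
# Hypothetical name/sales pairs; invented for illustration.
vals=$(mktemp)
cat > "$vals" <<'EOF'
Smith 9999
Barber 10000
Cameron 420
Jones 25000
EOF

# Naive attempt: four digits starting with a nonzero digit.
# It matches inside 10000 and 25000 and misses 420.
naive=$(grep -E -c '[1-9][0-9]{3}' "$vals")

# Inverted match: keep the lines that do NOT contain a five-digit run.
under=$(grep -E -v '[1-9][0-9]{4}' "$vals")
under_count=$(grep -E -v -c '[1-9][0-9]{4}' "$vals")

echo "naive matched $naive lines"
echo "$under"
rm -f "$vals"
```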
g)
egrep 'OH|PA' sales.txt
egrep 'Barber|Cameron' sales.txt
h) The command in part g matches lines that contain either OH or PA. Here both must be present, so pipe one egrep into another:
egrep "OH" sales.txt | egrep "PA"
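The difference between alternation (either) and a pipeline (both) can be checked on a few invented lines. The records are hypothetical, and grep -E stands in for egrep.

```shell
# Hypothetical records; invented for illustration.
recs=$(mktemp)
cat > "$recs" <<'EOF'
Smith OH 100
Barber PA 200
Cameron OH PA 300
EOF

# Alternation: lines containing OH or PA (or both).
either=$(grep -E -c 'OH|PA' "$recs")

# Pipeline: the second grep only sees lines that already contain OH,
# so the result is the lines containing both OH and PA.
both=$(grep -E 'OH' "$recs" | grep -E -c 'PA')

echo "either=$either both=$both"
rm -f "$recs"
```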
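3) The output of the commands below may vary, since the exact layout of computers.txt is not specified in the problem, but the idea remains the same.
a) egrep "^[F-K]" computers.txt
This assumes the faculty name is the first thing on each line.
b) egrep "n$" computers.txt
This assumes the faculty name is the last thing on each line; if other fields follow the name, the pattern must be adapted to the file's layout.
c) egrep -w "4[0-9][0-9]" computers.txt
The -w option requires the match to be a whole word, so a longer number such as 1402 or 4412 is not reported. This assumes the room number is a separate field of exactly three digits.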
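A small sketch of why a bare 4[0-9][0-9] is too permissive for 4th-floor rooms, and how whole-word matching with -w narrows it. The records below are invented, because the layout of computers.txt is not given in the problem.

```shell
# Hypothetical name/building/room records; invented for illustration.
rooms=$(mktemp)
cat > "$rooms" <<'EOF'
Fox ST 314
Green GH 412
Hill ST 1402
King GH 4412
EOF

# Plain pattern: also matches 402 inside 1402 and 441 inside 4412.
plain=$(grep -E -c '4[0-9][0-9]' "$rooms")

# Whole-word match: the three digits must not touch other word characters.
whole=$(grep -E -w -c '4[0-9][0-9]' "$rooms")

echo "plain=$plain whole=$whole"
rm -f "$rooms"
```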