Question
Microblogging sites such as Twitter and Ask.fm are sometimes misused to abuse people. In this part of the assignment, your task is to screen each tweet for the presence of swear words. We provide an initial list of bad words in the file named swear_words.txt. The file twitter_data.txt contains real tweets collected to study cyberbullying; each line is a different tweet. Write a Python function that reads each tweet in the file, looks for swear words, and writes every tweet containing foul language to a new file named potentially_offensive_tweets.txt. Note that the sample may contain repeated tweets as well as tweets in foreign languages.
You may find the need to update your swear_words.txt file. That’s expected, as the list is not comprehensive.
twitter_data.txt link: https://drive.google.com/open?id=0BzB5lIrANOIPNXJVb3ZnbksxVTg
swear_words.txt link: https://drive.google.com/open?id=0BzB5lIrANOIPUDc2Q04wdjBUcUU
Explanation / Answer
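def screen_tweets(tweets_path="twitter_data.txt",
                  words_path="swear_words.txt",
                  out_path="potentially_offensive_tweets.txt"):
    # Load the swear words: first token of each line, lowercased.
    # split() also drops the trailing newline and skips blank lines.
    with open(words_path, "r") as words_file:
        swear_words = [w.split()[0].lower() for w in words_file if w.split()]

    with open(tweets_path, "r") as tweets_file, open(out_path, "w") as out_file:
        for line in tweets_file:
            lowered = line.lower()
            # Substring match; any() stops at the first hit,
            # so a tweet with several swear words is written only once.
            if any(word in lowered for word in swear_words):
                out_file.write(line)

screen_tweets()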
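The answer above matches swear words as substrings, so a short word can also be flagged inside a longer, harmless one, and a repeated tweet is written once per occurrence. A possible refinement, sketched below, matches whole words only and drops exact repeats. The function names (build_pattern, filter_offensive) and the sample data are hypothetical and not part of the assignment; the placeholder words stand in for the real list in swear_words.txt.

```python
import re

def build_pattern(swear_words):
    """Compile one case-insensitive regex matching any listed word as a whole word."""
    cleaned = [re.escape(w.strip()) for w in swear_words if w.strip()]
    if not cleaned:
        return None
    # Lookarounds require that the match is not glued to other word characters.
    return re.compile(r"(?<!\w)(?:" + "|".join(cleaned) + r")(?!\w)", re.IGNORECASE)

def filter_offensive(tweets, swear_words):
    """Return tweets containing a listed word, skipping exact repeats."""
    pattern = build_pattern(swear_words)
    flagged = []
    if pattern is None:
        return flagged
    seen = set()
    for tweet in tweets:
        text = tweet.strip()
        if text in seen:
            continue  # repeated tweet, already handled
        seen.add(text)
        if pattern.search(text):
            flagged.append(text)
    return flagged

# Placeholder data (assumption): mild words stand in for the real swear list.
sample_words = ["darn", "heck"]
sample_tweets = [
    "What the HECK is this",
    "I checked the heckler",   # contains "heck" only inside longer words
    "What the HECK is this",   # exact repeat
    "darn!",
]
print(filter_offensive(sample_tweets, sample_words))
# prints ['What the HECK is this', 'darn!']
```

To use it on the assignment files, read the lines of twitter_data.txt and swear_words.txt into lists, pass them to filter_offensive, and write each returned tweet followed by a newline to potentially_offensive_tweets.txt. Whole-word matching misses deliberate misspellings and inflected forms, so whether it is better than substring matching depends on how the word list is maintained.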