
IN PYTHON: Do NOT use the methods read, readline, or readlines In all problems i


Question

IN PYTHON: Do NOT use the methods read, readline, or readlines

In all problems involving file processing, you are to use the general approach of reading a file line by line using a for loop. Do NOT use the methods read, readline, or readlines.

a.   Write a program splitter.py that reads a text file sentences.txt line by line – this file consists of sentences (one sentence per line) where the sentences do not contain blanks. However, the first character of every new word is uppercase. The program is to create a new file newsentences.txt that contains all the sentences from the original file with blanks inserted into them and no word other than the first one should be capitalized.
For example, if the input file contains the lines:

StopAndSmellTheRoses.
WhenInDoubt,Mumble.

then the output file should look as follows:

Stop and smell the roses.
When in doubt, mumble.

Explanation / Answer

# Import the string functions from Python
import string

# 1) Splits each line of the text file into individual
# characters to identify the commas and parse the
# individual tokens.

----------------------------------------------------------------------------------------------------------------------

# create a list to store the inputted numbers
numbers = list()
# Open the input text file for reading
dataFile = open('numbers.txt', 'r')

# Loop through each line of the input data file
for eachLine in dataFile:
    # set up a temporary variable
    tmpStr = ''
    # loop through each character in the line
    for char in eachLine:
        # check whether the char is a number
        if char.isdigit():
            # if it is a number add it to the tmpStr
            tmpStr += char
        # if a comma is identified and tmpStr has a
        # value then append it to the numbers list
        elif char == ',' and tmpStr != '':
            numbers.append(int(tmpStr))
            tmpStr = ''
    # if the tmpStr contains a number add it to the
    # numbers list.
    if tmpStr.isdigit():
        numbers.append(int(tmpStr))
# Print the number list
print(numbers)
# Close the input data file.
dataFile.close()

# 2) Uses the string method split to break each line
# from the file into a list of substrings
numbers = list()
dataFile = open(r'C:\PythonCourse\unit3\numbers.txt', 'r')

for eachLine in dataFile:
    # Simplify the script by using Python's built-in split
    # method to separate the tokens; strip() removes the
    # trailing newline so the last token passes isdigit()
    substrs = eachLine.strip().split(',')
    # Iterate through the output and check that they
    # are numbers before adding to the numbers list
    for strVar in substrs:
        if strVar.isdigit():
            numbers.append(int(strVar))

print(numbers)

dataFile.close()

# 3) Splits running text into sentences at '.', '!' and '?',
# skipping known abbreviations such as "Mr." and "i.e."
fixed_expressions = ["Mr.", "Mrs.", "i.e.", "Jr."]
sentence_boundaries = [".", "!", "?"]

def sentence_splitter(oldFile, newFile):
    # Collect the text line by line with a for loop
    # (no read, readline, or readlines)
    text = ""
    with open(oldFile, 'rt') as newF:
        for line in newF:
            text += line
    sentences = []
    sentence = ""  # tmp container
    # text = "Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a probability of .9 it isn't."
    text = text.strip()
    i = 0
    while i < len(text) - 1:
        char = text[i]
        sentence += char
        if char in sentence_boundaries:
            # A boundary counts only when followed by a blank
            # and an uppercase letter
            if text[i+1] == " " and text[i+2].isupper():
                expression_used = False
                for expression in fixed_expressions:
                    if sentence.endswith(expression):
                        expression_used = True
                if not expression_used:
                    if len(sentence) > 0:
                        sentences.append(sentence.strip())
                        sentence = ""
        i += 1
    # check for leftover at the end:
    if i == len(text) - 1:
        char = text[i]
        sentence += char
    if len(sentence.strip()) > 0:
        sentences.append(sentence.strip())
    # Write one sentence per line to the output file
    # (the original printed to the screen and left the file empty)
    with open(newFile, 'wt') as tokenF:
        for s in sentences:
            tokenF.write(s + "\n")
    return sentences

sentence_splitter(oldFile="../data/austen-emma.txt", newFile="tokenized.txt")
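
The code above works on numbers and on ordinary prose, so it does not address splitter.py directly. As a rough sketch only, the same character-by-character idea from approach 1 could be applied to the question. The function names add_blanks and split_file are hypothetical and not part of the original answer, and the sketch assumes every uppercase letter after the first character starts a new word, as in the two sample lines.

```python
def add_blanks(sentence):
    """Insert a blank before every uppercase letter except the
    first character, and lowercase that letter."""
    result = ""
    for index, char in enumerate(sentence):
        if index > 0 and char.isupper():
            # Assumption: an uppercase letter marks the start of a new word
            result += " " + char.lower()
        else:
            # Lowercase letters, punctuation and the newline pass through
            result += char
    return result


def split_file(in_name, out_name):
    """Copy in_name to out_name, inserting blanks into each line."""
    with open(in_name, "r") as infile, open(out_name, "w") as outfile:
        # Read line by line with a for loop, as the problem requires
        # (no read, readline, or readlines)
        for line in infile:
            outfile.write(add_blanks(line))


# The two sample lines from the question
print(add_blanks("StopAndSmellTheRoses."))
print(add_blanks("WhenInDoubt,Mumble."))
```

Under these assumptions, calling split_file("sentences.txt", "newsentences.txt") would produce the output file described in the question. A sentence containing an acronym or a proper noun would need extra handling that this sketch does not attempt.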