Question
Converting an entire column of strings into floats for a csv file.
I have a csv file where column F (except for the first row because that is the title row) has transactionDateTime strings in the following format:
12/28/2016 5:20:06 AM
I need these to be in the form of floats, so I can use them in a database. Something like (12,28,2016,5,20,06,AM). Note that I need something to differentiate AM and PM. AM could be represented by a 1 and PM could be represented by a 2. It's just important that they are differentiated.
To be clear, I want the program to write a new csv file with all of the same information, except that this single column (F) of strings (aside from row 1) will be in a float format.
Thanks!
Explanation / Answer
CSV stands for comma-separated values. It is a simple text-based file format used to store tabular data in a way that is easy for humans to read and easy for programs to load. Here is an example CSV file that records the number of As Seen on TV products a factory produces each day.
Date, ShamWow, Cami Lace, Instant Switch, Flowbee
Monday, 1232, 3221, 638, 893
Tuesday, 1532, 2832, 543, 789
Wednesday, 1132, 3148, 593, 827
Thursday, 1341, 2944, 601, 832
Friday, 1242, 1234, 621, 794
Note that each line is a data entry (data row), and in this example the first line is special, in that it holds a text descriptor for each column. Also note that even though the columns are not lined up in the text CSV file, it is understood that the commas indicate the column breaks, such that on Friday the factory produced 794 Flowbees.
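To make the row and column idea concrete, here is a minimal sketch that reads the sample table with Python's standard csv module and looks up Friday's Flowbee count. The table is held in a string purely for illustration (a real program would open a file), and the variable names are my own, not part of the original answer.

```python
import csv
import io

# The sample table from the text, held in a string instead of a file.
sample = """Date, ShamWow, Cami Lace, Instant Switch, Flowbee
Monday, 1232, 3221, 638, 893
Tuesday, 1532, 2832, 543, 789
Wednesday, 1132, 3148, 593, 827
Thursday, 1341, 2944, 601, 832
Friday, 1242, 1234, 621, 794
"""

# skipinitialspace=True drops the blank that follows each comma.
reader = csv.reader(io.StringIO(sample), skipinitialspace=True)

header = next(reader)                    # the special first line of column labels
rows = {row[0]: row for row in reader}   # index the data rows by day name

# Every value read from a CSV file is a string, so convert before doing math.
friday_flowbees = int(rows["Friday"][header.index("Flowbee")])
print(friday_flowbees)  # prints 794
```

The commas alone decide where a column starts and ends; the spaces after them are only there for readability, which is why the reader is told to skip them.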
You can export data in CSV format simply by putting a comma between each data item, and placing a newline character ("\n") at the end of each record. Of course, the specific Python code to accomplish this depends upon what internal format your data is stored in. Here is an example stored in a list:
factoryOutput = [ ["Monday", 1232, 3221, 638, 893],
                  ["Tuesday", 1532, 2832, 543, 789],
                  ["Wednesday", 1132, 3148, 593, 827],
                  ["Thursday", 1341, 2944, 601, 832],
                  ["Friday", 1242, 1234, 621, 794] ]

def writeOutput(filename, output):
    # Open the file for writing (an existing file is overwritten)
    ourFile = open(filename, "w")

    # First, write the 'special' header of column labels, followed by a newline
    ourFile.write("Date, ShamWow, Cami Lace, Instant Switch, Flowbee")
    ourFile.write("\n")

    # Second, iterate through the data elements (records) with a for loop
    for record in output:
        for item in record:
            # write() accepts only strings, so convert each item first
            itemAsString = str(item)
            ourFile.write(itemAsString)
            ourFile.write(",")
        # End each full record with a newline.
        ourFile.write("\n")

    # Third, close the file.
    ourFile.close()

writeOutput("factoryOutput.csv", factoryOutput)
Note the lines of code that do the actual writing. They consist of a doubly nested for loop. The outer loop goes through each day’s factory output, while the inner loop (for item in record) writes out each item. Because write() accepts only strings, we cannot write an integer directly; we have to convert each item to a string before writing it. The call to str(item) does this conversion for us. It is applied to every item, including the string that names the day; converting a string to a string is safe and has no real effect. If we failed to do this, Python would give us an error such as the following as soon as we tried to write an integer.
  File "factoryOutput2CSV.py", line 27, in <module>
    writeOutput("factoryOutput.csv", factoryOutput)
  File "factoryOutput2CSV.py", line 19, in writeOutput
    ourFile.write(item)
TypeError: must be str, not int
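The failure is easy to reproduce without touching the disk. The sketch below uses io.StringIO as a stand-in for a file opened in text mode (that substitution is my assumption, made only to keep the example self-contained); the exact wording of the error message differs between Python versions and between file types, so only the exception type is relied on here.

```python
import io

buffer = io.StringIO()  # stands in for a file opened with open(filename, "w")

try:
    buffer.write(1232)             # an int, not a string
except TypeError as error:
    print("write failed:", error)  # message text varies by Python version

buffer.write(str(1232))            # converting first succeeds
buffer.write(str("Monday"))        # str() on a string returns it unchanged

print(buffer.getvalue())           # prints 1232Monday
```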
The ourFile.write(",") call places a comma after each item. As a result, our output will NOT exactly match our example, because there is a comma after EVERY data item, including the last:
Date, ShamWow, Cami Lace, Instant Switch, Flowbee
Monday,1232,3221,638,893,
Tuesday,1532,2832,543,789,
Wednesday,1132,3148,593,827,
Thursday,1341,2944,601,832,
Friday,1242,1234,621,794,
Luckily for us, most programs that read CSV-formatted data will accept this, although some of them may show a blank or null entry as the last item in each record, and some may complain that the first row (the header) does not have as many entries as the rest unless you also place an extra comma at the end of the header. It will be left as an exercise for the reader to write CSV files that omit the trailing comma.
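Coming back to the original question, here is one possible sketch that applies the same read, convert, write pattern to the transactionDateTime column. It uses Python's standard csv module, whose writer puts separators only between items, so no trailing comma appears. Several things here are assumptions on my part rather than anything stated in the question: column F is taken to be index 5, every data row is assumed to hold a timestamp shaped exactly like 12/28/2016 5:20:06 AM, the single column is replaced by seven float columns (month, day, year, hour, minute, second, and 1 for AM or 2 for PM), and the function names and header suffixes are hypothetical.

```python
import csv

# Mapping suggested in the question: AM -> 1, PM -> 2.
MERIDIEM_CODES = {"AM": 1.0, "PM": 2.0}

# Hypothetical suffixes for the seven columns that replace column F.
PART_NAMES = ["month", "day", "year", "hour", "minute", "second", "ampm"]


def timestamp_to_floats(text):
    """Split 'M/D/YYYY H:MM:SS AM' into seven floats (no validation is done)."""
    date_part, time_part, meridiem = text.split()
    pieces = date_part.split("/") + time_part.split(":")
    numbers = [float(piece) for piece in pieces]
    numbers.append(MERIDIEM_CODES[meridiem.upper()])
    return numbers


def convert_column(in_name, out_name, column=5):
    """Copy a CSV file, expanding one timestamp column (F is index 5) into floats."""
    with open(in_name, newline="") as src, open(out_name, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        # The title row is not converted; its one label becomes seven labels.
        header = next(reader)
        new_titles = [header[column] + "_" + name for name in PART_NAMES]
        writer.writerow(header[:column] + new_titles + header[column + 1:])

        # Every other row keeps its data, except for the expanded column.
        for row in reader:
            floats = timestamp_to_floats(row[column])
            writer.writerow(row[:column] + floats + row[column + 1:])
```

With these assumptions, timestamp_to_floats("12/28/2016 5:20:06 AM") returns [12.0, 28.0, 2016.0, 5.0, 20.0, 6.0, 1.0], and a call such as convert_column("input.csv", "output.csv") (both file names are placeholders) writes the new file. If a single float per row is needed instead, the seven values would have to be combined under some convention the question does not specify.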