
I have to write a Python script that splits this numbers.csv.gz into 10 new files


Question

I have to write a Python script that splits numbers.csv.gz into 10 new files. The split is based on column B, which holds ordered integers from 1 to 10: all rows with 1 in column B go into one file, all rows with 2 go into another, and so forth.


This is all based off of column

[Screenshot of the spreadsheet: a column of integer group numbers (1, 2, ...) next to columns of random uppercase strings such as SDVWDWA, GO, XDQMZPRI, RPF.]

Explanation / Answer

pandas is a good fit for splitting a large CSV file. Once it is installed (for example with pip install pandas), the script takes only a few lines.

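import pandas as pd

# pandas infers gzip compression from the .gz extension, so the archive can be read directly.
# header=0: the header is on the first line.
# skipinitialspace=True: ignore any spaces that follow the commas.
csv = pd.read_csv('numbers.csv.gz', sep=',', header=0, skipinitialspace=True)

for i in range(1, 11):
    # Keep only the rows whose column B equals i (assumes the header names that column 'B').
    csv_col = csv[csv['B'] == i]
    # str(i) is required: concatenating a str with an int raises TypeError.
    csv_col.to_csv('col' + str(i) + 'num.csv', index=False, sep=',')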
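
If the header row does not literally name the column 'B' (in a spreadsheet, "column B" usually just means the second column), the same split can be done by position. The following is a minimal sketch under that assumption, not the only way to do it: the function name split_by_column and its parameters col_index, out_dir, prefix, and suffix are hypothetical names chosen for illustration. It uses groupby, so each distinct value found in the column gets its own file instead of looping over a fixed range of 1 to 10.

```python
import os

import pandas as pd


def split_by_column(src, col_index=1, out_dir=".", prefix="col", suffix="num.csv"):
    """Write one CSV per distinct value in the column at position col_index.

    Returns a dict mapping each distinct value to the path of the file written.
    Assumes the first line of the input is a header row.
    """
    # Compression is inferred from the file extension, so .csv.gz works as-is.
    df = pd.read_csv(src, skipinitialspace=True)

    # Pick the column by position (1 = second column, i.e. spreadsheet column B).
    key = df.columns[col_index]

    written = {}
    for value, group in df.groupby(key):
        out_path = os.path.join(out_dir, f"{prefix}{value}{suffix}")
        group.to_csv(out_path, index=False)
        written[value] = out_path
    return written
```

Calling split_by_column('numbers.csv.gz') would then write col1num.csv, col2num.csv, and so on into the current directory, provided the second column holds whole numbers. Rows with a missing value in that column are skipped, because groupby drops missing keys by default.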