Writing pandas DataFrames to a CSV file in chunks


Question


I have a set of large data files (1M rows x 20 cols). However, only 5 or so columns of that data are of interest to me.

I figure I can make things easier for me by creating copies of these files with only the columns of interest so I have smaller files to work with for post processing.

My plan was to read the file into a dataframe and then write to csv file.

I've been looking into reading large data files in chunks into a dataframe.

However, I haven't been able to find anything on how to write the data out to a CSV file in chunks.

Here is what I'm trying now, but this doesn't append to the CSV file; each chunk overwrites it:

with open(os.path.join(folder, filename), 'r') as src:
    df = pd.read_csv(src, sep='\t',skiprows=(0,1,2),header=(0), chunksize=1000)
    for chunk in df:
        chunk.to_csv(os.path.join(folder, new_folder,
                                  "new_file_" + filename), 
                                  columns = [['TIME','STUFF']])

Answer 1:


Try:

chunk.to_csv(os.path.join(folder, new_folder, "new_file_" + filename), columns=['TIME', 'STUFF'], mode='a')

The mode='a' tells pandas to append to the file instead of overwriting it. Note that columns expects a flat list (the double brackets in your code are a bug), and that every appended chunk also writes a header row unless you pass header=False for all chunks after the first.
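Putting the pieces together, a minimal sketch of the chunked read-and-append loop might look like this (the file paths, the three skipped rows, and the TIME/STUFF column names are taken from the question; the sample data generated at the top is purely illustrative):

```python
import os
import pandas as pd

# --- build a small tab-separated sample file (illustration only) ---
os.makedirs("data", exist_ok=True)
in_path = os.path.join("data", "input.tsv")
with open(in_path, "w") as f:
    f.write("junk 1\njunk 2\njunk 3\n")      # the three rows skiprows drops
    f.write("TIME\tSTUFF\tOTHER\n")          # header row
    for t in range(2500):
        f.write(f"{t}\t{t * 2}\t{t * 3}\n")

out_path = os.path.join("data", "new_file_input.csv")

# Read the file in 1000-row chunks. Write mode for the first chunk,
# append mode afterwards, and emit the header only once so the output
# CSV has a single header row.
reader = pd.read_csv(in_path, sep="\t", skiprows=(0, 1, 2), header=0,
                     chunksize=1000)
for i, chunk in enumerate(reader):
    chunk.to_csv(out_path, columns=["TIME", "STUFF"],
                 mode="w" if i == 0 else "a",
                 header=(i == 0), index=False)
```

Using mode="w" for the first chunk also makes the loop safe to re-run: stale output from a previous run is truncated rather than appended to.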




Answer 2:


Check out the chunksize argument in the to_csv method. Here are the docs.

Writing to file would look like:

df.to_csv("path/to/save/file.csv", chunksize=1000, columns=['TIME', 'STUFF'])



Answer 3:


Why not read only the columns of interest (with usecols) and then save them?

file_in = os.path.join(folder, filename)
file_out = os.path.join(folder, new_folder, 'new_file' + filename)

df = pd.read_csv(file_in, sep='\t', skiprows=(0, 1, 2), header=0, usecols=['TIME', 'STUFF'])
df.to_csv(file_out, index=False)


Source: https://stackoverflow.com/questions/38531195/writing-panda-dataframes-to-csv-file-in-chunks
