Write only first N rows from pandas df to csv

Question

How can I write only the first N rows, or rows P to Q, of a pandas DataFrame to CSV without subsetting the df first? I cannot subset the data I want to export because of memory issues.

I am thinking of a function which writes to csv row by row.

Thank you

Answer 1:

Use head(), which returns the first n rows.

Example:

import pandas as pd
import numpy as np
date = pd.date_range('20190101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=date, columns=list('ABCD'))

# write only the top two rows to the CSV file
# (to_csv returns None when given a path, so there is nothing to print)
df.head(2).to_csv("test.csv")
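A quick way to see what head() actually writes, sketched with an in-memory io.StringIO buffer standing in for the file (the buffer is an assumption for illustration, not part of the answer). When to_csv is given a path or file object it writes the data and returns None; called without one, it returns the CSV text as a string:

```python
import io

import numpy as np
import pandas as pd

date = pd.date_range("20190101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=date, columns=list("ABCD"))

# With a path or file object, to_csv writes the data and returns None.
buf = io.StringIO()  # stands in for "test.csv"
result = df.head(2).to_csv(buf)
print(result)  # None

# Without a path, to_csv returns the CSV text instead.
text = df.head(2).to_csv()
lines = text.splitlines()
print(len(lines))  # 3: one header line plus two data rows
```

The header line starts with an empty field because the index has no name.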

Answer 2:

Does this work for you?

df.iloc[:N, :].to_csv('file.csv')

df.iloc[P:Q, :].to_csv('file.csv')
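Note that iloc slices follow Python's half-open convention: df.iloc[P:Q] holds rows P through Q - 1, so it writes Q - P rows. A small check, writing to in-memory buffers instead of a file; the DataFrame and the values of N, P and Q are hypothetical and only for illustration:

```python
import io

import pandas as pd

df = pd.DataFrame({"x": range(10), "y": list("abcdefghij")})
N, P, Q = 3, 4, 7  # hypothetical values chosen for illustration

first_n = io.StringIO()
df.iloc[:N, :].to_csv(first_n, index=False)

p_to_q = io.StringIO()
df.iloc[P:Q, :].to_csv(p_to_q, index=False)

# Each output has one header line followed by the selected data rows.
print(first_n.getvalue().splitlines())  # ['x,y', '0,a', '1,b', '2,c']
print(p_to_q.getvalue().splitlines())   # ['x,y', '4,e', '5,f', '6,g']
```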

I believe slicing with df.iloc generally returns a view on the original dataframe's data rather than copying it.

If this still doesn't work, you might also try setting the chunksize argument (the number of rows written at a time) in the to_csv call. It may be that pandas can create the subset without using much more memory, but then makes a complete copy of the rows written in each chunk. If the chunksize covered the whole frame, you would end up copying the whole frame at that point and running out of memory.
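A minimal sketch of passing chunksize, assuming a hypothetical row range P:Q and a temporary output file. chunksize only controls how many rows pandas formats and writes per batch; whether it lowers peak memory for your data is something to measure rather than assume:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Row i holds the values [2*i, 2*i + 1].
df = pd.DataFrame(np.arange(20_000).reshape(10_000, 2), columns=["a", "b"])
P, Q = 2_000, 5_000  # hypothetical row range (Q is exclusive)

path = os.path.join(tempfile.mkdtemp(), "rows_p_to_q.csv")
# chunksize: number of rows pandas formats and writes per batch.
df.iloc[P:Q, :].to_csv(path, index=False, chunksize=500)

back = pd.read_csv(path)
print(len(back))          # 3000
print(back["a"].iloc[0])  # 4000
```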

If all else fails, you can loop through df.iterrows(), df.iloc[P:Q, :].iterrows() or df.iloc[P:Q, :].itertuples() and write each row with the csv module (possibly via writer.writerows(df.iloc[P:Q, :].itertuples())).
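The csv-module route can be sketched as follows. The DataFrame, the row range and the "index" header label are assumptions for illustration, and an io.StringIO buffer stands in for a real file (which, per the csv docs, should be opened with newline=""):

```python
import csv
import io

import pandas as pd

df = pd.DataFrame({"x": range(10), "y": list("abcdefghij")})
P, Q = 2, 5  # hypothetical row range (Q is exclusive, as with iloc)

out = io.StringIO()  # stands in for open("file.csv", "w", newline="")
writer = csv.writer(out)
writer.writerow(["index", *df.columns])  # header row (label is hypothetical)
# itertuples() is a generator yielding one namedtuple per row (index first),
# so writerows consumes the rows one at a time without building a list.
writer.writerows(df.iloc[P:Q, :].itertuples())

print(out.getvalue().splitlines())
# ['index,x,y', '2,2,c', '3,3,d', '4,4,e']
```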

Answer 3:

Maybe you can select the index labels of the rows you want to write to your CSV file, like this:

df[df.index.isin([1, 2, ...])].to_csv('file.csv')

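Or slice by index label with loc (label slices include both endpoints, so this writes the rows labelled P through Q):

df.loc[P:Q].to_csv('file.csv')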
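Because loc slices by index label and includes both endpoints, while iloc slices by position and excludes the end, the same numbers can select different rows. A small comparison on a hypothetical DataFrame whose integer labels start at 1, writing to in-memory buffers:

```python
import io

import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30, 40, 50]}, index=[1, 2, 3, 4, 5])

# Label-based slice: labels 2, 3 and 4 (both endpoints included).
by_label = io.StringIO()
df.loc[2:4].to_csv(by_label)

# Position-based slice: positions 2 and 3, i.e. labels 3 and 4.
by_position = io.StringIO()
df.iloc[2:4].to_csv(by_position)

print(by_label.getvalue().splitlines())     # [',x', '2,20', '3,30', '4,40']
print(by_position.getvalue().splitlines())  # [',x', '3,30', '4,40']
```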

Source: https://stackoverflow.com/questions/57458771/write-only-first-n-rows-from-pandas-df-to-csv

Tags

python

pandas

csv