Question
How can I write only the first N rows, or rows P through Q, of a pandas DataFrame to a CSV file without subsetting the DataFrame first? I cannot subset the data I want to export because of memory issues.
I am thinking of a function that writes to the CSV file row by row.
Thank you.
Answer 1:
Use head, which returns the first n rows.
Example:
import pandas as pd
import numpy as np
date = pd.date_range('20190101',periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=date, columns=list('ABCD'))
# write only the top two rows to a CSV file
# (to_csv returns None when given a path, so there is nothing to print)
df.head(2).to_csv("test.csv")
Answer 2:
Does this work for you?
df.iloc[:N, :].to_csv('file.csv')
Or
df.iloc[P:Q, :].to_csv('file.csv')
I believe slicing with df.iloc generally returns a reference to (a view of) the original DataFrame rather than a copy of the data.
If this still doesn't work, you might also try setting the chunksize in the to_csv call. It may be that pandas is able to create the subset without using much more memory, but then makes a complete copy of the rows written in each chunk. If the chunksize is the whole frame, you would end up copying the whole frame at that point and running out of memory.
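As a rough sketch of the slice-plus-chunksize idea, assuming a small random frame in place of the real one and hypothetical values for P, Q, and the output file name (chunksize is an existing to_csv parameter that sets how many rows are written at a time):

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative data; the real frame would be the large one from the question.
df = pd.DataFrame(np.random.randn(1000, 4), columns=list("ABCD"))

P, Q = 100, 250  # hypothetical row positions; with iloc, Q is excluded

# Hypothetical output location, placed in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "rows_p_to_q.csv")

# Write rows P..Q-1, handing 50 rows at a time to the CSV writer.
df.iloc[P:Q, :].to_csv(out_path, chunksize=50)

# Read the file back only to confirm what was written.
written = pd.read_csv(out_path, index_col=0)
print(written.shape)  # (150, 4)
```

Whether a smaller chunksize actually lowers peak memory depends on the frame and the pandas version, so it is worth measuring on the real data.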
If all else fails, you can loop through df.iterrows(), df.iloc[P:Q, :].iterrows(), or df.iloc[P:Q, :].itertuples() and write each row using the csv module (possibly with writer.writerows(df.iloc[P:Q, :].itertuples())).
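The csv-module fallback could look roughly like this. The frame, the values of P and Q, and the file name are all made up for illustration, and the header row is written by hand because csv.writer knows nothing about DataFrame columns:

```python
import csv
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative data standing in for the large frame.
df = pd.DataFrame(np.random.randn(20, 4), columns=list("ABCD"))

P, Q = 5, 12  # hypothetical row positions; with iloc, Q is excluded

# Hypothetical output location, placed in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "rows_by_hand.csv")

with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)
    # Header: an empty cell for the index column, then the column names.
    writer.writerow([""] + list(df.columns))
    # itertuples yields one tuple per row, lazily: (index, A, B, C, D).
    # name=None gives plain tuples instead of namedtuples.
    writer.writerows(df.iloc[P:Q, :].itertuples(index=True, name=None))

# Read the file back only to confirm what was written.
with open(out_path, newline="") as f:
    rows = list(csv.reader(f))
print(len(rows))  # 8: one header row plus 7 data rows
```

itertuples is usually a better fit here than iterrows, since iterrows builds a Series for every row.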
Answer 3:
Maybe you can select the index labels of the rows that you want to write to your CSV file like this:
df[df.index.isin([1, 2, ...])].to_csv('file.csv')
Or slice by index label:
df.loc[P:Q].to_csv('file.csv')
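One thing to watch when switching between the loc and iloc forms in these answers: loc slices by label and includes both endpoints, while iloc slices by position and excludes the end. A small sketch on a made-up frame with the default integer index:

```python
import pandas as pd

# Made-up frame with default integer labels 0..9.
df = pd.DataFrame({"A": range(10)})

by_label = df.loc[2:5]      # labels 2, 3, 4, 5 (end label included)
by_position = df.iloc[2:5]  # positions 2, 3, 4 (end position excluded)

print(len(by_label), len(by_position))  # 4 3
```

With a non-default index (dates, strings, shuffled integers) the two forms can select entirely different rows, so pick the one that matches how P and Q are defined.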
Source: https://stackoverflow.com/questions/57458771/write-only-first-n-rows-from-pandas-df-to-csv