Reading a part of csv file

Posted by 十年热恋 on 2021-01-02 07:55:40

Question


I have a really large CSV file, about 10 GB. Whenever I try to read it into an IPython notebook using

data = pd.read_csv("data.csv")  

my laptop gets stuck. Is it possible to read just part of the file, say 10,000 rows or 500 MB?


Answer 1:


Yes, it is possible. Passing chunksize to read_csv returns an iterator that yields the file one DataFrame at a time, each with at most that many rows. Setting iterator=True as well is optional when chunksize is given.

df_iter = pd.read_csv('data.csv', chunksize=10000, iterator=True)

for iter_num, chunk in enumerate(df_iter, 1):
    print(f'Processing iteration {iter_num}')
    # do things with chunk

Or, more briefly:

for chunk in pd.read_csv('data.csv', chunksize=10000):
    pass  # do things with chunk
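
To make "do things with chunk" concrete, here is a minimal sketch that keeps a running total and collects only the rows of interest, so the whole file never has to sit in memory at once. The column names id and value, the tiny in-memory CSV, and the filter condition are all assumptions for illustration; with the real file you would pass 'data.csv' and a much larger chunksize.

```python
import io
import pandas as pd

# Hypothetical stand-in for the large file; in practice pass the path 'data.csv'.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

total = 0
kept = []
# chunksize=4 is only for this tiny sample; something like 10000 suits a real file.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["value"].sum()             # running aggregate across chunks
    kept.append(chunk[chunk["value"] >= 6])   # keep only the rows of interest

# Combine the small filtered pieces into one DataFrame.
filtered = pd.concat(kept, ignore_index=True)
```

This only helps if the filtered or aggregated result is much smaller than the original file; concatenating every chunk unchanged would use as much memory as reading the file in one go.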

Alternatively, if you only want a specific part of the CSV, you can use the skiprows and nrows options: skiprows controls which lines at the start are skipped, and nrows limits how many rows are read after that.
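
A small sketch of skiprows and nrows, again using a made-up in-memory CSV with assumed columns id and value in place of 'data.csv':

```python
import io
import pandas as pd

# Hypothetical stand-in for the large file; in practice pass the path 'data.csv'.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

# Read only the first 3 data rows.
head = pd.read_csv(io.StringIO(csv_text), nrows=3)

# Skip file lines 1-4 (line 0 is the header and is kept), then read 3 rows.
middle = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 5), nrows=3)
```

Note that an integer such as skiprows=4 skips the first four lines of the file including the header line, which is why a range starting at 1 is used here to keep the column names.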




Answer 2:


This is most likely a memory issue: without further options, read_csv loads the whole file into RAM at once. You can pass chunksize to read_csv to set the number of rows read at a time.

Alternatively, if you don't need all the columns, you can pass usecols to read_csv to import only the columns you need.
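
As a rough illustration of usecols, assuming a file with columns id, value and note (hypothetical names, with a small in-memory CSV standing in for 'data.csv'):

```python
import io
import pandas as pd

# Hypothetical stand-in for the large file; in practice pass the path 'data.csv'.
csv_text = "id,value,note\n" + "\n".join(
    f"{i},{i * 2},row{i}" for i in range(5)
)

# Only the listed columns are loaded; the 'note' column is never kept in memory.
df = pd.read_csv(io.StringIO(csv_text), usecols=["id", "value"])
```

usecols can be combined with chunksize or nrows when both the row count and the column count need to be limited.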



Source: https://stackoverflow.com/questions/46355419/reading-a-part-of-csv-file
