How can I read large text files in Python, line by line, without loading it into memory?

臣服心动  2020-11-22 03:32

I need to read a large file, line by line. Let's say the file is more than 5 GB and I need to read each line, but obviously I do not want to use readlines() because it would build the entire file up in memory as a list.
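
For reference, the pattern being asked about is to iterate over the file object itself, which reads one line at a time instead of loading the whole file (a minimal sketch; the filename and the process() handler are placeholders):

    # the file object is a lazy iterator over lines, so only one line
    # is held in memory at a time ('large_file.txt' and process() are
    # placeholders for illustration)
    with open('large_file.txt', 'r', encoding='utf-8') as f:
        for line in f:
            process(line)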

15 Answers
  •  一整个雨季
    2020-11-22 04:07

    The Blaze project has come a long way over the last six years; dask, which grew out of it, has a simple API covering a useful subset of pandas features.

    dask.dataframe takes care of chunking internally, supports many parallelisable operations and allows you to export slices back to pandas easily for in-memory operations.

    import dask.dataframe as dd
    
    df = dd.read_csv('filename.csv')
    df.head(10)  # return first 10 rows
    df.tail(10)  # return last 10 rows
    
    # iterate rows
    for idx, row in df.iterrows():
        ...
    
    # group by my_field and return mean
    df.groupby(df.my_field).value.mean().compute()
    
    # slice by column
    df[df.my_field=='XYZ'].compute()
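
    As a short follow-up on exporting slices back to pandas: calling .compute() materialises the lazy result as an ordinary in-memory pandas DataFrame (a sketch; it assumes my_field exists in filename.csv and the selected subset fits in RAM):

    # .compute() returns a plain pandas.DataFrame, so the full pandas API
    # can be used on it afterwards (assumes the subset fits in memory)
    subset = df[df.my_field == 'XYZ'].compute()
    print(type(subset))   # <class 'pandas.core.frame.DataFrame'>
    print(subset.shape)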
    
