Pandas memory error

2020-12-19 20:50

I have a csv file with ~50,000 rows and 300 columns. The following operation causes a memory error in pandas (Python):

merged_df.stack(0).reset_index(1)


        
2 Answers
  • 2020-12-19 21:29

    On my 64-bit Linux machine (32GB of memory), this takes a little less than 2GB:

    In [4]: import numpy as np
       ...: from pandas import DataFrame
       ...: %load_ext memory_profiler   # provides the %memit magic

    In [5]: def f():
       ...:     df = DataFrame(np.random.randn(50000, 300))
       ...:     df.stack().reset_index(1)

    In [6]: %memit f()
    maximum of 1: 1791.054688 MB per loop
    

    Since you didn't specify your platform: this won't work at all on 32-bit (you usually can't allocate a 2GB contiguous block), but it should work if you have a reasonable amount of memory/swap.
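
    If you want to gauge the footprint up front, pandas can report it directly. A minimal sketch (the shape and random float64 data are assumptions mirroring the question's dimensions):

    import numpy as np
    import pandas as pd

    # ~50,000 rows x 300 float64 columns ~= 50000 * 300 * 8 bytes ~= 114 MB
    # for the base frame alone; stack()/reset_index() allocate temporaries
    # on top of that, which is why the measured peak is much larger.
    df = pd.DataFrame(np.random.randn(50000, 300))
    print(df.memory_usage(deep=True).sum() / 1024**2, "MB")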

  • 2020-12-19 21:36

    As an alternative approach, you can use the "dask" library, e.g.:

    # Dask dataframes implement the Pandas API
    import dask.dataframe as dd

    df = dd.read_csv('s3://.../2018-*-*.csv')
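
    A brief usage note: dask evaluates lazily, so read_csv only builds a task graph and data is streamed in chunks when you call .compute(), which is why the full frame never has to fit in memory at once. A minimal sketch, assuming a local file 'data.csv' (hypothetical path):

    import dask.dataframe as dd

    df = dd.read_csv('data.csv')   # lazy: builds a task graph, loads nothing yet
    means = df.mean()              # still lazy: no data has been read
    print(means.compute())         # executes in chunks; returns a pandas Series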
    