How to estimate how much memory a pandas DataFrame will need?

谎友^ 2020-11-30 18:49

I have been wondering... If I am reading, say, a 400MB CSV file into a pandas DataFrame (using read_csv or read_table), is there any way to guesstimate how much memory this will need?

7 Answers
  • 2020-11-30 19:36

    You have to do this in reverse: build a DataFrame of known shape, write it out, and compare the file size with the in-memory size.

    In [1]: import numpy as np

    In [2]: from pandas import DataFrame

    In [3]: df = DataFrame(np.random.randn(1000000, 20))

    In [4]: df.to_csv('test.csv')

    In [5]: !ls -ltr test.csv
    -rw-rw-r-- 1 users 399508276 Aug  6 16:55 test.csv
    

    Technically, the in-memory size is about this (and it includes the indexes):

    In [6]: df.values.nbytes + df.index.nbytes + df.columns.nbytes
    Out[6]: 168000160
    

    So about 168MB in memory for a 400MB file: 1M rows of 20 float64 columns at 8 bytes each (1,000,000 × 20 × 8 = 160,000,000 bytes for the values, plus 8,000,000 bytes for the int64 row index).
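
    (For reference: newer pandas versions expose this directly. Below is a minimal sketch using the built-in DataFrame.memory_usage, assuming a reasonably recent pandas; it should land on the same figure as the nbytes sum above.)

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(1000000, 20))

    # Per-column byte counts; index=True also counts the row index.
    # deep=True additionally inspects object-dtype (e.g. string)
    # columns, which a plain nbytes sum understates.
    print(df.memory_usage(index=True, deep=True).sum())

    # Or get a human-readable summary:
    df.info(memory_usage='deep')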

    In [7]: df.to_hdf('test.h5', 'df')

    In [8]: !ls -ltr test.h5
    -rw-rw-r-- 1 users 168073944 Aug  6 16:57 test.h5
    

    MUCH more compact when written as a binary HDF5 file

    In [9]: df.to_hdf('test.h5', 'df', complevel=9, complib='blosc')

    In [10]: !ls -ltr test.h5
    -rw-rw-r-- 1 users 154727012 Aug  6 16:58 test.h5
    

    The data was random, so compression doesn't help too much
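
    If you want an estimate before loading the whole file, one trick is to read a small sample with nrows and extrapolate. A sketch, assuming the rows are reasonably homogeneous (the 1000-row sample size is an arbitrary choice):

    import pandas as pd

    # Read just the first rows of the CSV and measure them.
    sample = pd.read_csv('test.csv', nrows=1000)
    bytes_per_row = sample.memory_usage(index=True, deep=True).sum() / 1000.0

    # Count data rows in the file (subtract the header line).
    with open('test.csv') as f:
        total_rows = sum(1 for _ in f) - 1

    print('estimated in-memory size: %.0f MB' % (bytes_per_row * total_rows / 1e6))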
