I have been wondering... If I am reading, say, a 400MB csv file into a pandas dataframe (using read_csv or read_table), is there any way to guesstimate how much memory this will need?
If you know the dtypes of your array, then you can directly compute the number of bytes it will take to store your data, plus some for the Python objects themselves. A useful attribute of numpy arrays is nbytes. You can get the number of bytes from the arrays in a pandas DataFrame by doing:
nbytes = sum(block.values.nbytes for block in df.blocks.values())
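For example (a minimal sketch with made-up column names and sizes), the same estimate can be reached directly from the dtypes, since fixed-width columns store itemsize bytes per row:

import numpy as np
import pandas as pd

# Stand-in for a frame produced by read_csv (hypothetical data).
df = pd.DataFrame({
    "a": np.arange(1000, dtype=np.int64),   # 8 bytes per value
    "b": np.random.rand(1000),              # float64, 8 bytes per value
})

# itemsize * number of rows, summed over the columns, gives the raw
# storage needed for the fixed-width numeric data.
est = sum(df[col].dtype.itemsize * len(df) for col in df.columns)
print(est)                                                # 1000*8 + 1000*8 = 16000 bytes
print(sum(df[col].values.nbytes for col in df.columns))   # same figure via nbytes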
object dtype arrays store 8 bytes per object (they hold only a pointer to an opaque PyObject), so if you have strings in your csv you need to take into account that read_csv will turn those into object dtype arrays and adjust your calculations accordingly.
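To illustrate (a small sketch, assuming a 64-bit build where pointers are 8 bytes), nbytes on an object array only counts the pointers, not the strings they point to:

import numpy as np

s = np.array(["short", "a much longer string" * 10], dtype=object)
print(s.nbytes)   # 16: two pointers * 8 bytes, regardless of string length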
EDIT:
See the numpy scalar types page for more details on the object dtype. Since only a reference is stored, you need to take the size of the objects in the array into account as well. As that page says, object arrays are somewhat similar to Python list objects.
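A rough way to fold that in (a sketch, not exact accounting: sys.getsizeof will double-count nothing but also ignores any sharing of identical strings between rows) is to add the per-object sizes to the pointer storage:

import sys
import pandas as pd

col = pd.Series(["foo", "barbaz", "a longer string value"], dtype=object)

pointer_bytes = col.values.nbytes                  # 8 bytes per element (the pointers)
object_bytes = sum(sys.getsizeof(x) for x in col)  # the string objects themselves
print(pointer_bytes + object_bytes)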