I currently have a massive set of datasets, one for each year in the 2000s. I take a combination of three years and run code on it to clean the data. The problem is that the combined data is too large to fit in memory.
Use chunked pandas via Blaze. See the out-of-core documentation at http://blaze.readthedocs.org/en/latest/ooc.html
Naive use of Blaze triggers its out-of-core systems automatically when called on large files.
from blaze import Data

d = Data('my-small-file.csv')
d.my_column.count()  # small file: computed in-memory with pandas

d = Data('my-large-file.csv')
d.my_column.count()  # large file: computed with chunked pandas
How does it work? Blaze breaks the data resource into a sequence of chunks. It pulls one chunk into memory, operates on it, pulls in the next, and so on. After all chunks are processed, it often finalizes the computation with another operation on the intermediate results.
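For intuition, the same chunk-then-finalize pattern can be written directly with pandas' chunked CSV reader. This is a minimal sketch of the idea, not Blaze's actual implementation; the file name, column name, and chunk size are placeholders. Each per-chunk count is an intermediate result, and the final sum is the finalizing operation.

import pandas as pd

total = 0
# Pull one chunk into memory at a time, operate on it, then move on.
for chunk in pd.read_csv('my-large-file.csv', chunksize=100000):
    total += chunk['my_column'].count()  # intermediate result per chunk
# Finalize: combine the intermediate counts into the overall count.
print(total)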