Improve pandas (PyTables?) HDF5 table write performance


情深已故 2020-12-12 16:29

I've been using pandas for research now for about two months to great effect. With large numbers of medium-sized trace event datasets, pandas + PyTables (the HDF5 interface

2 answers

孤街浪徒 (original poster)

2020-12-12 17:22

That's an interesting discussion. I think Peter is getting awesome performance with the Fixed format both because that format writes the data in a single shot and because he has a really good SSD (it can write at more than 450 MB/s).

Appending to a table is a more complex operation (the dataset has to be enlarged, and new records must be checked to ensure that they follow the schema of the table). This is why appending rows to tables is generally slower (but still, Jeff is getting ~70 MB/s, which is pretty good). That Jeff is getting more speed than Peter is probably because he has a better processor.
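To make the difference between the two write paths concrete, here is a minimal sketch using the pandas HDF5 interface. The function names, the `trace` key, and the idea of feeding the data in chunks are illustrative assumptions, not something taken from the original question; PyTables must be installed for either function to run.

```python
import pandas as pd


def write_fixed(df: pd.DataFrame, path: str, key: str = "trace") -> None:
    """Write the whole frame in a single shot (fixed format cannot be appended to)."""
    df.to_hdf(path, key=key, mode="w", format="fixed")


def append_table(chunks, path: str, key: str = "trace") -> None:
    """Append chunks one by one to a table-format dataset.

    Each call enlarges the on-disk dataset and checks the new rows
    against the schema of the existing table.
    """
    with pd.HDFStore(path, mode="w") as store:
        for chunk in chunks:
            store.append(key, chunk)
```

Timing both functions on the same data and the same disk is probably the simplest way to see how much of the gap comes from the format rather than from the hardware.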

Finally, indexing in PyTables uses a single processor, yes, and it is normally an expensive operation, so you should really disable it if you are not going to query data on-disk.
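With pandas, one way to disable indexing is to pass `index=False` to `HDFStore.append`, and, if on-disk queries turn out to be needed later, to build the index once at the end with `HDFStore.create_table_index`. The sketch below assumes that workflow; the function names, the `trace` key, and the `optlevel`/`kind` values are illustrative choices rather than recommendations from the thread.

```python
import pandas as pd


def append_without_index(chunks, path: str, key: str = "trace") -> None:
    """Append chunks to a table without updating the PyTables index each time."""
    with pd.HDFStore(path, mode="w") as store:
        for chunk in chunks:
            store.append(key, chunk, index=False)


def build_index_later(path: str, key: str = "trace") -> None:
    """Create the index in one pass, only if on-disk queries are needed."""
    with pd.HDFStore(path, mode="a") as store:
        store.create_table_index(key, optlevel=9, kind="full")
```

If the data is only ever read back in full, `build_index_later` can simply be skipped.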
