Experience with using h5py to do analytical work on big data in Python?

执念已碎 2021-01-29 23:23

I do a lot of statistical work and use Python as my main language. Some of the data sets I work with, though, can take 20 GB of memory, which makes operating on them using in-memor

2 Answers
  •  天命终不由人
    2021-01-30 00:17

    This is a long comment, not an answer to your actual question about h5py. I don't use Python for stats and tend to deal with relatively small datasets, but it might be worth a moment to check out the CRAN Task View for high-performance computing in R, especially the "Large memory and out-of-memory data" section.

    Three reasons:

    • you can mine the source code of any of those packages for ideas that might help you generally
    • you might find the package names useful in searching for Python equivalents; a lot of R users are Python users, too
    • under some circumstances, it might prove convenient to just link to R for a particular analysis using one of the above-linked packages and then draw the results back into Python

    Again, I emphasize that this is all way out of my league, and it's certainly possible that you might already know all of this. But perhaps this will prove useful to you or someone working on the same problems.
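
To make the "out-of-memory data" idea concrete, here is a minimal sketch of one common pattern: process the data one chunk at a time and merge per-chunk summaries, so the full data set never has to sit in RAM. It is not taken from any of the R packages mentioned above. The function names (`iter_chunks`, `chunk_stats`, `merge_stats`, `streaming_mean_variance`) are hypothetical, and it uses only the standard library, with a plain iterable standing in for the on-disk data.

```python
from typing import Iterable, Iterator, List, Tuple

# (count, mean, sum of squared deviations from the mean)
Stats = Tuple[int, float, float]


def iter_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most chunk_size items (hypothetical helper)."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def chunk_stats(chunk: List[float]) -> Stats:
    """Summarize one non-empty in-memory chunk."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def merge_stats(a: Stats, b: Stats) -> Stats:
    """Combine two summaries using the pairwise update for mean and variance."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def streaming_mean_variance(
    values: Iterable[float], chunk_size: int = 1000
) -> Tuple[int, float, float]:
    """Return (count, mean, sample variance), holding one chunk in memory at a time."""
    total: Stats = (0, 0.0, 0.0)
    for chunk in iter_chunks(values, chunk_size):
        total = merge_stats(total, chunk_stats(chunk))
    n, mean, m2 = total
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance
```

With h5py, the chunks could plausibly come from slicing a dataset (for example `dset[start:stop]`), which reads only that range from disk. That wiring is an assumption here and is left out so the sketch stays runnable without extra packages.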
