HDF5 Storage Overhead


I'll answer my own question. The overhead involved just in representing the group structure makes it impractical to store small arrays, or to have many groups that each contain only a small amount of data. There does not seem to be any way to reduce the per-group overhead, which I measured at about 2.2 kB.
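For reference, here is a minimal sketch (using h5py, with hypothetical file names) of how one might estimate the per-group overhead by creating empty groups and comparing file sizes; the exact figure will depend on the HDF5 library version and file-creation properties.

```python
import os
import h5py

def file_size_with_groups(path, n_groups):
    # Create a file containing only empty groups and return its size on disk.
    with h5py.File(path, "w") as f:
        for i in range(n_groups):
            f.create_group(f"group_{i}")
    return os.path.getsize(path)

# Compare an empty file against one holding 1000 empty groups to estimate
# the approximate per-group cost (about 2.2 kB in my measurements).
base = file_size_with_groups("empty.h5", 0)
with_groups = file_size_with_groups("groups.h5", 1000)
print("approx. overhead per group:", (with_groups - base) / 1000, "bytes")
```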

I resolved this issue by combining the two datasets in each subgroup into a single (100 x 5) dataset. I then eliminated the subgroups and combined all of the datasets in each group into a 3D dataset: where I previously had N subgroups, each group now holds one dataset of shape (N x 100 x 5), saving the roughly N * 2.2 kB of overhead that was there before. Moreover, since HDF5's built-in compression is more effective on larger arrays, I now get a better than 1:1 overall packing ratio, whereas before, overhead took up half the space of the file and compression was completely ineffective.
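As an illustration, here is a hedged h5py sketch of that restructuring. The group and dataset names, the value of N, the column split between the two small datasets, and the random data are all hypothetical; the point is the shift from many small per-subgroup datasets to one compressed (N x 100 x 5) dataset per group.

```python
import numpy as np
import h5py

N = 50  # hypothetical number of former subgroups in one group

# Old layout: one subgroup per record, each holding two small datasets.
# Every subgroup costs ~2.2 kB of structural overhead, and the tiny
# arrays compress poorly.
with h5py.File("old_layout.h5", "w") as f:
    grp = f.create_group("group_0")
    for i in range(N):
        sub = grp.create_group(f"sub_{i}")
        sub.create_dataset("a", data=np.random.rand(100, 3))
        sub.create_dataset("b", data=np.random.rand(100, 2))

# New layout: the two datasets are merged column-wise into a (100 x 5)
# block, and all N blocks are stacked into a single (N x 100 x 5)
# dataset per group, stored with gzip compression.
with h5py.File("new_layout.h5", "w") as f:
    stacked = np.random.rand(N, 100, 5)  # stand-in for the real merged data
    f.create_dataset("group_0", data=stacked, compression="gzip")
```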

The lesson is to avoid complicated group structures in HDF5 files, and to try to combine as much data as possible into each dataset.
