Sparse array support in HDF5


Question


I need to store a 512^3 array on disk in some way, and I'm currently using HDF5. Since the array is sparse, a lot of disk space gets wasted.

Does HDF5 provide any support for sparse arrays?


Answer 1:


Chunked datasets (H5D_CHUNKED) allow sparse storage, but depending on your data, the overhead may be significant.

Take a typical array, try both sparse (chunked) and non-sparse storage, and compare the file sizes; then you will see whether it is really worth it.
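As a rough sketch of that comparison in Python with h5py (the file name, dataset name, and chunk shape below are just illustrative assumptions), HDF5 allocates storage only for the chunks that are actually written:

import h5py

# Chunked layout: storage is allocated per chunk, and only for chunks
# that have actually been written to.
f = h5py.File('chunked.h5', 'w')           # hypothetical file name
d = f.create_dataset('a', shape=(512, 512, 512), dtype='f',
                     chunks=(64, 64, 64))  # illustrative chunk shape
d[3, 4, 5] = 6                             # touches a single chunk
print(d.id.get_storage_size())             # bytes actually allocated on disk
f.close()

With 64^3 chunks of 4-byte floats, writing a single element still allocates a full 1 MiB chunk, which is exactly the kind of overhead to weigh against the dense layout.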




Answer 2:


One workaround is to create the dataset with a compression option. For example, in Python using h5py:

import h5py

# Compressed, chunked dataset: unwritten regions hold the fill value,
# which gzip compresses down to almost nothing.
f = h5py.File('my.h5', 'w')
d = f.create_dataset('a', dtype='f', shape=(512, 512, 512), fillvalue=-999.,
                     compression='gzip', compression_opts=9)
d[3, 4, 5] = 6
f.close()

The resulting file is 4.5 KB. Without compression, the same file would be about 512 MB. That's a compression ratio of roughly 99.999%, because most of the data are -999. (or whatever fillvalue you choose).
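If you want to sanity-check those numbers yourself, here is a quick sketch (assuming the file from the example above was just written):

import os

# 512^3 float32 elements at 4 bytes each: 536,870,912 bytes, i.e. 512 MiB uncompressed.
print(512 ** 3 * 4)
# Size of the compressed file created above.
print(os.path.getsize('my.h5'))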


The equivalent can be achieved with the C++ HDF5 API by calling H5::DSetCreatPropList::setDeflate(9); an example is shown in h5group.cpp.




Answer 3:


HDF5 provides indexed storage: http://www.hdfgroup.org/HDF5/doc/TechNotes/RawDStorage.html
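In h5py you can confirm that a dataset uses chunked (indexed) storage and see how much space it actually occupies on disk; a small sketch, assuming the 'my.h5' file from the earlier answer exists:

import h5py
import h5py.h5d

with h5py.File('my.h5', 'r') as f:
    d = f['a']
    print(d.chunks)                         # chunk shape (None for contiguous layout)
    print(d.id.get_create_plist().get_layout() == h5py.h5d.CHUNKED)
    print(d.id.get_storage_size())          # bytes actually allocated on disk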



Source: https://stackoverflow.com/questions/3545349/sparse-array-support-in-hdf5
