h5py: Compound datatypes and scale-offset in the compression pipeline

Submitted by 混江龙づ霸主 on 2019-12-24 09:29:44

Question


Using NumPy and h5py, it is possible to create ‘compound datatype’ datasets and store them in an HDF5 file:

import h5py
import numpy as np

# Create a new file using default properties.
file = h5py.File('compound.h5', 'w')

# Create a dataset under the root group. Each element of the compound
# type is one record: an int32 'fieldA' followed by a float32 'fieldB'.
comp_type = np.dtype([('fieldA', 'i4'), ('fieldB', 'f4')])
dataset = file.create_dataset("comp", (4,), comp_type)
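To make the layout concrete, here is a minimal sketch, assuming a recent h5py and NumPy and using a hypothetical in-memory file name, that fills such a dataset with four records and reads one field back by name. Each element is a single 8-byte record (int32 followed by float32), so the two fields live together inside every element rather than in separate columns.

```python
import h5py
import numpy as np

comp_type = np.dtype([('fieldA', 'i4'), ('fieldB', 'f4')])

# Build the records in NumPy first: one structured array, one row per record.
records = np.zeros(4, dtype=comp_type)
records['fieldA'] = [1, 2, 3, 4]
records['fieldB'] = [0.5, 1.5, 2.5, 3.5]

# driver='core' with backing_store=False keeps the HDF5 file purely in memory.
with h5py.File('compound_demo.h5', 'w', driver='core', backing_store=False) as f:
    dset = f.create_dataset('comp', (4,), dtype=comp_type)
    dset[...] = records

    # Reading by field name returns just that field, but each stored
    # element is still one record holding both fieldA and fieldB.
    field_a = dset['fieldA']
    round_trip = dset[...]

print(comp_type.itemsize)  # 8 bytes per record: 4 (i4) + 4 (f4)
print(field_a)             # [1 2 3 4]
```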

It is also possible to use various compression filters in a ‘compression pipeline’, among them the ‘scale-offset’ filter:

cmpr_dataset = file.create_dataset("cmpr", (4,), 'i4', scaleoffset=0)
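For plain numeric datasets the meaning of that parameter depends on the type: for integers it is the number of bits to keep, with 0 letting HDF5 work out the minimum itself (lossless), and for floating-point data it is the number of decimal digits retained (lossy). A minimal sketch, assuming a recent h5py; the dataset names and sample values are made up, and an `io.BytesIO` object stands in for a file on disk:

```python
import io

import h5py
import numpy as np

ints = np.arange(1000, 1100, dtype='i4')
floats = np.linspace(0.0, 1.0, 100)  # float64 sample values

buf = io.BytesIO()  # in-memory stand-in for an .h5 file
with h5py.File(buf, 'w') as f:
    # Integers: 0 lets HDF5 pick the minimum bit width per chunk (lossless).
    int_dset = f.create_dataset('cmpr', data=ints, scaleoffset=0)
    # Floats: 2 keeps two decimal digits (lossy "D-scaling").
    f.create_dataset('cmpr_f', data=floats, scaleoffset=2)
    int_chunks = int_dset.chunks        # filters need a chunked layout; h5py picks one
    int_setting = int_dset.scaleoffset  # None would mean the filter is not active

# Reopen so the data is decoded through the filter, not served from the chunk cache.
with h5py.File(buf, 'r') as f:
    int_back = f['cmpr'][...]
    float_back = f['cmpr_f'][...]

print(int_chunks, int_setting)
print(np.array_equal(int_back, ints))       # integer round trip is exact
print(np.max(np.abs(float_back - floats)))  # max absolute error, below 10**-2
```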

However, it is not clear to me whether, and if so how, it is possible to specify the scale-offset filter with a specific parameter (e.g., the 0 in the example above) for each field of a compound datatype.

More generally, it is not clear to me whether and how any filter can be applied with field-specific parameters.

So, the questions are:

  • Is it possible to apply a filter to only one field of a compound-datatype dataset, or to apply it with field-specific parameters?

  • If yes, how can this be done, syntax-wise?

My guess (fear) is that the nature of how the compound data is stored (in one ‘column’, instead of each field in its own ‘column’) will prohibit application of such field-specific filters, but I wanted to check, just to be sure.


Answer 1:


Besides the h5py docs, look at the HDF5 docs, which go into more detail. If the underlying HDF5 library does not support this, then the NumPy/h5py interface won't either.

https://support.hdfgroup.org/HDF5/doc/UG/OldHtmlSource/10_Datasets.html#ScaleOffset

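Elsewhere the docs say that filters are applied to whole chunks, i.e. to blocks of complete dataset elements rather than to individual fields. The filter pipeline and its parameters are set once per dataset, in its dataset creation property list.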
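Because a byte-oriented filter such as gzip works on a chunk as a whole, it does accept compound datasets, and the chunk shape is counted in whole records with no separate axis per field. A minimal sketch, assuming a recent h5py; the file name, chunk size and data are hypothetical:

```python
import h5py
import numpy as np

comp_type = np.dtype([('fieldA', 'i4'), ('fieldB', 'f4')])
records = np.zeros(1000, dtype=comp_type)
records['fieldA'] = np.arange(1000)
records['fieldB'] = np.arange(1000) / 4.0

with h5py.File('chunk_demo.h5', 'w', driver='core', backing_store=False) as f:
    # gzip only sees the bytes of each chunk, so the compound type is accepted.
    dset = f.create_dataset('comp', data=records, chunks=(250,),
                            compression='gzip', compression_opts=4)
    chunk_shape = dset.chunks                                # counted in records
    bytes_per_chunk = chunk_shape[0] * comp_type.itemsize    # raw bytes before gzip
    back = dset[...]

print(chunk_shape, bytes_per_chunk)  # (250,) 2000
```

One gzip setting covers all fields of every record in a chunk; there is no place in this call to give `fieldA` and `fieldB` different parameters.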

The expression defining the compound type is pure NumPy; h5py translates that dtype descriptor into an equivalent HDF5 compound type (the analogue of a C struct). The HDF5 docs include sample C and Fortran compound type definitions.
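That translation can be inspected with h5py's low-level `h5t` module. The sketch below assumes `h5t.py_create` and the compound-type accessors behave as in recent h5py releases (member names may come back as bytes, so they are decoded defensively):

```python
import numpy as np
from h5py import h5t

comp_type = np.dtype([('fieldA', 'i4'), ('fieldB', 'f4')])

# py_create converts a NumPy dtype into the corresponding HDF5 datatype object.
tid = h5t.py_create(comp_type)
is_compound = tid.get_class() == h5t.COMPOUND

members = []
for i in range(tid.get_nmembers()):
    name = tid.get_member_name(i)
    if isinstance(name, bytes):  # low-level API returns member names as bytes
        name = name.decode()
    member_class = tid.get_member_type(i).get_class()
    kind = 'int' if member_class == h5t.INTEGER else 'float'
    # Each member is described by its name, byte offset in the record, and class.
    members.append((name, tid.get_member_offset(i), kind))

print(is_compound, tid.get_size())  # True 8
print(members)  # [('fieldA', 0, 'int'), ('fieldB', 4, 'float')]
```

The resulting type carries names, offsets and member types, much like a C struct, but nothing about compression.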

All the docs say that the scale-offset filter applies only to integer and floating-point types. That can be understood as excluding string, variable-length (vlen), and compound types. What you are hoping is that it would still work on the numeric fields inside a compound type. I don't think so.
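A minimal sketch of both points, assuming a recent h5py: asking for scale-offset on the compound dataset is rejected before anything is written, while storing each field in its own dataset (a workaround offered here as an assumption, not an HDF5 feature; the group name and per-field settings are hypothetical) lets every field carry its own parameter.

```python
import h5py
import numpy as np

comp_type = np.dtype([('fieldA', 'i4'), ('fieldB', 'f4')])
records = np.zeros(100, dtype=comp_type)
records['fieldA'] = np.arange(100)
records['fieldB'] = np.linspace(0.0, 10.0, 100)

# Hypothetical per-field settings: int -> 0 (auto bit width), float -> 3 decimal digits.
field_settings = {'fieldA': 0, 'fieldB': 3}

with h5py.File('per_field_demo.h5', 'w', driver='core', backing_store=False) as f:
    # h5py only accepts scale-offset for integer and floating-point dtypes,
    # so the compound dataset raises TypeError here.
    try:
        f.create_dataset('comp', data=records, scaleoffset=0)
        compound_error = None
    except TypeError as exc:
        compound_error = str(exc)

    # Workaround (assumption): one plain numeric dataset per field.
    grp = f.create_group('comp_fields')
    for name, setting in field_settings.items():
        grp.create_dataset(name, data=np.ascontiguousarray(records[name]),
                           scaleoffset=setting)
    active = {name: grp[name].scaleoffset is not None for name in field_settings}

print(compound_error)
print(active)  # {'fieldA': True, 'fieldB': True}
```

The trade-off is that a record is no longer read in one access; reassembling it means reading each field dataset and combining them in NumPy.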



Source: https://stackoverflow.com/questions/40784482/h5py-compound-datatypes-and-scale-offset-in-the-compression-pipeline
