Combining hdf5 files

离开以前 2020-12-08 04:26

I have a number of hdf5 files, each of which has a single dataset. The datasets are too large to hold in RAM. I would like to combine these files into a single file containing all of the datasets.

6 Answers
  •  野趣味 2020-12-08 05:06

    One solution is to use the h5py interface to the low-level H5Ocopy function of the HDF5 API, in particular the h5py.h5o.copy function:

    In [1]: import h5py as h5

    In [2]: hf1 = h5.File("f1.h5")

    In [3]: hf2 = h5.File("f2.h5")

    In [4]: hf1.create_dataset("val", data=35)
    Out[4]: <HDF5 dataset "val": shape (), type "<i8">

    In [5]: hf1.create_group("g1")
    Out[5]: <HDF5 group "/g1" (0 members)>

    In [6]: hf1.get("g1").create_dataset("val2", data="Thing")
    Out[6]: <HDF5 dataset "val2": shape (), type "|S5">

    In [7]: hf1.flush()

    In [8]: h5.h5o.copy(hf1.id, "g1", hf2.id, "newg1")

    In [9]: h5.h5o.copy(hf1.id, "val", hf2.id, "newval")

    In [10]: hf2.values()
    Out[10]: [<HDF5 group "/newg1" (1 members)>, <HDF5 dataset "newval": shape (), type "<i8">]

    In [11]: hf2.get("newval").value
    Out[11]: 35

    In [12]: hf2.get("newg1").values()
    Out[12]: [<HDF5 dataset "val2": shape (), type "|S5">]

    In [13]: hf2.get("newg1").get("val2").value
    Out[13]: 'Thing'


    The above was generated with h5py version 2.0.1-2+b1 and IPython version 0.13.1-2+deb7u1 on Python version 2.7.3-4+deb7u1, from a more-or-less vanilla install of Debian Wheezy. The files f1.h5 and f2.h5 did not exist prior to executing the above. Note that, per salotz, for Python 3 the dataset/group names need to be bytes (e.g., b"val"), not str.
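    As a minimal illustration of that Python 3 point, the two low-level calls from the session above would be written with bytes names:

        # Python 3: object names passed to the low-level API must be bytes, not str
        h5.h5o.copy(hf1.id, b"g1", hf2.id, b"newg1")
        h5.h5o.copy(hf1.id, b"val", hf2.id, b"newval")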

    The hf1.flush() in command [7] is crucial, as the low-level interface apparently always draws from the version of the .h5 file stored on disk, not the one cached in memory. Copying datasets to or from groups that are not at the root of a File can be achieved by supplying the ID of that group, e.g. hf1.get("g1").id, as in the sketch below.
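    Here is a minimal sketch of that, assuming f1.h5 already exists on disk with the group "g1" and dataset "val2" from the session above; the destination name "copied_val2" is just an example:

        import h5py as h5

        hf1 = h5.File("f1.h5", "r")   # source already on disk, so no flush() is needed
        hf2 = h5.File("f2.h5", "a")   # destination

        # Pass the group's ID as the source location; the dataset name is then
        # resolved relative to that group rather than relative to the file root.
        h5.h5o.copy(hf1.get("g1").id, b"val2", hf2.id, b"copied_val2")

        hf1.close()
        hf2.close()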

    Note that h5py.h5o.copy will fail with an exception (it does not clobber) if an object with the indicated name already exists at the destination.
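    Putting the pieces together for the original question, a rough sketch might look like the following (the file names and the raise-on-collision policy are assumptions, not part of the answer above). Checking for an existing name first avoids the no-clobber exception, and because the copy is carried out inside the HDF5 library, the datasets never need to be materialised in Python memory:

        import h5py as h5

        src_files = ["part1.h5", "part2.h5", "part3.h5"]   # hypothetical input files

        with h5.File("combined.h5", "a") as dst:
            for fname in src_files:
                with h5.File(fname, "r") as src:
                    # Each source file holds a single root-level dataset;
                    # copy it across under its original name.
                    for name in src:
                        if name in dst:
                            raise ValueError("%r already exists in combined.h5" % name)
                        h5.h5o.copy(src.id, name.encode(), dst.id, name.encode())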
