Combining hdf5 files

离开以前 2020-12-08 04:26

I have a number of hdf5 files, each of which has a single dataset. The datasets are too large to hold in RAM. I would like to combine these files into a single file containing all of the datasets.

6 Answers
  •  野趣味 2020-12-08 05:06

    One solution is to use the h5py interface to the low-level H5Ocopy function of the HDF5 API, in particular the h5py.h5o.copy function:

    In [1]: import h5py as h5

    In [2]: hf1 = h5.File("f1.h5")

    In [3]: hf2 = h5.File("f2.h5")

    In [4]: hf1.create_dataset("val", data=35)
    Out[4]: <HDF5 dataset "val": shape (), type "<i8">

    In [5]: hf1.create_group("g1")
    Out[5]: <HDF5 group "/g1" (0 members)>

    In [6]: hf1.get("g1").create_dataset("val2", data="Thing")
    Out[6]: <HDF5 dataset "val2": shape (), type "|S5">

    In [7]: hf1.flush()

    In [8]: h5.h5o.copy(hf1.id, "g1", hf2.id, "newg1")

    In [9]: h5.h5o.copy(hf1.id, "val", hf2.id, "newval")

    In [10]: hf2.values()
    Out[10]: [<HDF5 group "/newg1" (1 members)>, <HDF5 dataset "newval": shape (), type "<i8">]

    In [11]: hf2.get("newval").value
    Out[11]: 35

    In [12]: hf2.get("newg1").values()
    Out[12]: [<HDF5 dataset "val2": shape (), type "|S5">]

    In [13]: hf2.get("newg1").get("val2").value
    Out[13]: 'Thing'


    The above was generated with h5py version 2.0.1-2+b1 and IPython version 0.13.1-2+deb7u1 on Python version 2.7.3-4+deb7u1, from a more-or-less vanilla install of Debian Wheezy. The files f1.h5 and f2.h5 did not exist prior to executing the above. Note that, per salotz, for Python 3 the dataset/group names need to be bytes (e.g., b"val"), not str.
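    As a minimal illustration of that Python 3 point, the two low-level calls from the session above would be written with bytes names:

        # Python 3: object names passed to the low-level API must be bytes, not str
        h5.h5o.copy(hf1.id, b"g1", hf2.id, b"newg1")
        h5.h5o.copy(hf1.id, b"val", hf2.id, b"newval")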

    The hf1.flush() in command [7] is crucial, as the low-level interface apparently always draws from the version of the .h5 file stored on disk, not the one cached in memory. Copying datasets to or from groups that are not at the root of a File can be achieved by supplying the ID of that group, e.g. hf1.get("g1").id, as in the sketch below.
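    Here is a minimal sketch of that, assuming f1.h5 already exists on disk with the group "g1" and dataset "val2" from the session above; the destination name "copied_val2" is just an example:

        import h5py as h5

        hf1 = h5.File("f1.h5", "r")   # source already on disk, so no flush() is needed
        hf2 = h5.File("f2.h5", "a")   # destination

        # Pass the group's ID as the source location; the dataset name is then
        # resolved relative to that group rather than relative to the file root.
        h5.h5o.copy(hf1.get("g1").id, b"val2", hf2.id, b"copied_val2")

        hf1.close()
        hf2.close()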

    Note that h5py.h5o.copy will fail with an exception (it does not clobber) if an object with the indicated name already exists at the destination.
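    Putting the pieces together for the original question, a rough sketch might look like the following (the file names and the raise-on-collision policy are assumptions, not part of the answer above). Checking for an existing name first avoids the no-clobber exception, and because the copy is carried out inside the HDF5 library, the datasets never need to be materialised in Python memory:

        import h5py as h5

        src_files = ["part1.h5", "part2.h5", "part3.h5"]   # hypothetical input files

        with h5.File("combined.h5", "a") as dst:
            for fname in src_files:
                with h5.File(fname, "r") as src:
                    # Each source file holds a single root-level dataset;
                    # copy it across under its original name.
                    for name in src:
                        if name in dst:
                            raise ValueError("%r already exists in combined.h5" % name)
                        h5.h5o.copy(src.id, name.encode(), dst.id, name.encode())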
