I have a number of hdf5 files, each of which has a single dataset. The datasets are too large to hold in RAM. I would like to combine these files into a single file containing all the datasets.
I usually use IPython and the h5copy tool together; this is much faster than a pure Python solution (a pure-Python sketch follows the example below for comparison). Once h5copy is installed:
# PLEASE NOTE: THIS IS IPYTHON CONSOLE CODE, NOT PURE PYTHON
import h5py
# for every dataset Dn.h5 you want to merge into Output.h5
f = h5py.File('D1.h5', 'r')   # file to be merged; read-only access is enough
h5_keys = list(f.keys())      # materialize the keys (drop any you don't want)
f.close()                     # close the file before shelling out
for i in h5_keys:
    !h5copy -i 'D1.h5' -o 'Output.h5' -s {i} -d {i}
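For comparison, here is a pure-Python sketch of the same copy using only h5py (same file names as above). Group.copy goes through HDF5's object-copy routine, so the datasets are not read into RAM, but per the note above h5copy tends to be faster:

import h5py

# pure-Python alternative to the h5copy loop above
with h5py.File('D1.h5', 'r') as src, h5py.File('Output.h5', 'a') as dst:
    for key in src.keys():
        src.copy(key, dst)  # object-level copy straight into the output file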
To fully automate the process, assuming you are working in the folder where the files to be merged are stored:
import os
d_names = [n for n in os.listdir(os.getcwd()) if n.endswith('.h5')]  # only HDF5 files
d_struct = {}  # here we will store the structure of every file
for i in d_names:
    f = h5py.File(i, 'r')
    d_struct[i] = list(f.keys())  # materialize the keys before closing
    f.close()
# A) copy every dataset into the root of output.h5 (dataset names must not collide)
for i in d_names:
    for j in d_struct[i]:
        !h5copy -i '{i}' -o 'output.h5' -s {j} -d {j}
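As a quick sanity check from the same session, you can list what ended up in the output file with h5ls, which ships with the same HDF5 command-line tools as h5copy:

!h5ls output.h5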
If you want to keep each input file's datasets in a separate group inside output.h5, you have to create the parent group first using the -p flag:
# B) create a new group in output.h5 for every input .h5 file
for i in d_names:
    dataset = d_struct[i][0]
    newgroup = '%s/%s' % (i[:-3], dataset)  # group named after the file, minus '.h5'
    !h5copy -i '{i}' -o 'output.h5' -s {dataset} -d {newgroup} -p  # -p creates the parent group
    for j in d_struct[i][1:]:
        newgroup = '%s/%s' % (i[:-3], j)    # the group now exists, so no -p needed
        !h5copy -i '{i}' -o 'output.h5' -s {j} -d {newgroup}
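To confirm the resulting layout, a short h5py check also works: visit calls the given function with the path of every object in the file, so each per-file group and its datasets are printed.

import h5py

with h5py.File('output.h5', 'r') as f:
    f.visit(print)  # prints one path per group/dataset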