Combining hdf5 files

离开以前 2020-12-08 04:26

I have a number of hdf5 files, each of which has a single dataset. The datasets are too large to hold in RAM. I would like to combine these files into a single file contain

6 answers
  •  清歌不尽
    2020-12-08 05:09

    I usually use IPython together with the h5copy tool; this is much faster than a pure Python solution. It requires h5copy to be installed.

    Console solution (minimal working example)

    # PLEASE NOTE: THIS IS IPYTHON CONSOLE CODE, NOT PURE PYTHON
    
    import h5py
    # Repeat for every dataset file Dn.h5 you want to merge into Output.h5
    f = h5py.File('D1.h5', 'r')  # file to be merged (read-only is enough)
    h5_keys = list(f.keys())  # copy the keys into a list before closing (remove any you don't need)
    f.close()  # close the file
    for i in h5_keys:
        !h5copy -i 'D1.h5' -o 'Output.h5' -s {i} -d {i}
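
    Outside IPython the `!` shell escape is not available. Below is a minimal sketch of the same h5copy call from plain Python, assuming h5copy is on the PATH. The helper names `build_h5copy_cmd` and `copy_object` are made up for illustration and are not part of h5py or the HDF5 tools.

```python
import subprocess


def build_h5copy_cmd(src_file, dst_file, src_obj, dst_obj=None, make_parents=False):
    """Return the h5copy argument list for copying one object."""
    cmd = ["h5copy", "-i", src_file, "-o", dst_file,
           "-s", src_obj, "-d", dst_obj or src_obj]
    if make_parents:
        cmd.append("-p")  # same -p flag as on the command line
    return cmd


def copy_object(src_file, dst_file, src_obj, dst_obj=None, make_parents=False):
    """Run h5copy; raises CalledProcessError if it exits with an error."""
    subprocess.run(
        build_h5copy_cmd(src_file, dst_file, src_obj, dst_obj, make_parents),
        check=True,
    )
```

    With the keys read from D1.h5, the loop body becomes `copy_object('D1.h5', 'Output.h5', i)`. Passing the arguments as a list avoids shell quoting problems with unusual dataset names.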
    

    Automated console solution

    To fully automate the process, assuming you are working in the folder where the files to be merged are stored:

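    import os
    # Only HDF5 inputs; skip the output file in case it already exists here
    d_names = [n for n in os.listdir(os.getcwd())
               if n.endswith('.h5') and n != 'output.h5']
    d_struct = {}  # Here we will store the database structure
    for i in d_names:
        f = h5py.File(i, 'r')
        d_struct[i] = list(f.keys())  # list() so the keys stay usable after f.close()
        f.close()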
    
    # A) Copy every top-level object into the root of the new .h5 file
    for i in d_names:
        for j in d_struct[i]:
            !h5copy -i '{i}' -o 'output.h5' -s {j} -d {j}
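
    Variant A writes every object to the root of output.h5 under its original name, so two input files that contain an object with the same name would target the same destination. My assumption is that h5copy refuses to overwrite an existing object, so it may be worth checking for clashes before copying. A small sketch follows; `find_collisions` is a hypothetical helper that takes the same file-to-keys mapping as `d_struct`.

```python
from collections import defaultdict


def find_collisions(d_struct):
    """Map each object name that occurs in more than one file to those files."""
    owners = defaultdict(list)
    for fname, keys in d_struct.items():
        for key in keys:
            owners[key].append(fname)
    return {key: files for key, files in owners.items() if len(files) > 1}
```

    If `find_collisions(d_struct)` returns a non-empty dict, copying each file into its own group sidesteps the clash.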
    

    Create a new group for every .h5 file added

    If you want to keep each file's datasets separate inside output.h5, the group has to be created first, using the -p flag:

    # B) Create a new group in the output.h5 file for every input .h5 file
    for i in d_names:
        keys = list(d_struct[i])
        dataset = keys[0]
        newgroup = '%s/%s' % (i[:-3], dataset)  # i[:-3] strips the '.h5' extension
        # -p creates the parent group (named after the file) on the first copy
        !h5copy -i '{i}' -o 'output.h5' -s {dataset} -d {newgroup} -p
        for j in keys[1:]:
            newgroup = '%s/%s' % (i[:-3], j)
            !h5copy -i '{i}' -o 'output.h5' -s {j} -d {newgroup}
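
    The `i[:-3]` slice assumes every file name ends in exactly `.h5`; a name such as `D1.hdf5` would be cut in the wrong place. Here is a sketch of a more tolerant way to build the destination path, using a hypothetical helper `dest_path`:

```python
import os
import posixpath


def dest_path(file_name, obj_name):
    """Build '<file stem>/<object name>' for use as the h5copy -d argument."""
    stem = os.path.splitext(os.path.basename(file_name))[0]
    # HDF5 paths always use '/', regardless of the operating system
    return posixpath.join(stem, obj_name.lstrip("/"))
```

    In the loop above this would replace the string formatting, e.g. `newgroup = dest_path(i, j)`.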
    
