Resizing and storing a dataset in .h5 format using h5py in Python

逝去的感伤 2020-12-21 11:32

I am trying to resize dataset and store new values using h5py package in python. My dataset size keeps increasing at every time instance, and I would like to ap

2 answers
  •  北海茫月
    2020-12-21 11:34

    The problem

    I'm not sure about the rest of your code, but you can't use the context manager pattern (i.e. with h5py.File(foo) as bar:) inside a function that returns a dataset. As you point out in the comment under your question, by the time you try to access the dataset, the underlying HDF5 file has already been closed. Dataset objects in h5py are live views into the file, so the file must stay open for as long as you use them. That is why you're getting errors.
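
    To make the failure concrete, here is a minimal sketch of the pattern described above. The function name make_dataset, the dataset name and the temporary path are made up for illustration and are not taken from the question's code. It relies on h5py objects evaluating as false once the file they belong to has been closed.

```python
import os
import tempfile

import h5py


def make_dataset(path):
    # Hypothetical helper: the `with` block closes the file as soon as
    # the function returns, so the caller receives a dead handle.
    with h5py.File(path, "w") as hf:
        return hf.create_dataset("voltage", (10, 3), dtype="f")


path = os.path.join(tempfile.mkdtemp(), "demo.h5")
dset = make_dataset(path)

# The dataset object still exists in Python, but it no longer points
# at an open file, so it evaluates as false.
print(bool(dset))
```

    Any attempt to read from or write to dset at this point raises an error, because there is no open file behind it.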

    A solution

    It's a good idea to always interact with files within a managed context (i.e. within a with clause). If your code throws an error, the context manager will (almost always) ensure that the file is closed. This helps avoid losing data as the result of a crash.
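
    As a small illustration of that guarantee, the sketch below raises an error on purpose inside a with block and then checks that the file was still closed and the data written before the error can be read back. The path and the dataset name x are assumptions made for the example.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "crash_demo.h5")

try:
    with h5py.File(path, "w") as hf:
        hf.create_dataset("x", data=[1, 2, 3])
        # Simulate a failure in the middle of the managed block.
        raise RuntimeError("simulated crash")
except RuntimeError:
    pass

# The context manager closed the file while the exception propagated,
# so the file object now evaluates as false.
print(bool(hf))

# Reopen the file to confirm the dataset written before the error is there.
with h5py.File(path, "r") as hf2:
    data = hf2["x"][:].tolist()
print(data)
```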

    In your case, you can have your cake (encapsulate your dataset creation routines in a separate function) and eat it too (interact with the HDF5 file within a managed context) by writing your own context manager to look after the file for you.

    It's actually pretty simple to code. Any Python object that implements the __enter__ and __exit__ methods is a valid context manager. Here's a complete working version:

    import os
    import h5py
    import numpy as np
    
    path = './out.h5'
    try:
        os.remove(path)
    except OSError: 
        pass
    
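    class H5PYManager:
        def __init__(self, path, mode='a'):
            # only remember the arguments here; the file itself is opened in `__enter__`
            self.path = path
            self.mode = mode
            self.hf = None
    
        def __enter__(self):
            # when you call `with H5PYManager(foo) as bar`, the return of this method will be assigned to `bar`
            self.hf = h5py.File(self.path, self.mode)
            try:
                return self.create_datasets()
            except Exception:
                # `__exit__` is not called if `__enter__` fails, so close the file here
                self.hf.close()
                raise
    
        def __exit__(self, exc_type, exc_value, traceback):
            # this method gets called when you exit the `with` clause, including when an error is raised
            self.hf.close()
    
        def create_datasets(self):
            # both datasets start with 10**4 rows and can grow without limit along axis 0
            grp = self.hf.create_group('left')
            return [grp.create_dataset('voltage', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3)),
                    grp.create_dataset('current', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3))]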
    
    if __name__ == '__main__':
        with H5PYManager(path) as dset:
            for i in range(3):
                if i == 0:
                    dset[0][:] = np.random.random(dset[0].shape) 
                    dset[1][:] = np.random.random(dset[1].shape)
                else:
                    dset[0].resize(dset[0].shape[0]+10**4, axis=0)
                    dset[0][-10**4:] = np.random.random((10**4,3))
                    dset[1].resize(dset[1].shape[0]+10**4, axis=0)
                    dset[1][-10**4:] = np.random.random((10**4,3))
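
    For comparison, the same idea can also be sketched with contextlib.contextmanager from the standard library, which turns a generator function into a context manager. This is an alternative to the class above, not a replacement for it. The names open_datasets and append_rows are hypothetical helpers, and unlike the code above the datasets start empty (shape (0, 3)) and use h5py's automatic chunking rather than an explicit chunk shape.

```python
import contextlib
import os
import tempfile

import h5py
import numpy as np


@contextlib.contextmanager
def open_datasets(path):
    # Hypothetical helper: open the file, hand out two resizable
    # datasets, and close the file when the `with` block ends.
    hf = h5py.File(path, "w")
    try:
        grp = hf.create_group("left")
        yield [
            grp.create_dataset(name, (0, 3), maxshape=(None, 3), dtype="f")
            for name in ("voltage", "current")
        ]
    finally:
        hf.close()


def append_rows(dset, block):
    # Hypothetical helper: grow the dataset along axis 0, then write
    # the new block into the rows that were just added.
    old = dset.shape[0]
    dset.resize(old + block.shape[0], axis=0)
    dset[old:old + block.shape[0]] = block


path = os.path.join(tempfile.mkdtemp(), "out.h5")
with open_datasets(path) as (voltage, current):
    for _ in range(3):
        append_rows(voltage, np.random.random((100, 3)))
        append_rows(current, np.random.random((100, 3)))
    final_shape = voltage.shape

print(final_shape)
```

    Starting from an empty dataset removes the special case for the first iteration, at the cost of one extra resize call.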
    
