I am trying to resize a dataset and store new values using the h5py package in Python. My dataset size keeps increasing at every time instance, and I would like to ap
Not sure about the rest of your code, but you can't use the context manager pattern (i.e. `with h5py.File(foo) as bar:`) within a function that returns a dataset. As you point out in the comment under your question, this means that by the time you try to access the dataset the actual HDF5 file will have already closed. The dataset objects in h5py are like live views into the file, so they require the file remain open in order to use them. Thus, you're getting errors.
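To make the failure mode concrete, here is a minimal sketch of the anti-pattern. The function name `make_dataset`, the scratch path, and the tiny dataset are made up for illustration and are not taken from your code. It relies on an h5py object evaluating as falsy once its file has been closed.

```python
import os
import tempfile

import h5py
import numpy as np


def make_dataset(path):
    # Anti-pattern: the file is closed as soon as the `with` block exits,
    # i.e. before the caller ever gets to use the returned dataset.
    with h5py.File(path, "w") as hf:
        return hf.create_dataset("voltage", data=np.zeros((4, 3), dtype="f"))


# Hypothetical scratch location, used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "closed_demo.h5")
dset = make_dataset(demo_path)

# The returned handle no longer points at an open file.
dataset_is_usable = bool(dset)
print(dataset_is_usable)
```

Any attempt to read from or resize `dset` at this point fails, because there is no open file behind it.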
It's a good idea to always interact with files within a managed context (i.e. within a `with` clause). If your code throws an error, the context manager will (almost always) ensure that the file is closed. This helps avoid any potential losses of data resulting from a crash.
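The "closed even on error" behaviour can be seen in isolation with a small sketch. It uses a plain text file as a stand-in for the HDF5 file (`h5py.File` is used with `with` in the same way), and the scratch path and the simulated `RuntimeError` are assumptions made purely for illustration.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
scratch = os.path.join(tempfile.mkdtemp(), "scratch.txt")

try:
    with open(scratch, "w") as fh:
        fh.write("partial results\n")
        raise RuntimeError("simulated crash")  # stands in for a bug in your code
except RuntimeError:
    pass

# The `with` block closed the file on the way out, despite the exception.
closed_after_error = fh.closed

# What was written before the crash made it to disk.
with open(scratch) as check:
    saved_text = check.read()

print(closed_after_error, repr(saved_text))
```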
In your case, you can have your cake (encapsulate your dataset creation routines in a separate function) and eat it too (interact with the HDF5 file within a managed context) by writing your own context manager to look after the file for you.
It's actually pretty simple to code. Any Python object that implements the `__enter__` and `__exit__` methods is a valid context manager. Here's a complete working version:

    import os
    import h5py
    import numpy as np

    path = './out.h5'
    try:
        os.remove(path)
    except OSError:
        pass


    class H5PYManager:
        def __init__(self, path, method='a'):
            self.hf = h5py.File(path, method)

        def __enter__(self):
            # when you call `with H5PYManager(foo) as bar`, the return of this method will be assigned to `bar`
            return self.create_datasets()

        def __exit__(self, type, value, traceback):
            # this method gets called when you exit the `with` clause, including when an error is raised
            self.hf.close()

        def create_datasets(self):
            grp = self.hf.create_group('left')
            return [grp.create_dataset('voltage', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3)),
                    grp.create_dataset('current', (10**4,3), maxshape=(None,3), dtype='f', chunks=(10**4,3))]


    if __name__ == '__main__':
        with H5PYManager(path) as dset:
            for i in range(3):
                if i == 0:
                    dset[0][:] = np.random.random(dset[0].shape)
                    dset[1][:] = np.random.random(dset[1].shape)
                else:
                    dset[0].resize(dset[0].shape[0]+10**4, axis=0)
                    dset[0][-10**4:] = np.random.random((10**4,3))
                    dset[1].resize(dset[1].shape[0]+10**4, axis=0)
                    dset[1][-10**4:] = np.random.random((10**4,3))
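As a possible variation on the class above, the same idea can be sketched with the standard library's `contextlib.contextmanager` decorator, which builds the `__enter__`/`__exit__` pair from a generator function. This is a swapped-in technique, not a change to the class-based version. The function name `open_datasets`, the `rows` parameter, and the scratch path are hypothetical, and the block size is shrunk to 100 rows to keep the demo quick.

```python
import os
import tempfile
from contextlib import contextmanager

import h5py
import numpy as np


@contextmanager
def open_datasets(path, mode="a", rows=10**4):
    # The file stays open while the caller's `with` body runs, and is closed
    # when that body finishes or raises. It is also closed if creating the
    # group or datasets fails, because the file is opened inside a `with`.
    with h5py.File(path, mode) as hf:
        grp = hf.create_group("left")
        yield [
            grp.create_dataset(name, (rows, 3), maxshape=(None, 3),
                               dtype="f", chunks=(rows, 3))
            for name in ("voltage", "current")
        ]


# Hypothetical scratch location, used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "out_demo.h5")

with open_datasets(demo_path, rows=100) as dsets:
    for d in dsets:
        d[:] = np.random.random(d.shape)          # fill the initial block
        d.resize(d.shape[0] + 100, axis=0)        # grow along the first axis
        d[-100:] = np.random.random((100, 3))     # write the appended block
    final_shapes = [d.shape for d in dsets]

print(final_shapes)
```

Whether the class or the decorator reads better is mostly a matter of taste. The class is easier to extend with extra helper methods, while the generator form keeps the open and close logic in one place.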