Question
I'm attempting to fill an h5py dataset with a series of numpy arrays that I generate in sequence so my memory can handle it.
The h5py dataset is initialised so that the first dimension can have any length:
f.create_dataset('x-data', (1, maxlen, 50), maxshape=(None, maxlen, 50))
After generating each numpy array X, I am using
f['x-data'][alen:alen + len(data),:,:] = X
For example, for the first array alen=0 and len(data)=10056. I then increment alen so that the next array starts where the previous one ended. Printing the shape of that slice, however, gives an unexpected result:
print f['x-data'][alen:alen + len(data),:,:].shape, alen, len(data)
(1L, 60L, 50L) 0 10056
Does anyone know why the 0:10056 indexing is being interpreted as 1L?
Answer 1:
I replicated your example on a much smaller scale. I had to resize the dataset each time I added elements, e.g.
f['xdata'].resize(50,axis=0)
The first time I tried to add a block I got an error:
TypeError: Can't broadcast (10, 20, 10) -> (1, 20, 10)
But subsequent times, when I'd outgrown the allocated space, it failed silently. No error, it just didn't end up storing the new values.
This was with h5py version 2.2.1.
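The shape reported in the question can be reproduced with a short sketch. It assumes a small dataset of shape (1, 20, 10) and an in-memory file (the file name and the core driver are illustrative choices, not part of the original code). It only demonstrates the read side: a slice that runs past the current extent is clipped to it, much like slicing a short Python list.

```python
import h5py

# In-memory HDF5 file (core driver, nothing is written to disk); the name is arbitrary.
with h5py.File("clip_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "xdata", shape=(1, 20, 10), maxshape=(None, 20, 10), dtype="f8"
    )

    # Only one row exists, so the slice 0:10 is clipped to 0:1.
    clipped_shape = dset[0:10, :, :].shape

    # After an explicit resize the same slice covers all ten rows.
    dset.resize(10, axis=0)
    resized_shape = dset[0:10, :, :].shape

print(clipped_shape, resized_shape)
```

A write to the clipped selection then targets a single row, which is consistent with the broadcast error quoted above.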
Answer 2:
I found the answer from a helpful person on the user group.
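Passing None in maxshape does not mean that the dataset resizes automatically; it only allows that dimension to grow. The dataset must be resized explicitly each time new data is added, so the first dimension has to be increased before writing (here x and y are the datasets, X and Y the new numpy arrays):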
x.resize((x.shape[0] + X.shape[0], X.shape[1], X.shape[2]))
y.resize((y.shape[0] + Y.shape[0], Y.shape[1], Y.shape[2]))
The dataset then adds the values correctly.
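The resize-then-write pattern can be wrapped in a small helper. This is a minimal sketch rather than the original poster's code: `append_rows` is a hypothetical name, maxlen=60 is assumed from the printed shape in the question, the dataset starts with zero rows instead of one, and an in-memory file is used so nothing touches the disk.

```python
import numpy as np
import h5py

def append_rows(dset, block):
    """Grow dset along axis 0 by len(block) rows and write block into the new rows."""
    start = dset.shape[0]
    dset.resize(start + block.shape[0], axis=0)
    dset[start:start + block.shape[0], ...] = block
    return dset.shape[0]  # new number of rows

maxlen = 60  # assumed, taken from the (1L, 60L, 50L) output in the question

# In-memory HDF5 file (core driver); the file name is arbitrary.
with h5py.File("append_demo.h5", "w", driver="core", backing_store=False) as f:
    # Start empty along axis 0; maxshape=None lets that axis grow via resize().
    dset = f.create_dataset(
        "x-data", shape=(0, maxlen, 50), maxshape=(None, maxlen, 50), dtype="f8"
    )
    for n in (3, 5):
        X = np.random.rand(n, maxlen, 50)
        total = append_rows(dset, X)

    final_shape = dset.shape
    # The last block written should occupy rows 3..7.
    last_block_matches = bool(np.array_equal(dset[3:8], X))

print(final_shape, total, last_block_matches)
```

Because the helper reads the current length from `dset.shape[0]`, a separate running counter such as `alen` is not needed in this variant.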
Source: https://stackoverflow.com/questions/32258222/h5py-returning-unexpected-results-in-indexing