Is it possible to use NumPy arrays as indices in h5py datasets?

守給你的承諾、 submitted 2020-01-11 14:07:09

Question


I need to merge a number of datasets, each contained in a separate file, into another dataset belonging to a final file. The order of the data in the partial datasets is not preserved when they get copied into the final one: the data in each partial dataset is 'mapped' into the final one through indices. I created two lists, final_indices and partial_indices, and wrote:

final_dataset   = final_hdf5file['dataset']
partial_dataset = partial_hdf5file['dataset']

# here partial_indices and final_indices are lists
final_dataset[final_indices] = partial_dataset[partial_indices] 

The problem with this is that the performance is quite bad, and the reason is that final_indices and partial_indices both have to be lists. My workaround has been to create two NumPy arrays from the final and partial datasets, and use NumPy arrays as indices:

final_array   = np.array(final_dataset)
partial_array = np.array(partial_dataset)
# here partial_indices and final_indices are NumPy arrays
final_array[final_indices] = partial_array[partial_indices] 

The final array is then written back to the final dataset:

final_dataset[...] = final_array

However, this seems rather inelegant to me.

Is it possible to use NumPy arrays as indices in an h5py dataset?


Answer 1:


So you are doing fancy indexing for both the read and the write:

http://docs.h5py.org/en/latest/high/dataset.html#fancy-indexing

The docs warn that fancy indexing can be slow with long index lists.
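The same docs section also lists restrictions on fancy selections, notably that the selection coordinates must be given in increasing order, which the mapped indices in the question generally are not. If you want to keep the fancy indexing on the h5py side anyway, a minimal sketch is to sort each index list before handing it to the dataset and undo the permutation in memory. The helper name fancy_copy is hypothetical; it assumes no duplicate indices and indexes along the first axis only. It does nothing about the per-element cost, so the warning about long lists still applies.

```python
import numpy as np

def fancy_copy(final_dataset, partial_dataset, final_indices, partial_indices):
    """Hypothetical helper: final_dataset[final_indices[k]] = partial_dataset[partial_indices[k]].

    Every index list passed to the dataset is in increasing order, as h5py
    fancy indexing requires. Assumes no duplicate indices.
    """
    final_indices = np.asarray(final_indices)
    partial_indices = np.asarray(partial_indices)

    # Read: fetch the source rows in sorted order, then restore the requested order.
    read_order = np.argsort(partial_indices)
    rows = partial_dataset[partial_indices[read_order].tolist()]
    values = np.empty_like(rows)
    values[read_order] = rows  # now values[k] == partial_dataset[partial_indices[k]]

    # Write: sort the destination indices and permute the values the same way.
    write_order = np.argsort(final_indices)
    final_dataset[final_indices[write_order].tolist()] = values[write_order]
```

The function only uses list indexing for reads and writes, so it behaves the same on a plain NumPy array, which is handy for checking the mapping before touching a file.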

I can see why reading and writing the whole datasets, and doing the mapping on in-memory arrays, would be faster, though I haven't actually tested that. The bulk read/write is faster, as is the mapping:

http://docs.h5py.org/en/latest/high/dataset.html#reading-writing-data

I would use slice notation (or the .value attribute, which h5py 3.0 removed in favour of dataset[()]) to load the datasets, but that's a minor point.

final_array   = final_dataset[:]

Hide the code in a function if it looks inelegant.
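Since the question merges several partial files into one target, a function along these lines could read the final dataset once, apply every partial mapping in memory, and write it back once. This is a minimal sketch; the name merge_partials and the (partial_dataset, final_indices, partial_indices) tuple layout are assumptions for illustration. It relies only on dataset[:] for a bulk read and dataset[...] = array for a bulk write, so the indices can be lists or NumPy arrays in any order.

```python
import numpy as np

def merge_partials(final_dataset, sources):
    """Hypothetical helper: scatter several partial datasets into final_dataset.

    sources: iterable of (partial_dataset, final_indices, partial_indices).
    """
    final_array = final_dataset[:]  # one bulk read of the target
    for partial_dataset, final_indices, partial_indices in sources:
        partial_array = partial_dataset[:]  # one bulk read per source
        # In-memory NumPy fancy indexing: no ordering restrictions here.
        final_array[np.asarray(final_indices)] = partial_array[np.asarray(partial_indices)]
    final_dataset[...] = final_array  # one bulk write back
    return final_array
```

With h5py you would open the target file with h5py.File(path, 'r+') and each source with mode 'r', then pass the dataset objects (e.g. f['dataset']). The trade-off is that the whole final dataset, plus one partial dataset at a time, has to fit in memory.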

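A one-liner like this is tempting, but only the RHS does what you want: final_dataset[:] reads the data into a new in-memory array, so the LHS assignment modifies that temporary copy and nothing is written back to the file.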

final_dataset[:][final_indices] = partial_dataset[:][partial_indices] 
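To see why the LHS of that one-liner cannot update the file, here is a minimal sketch with a hypothetical stand-in class, CopyingDataset, that mimics just one property of an h5py Dataset: reading with [] returns a fresh NumPy array, and only assignment through the dataset's own __setitem__ stores data.

```python
import numpy as np

class CopyingDataset:
    """Hypothetical stand-in: reads return a NEW array, writes go through __setitem__."""

    def __init__(self, data):
        self._data = np.array(data)

    def __getitem__(self, key):
        return self._data[key].copy()  # like an h5py read: always a fresh array

    def __setitem__(self, key, value):
        self._data[key] = value  # like an h5py write: updates the stored data


ds = CopyingDataset(np.zeros(4))
ds[:][[1, 3]] = 7.0  # ds[:] builds a temporary copy; only that copy is modified
print(ds[:])  # [0. 0. 0. 0.]
ds[[1, 3]] = 7.0  # goes through ds.__setitem__, so the stored data changes
print(ds[:])  # [0. 7. 0. 7.]
```

The write has to be a single indexing operation on the dataset object itself, which is why the in-memory approach ends with final_dataset[...] = final_array.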


Source: https://stackoverflow.com/questions/47888392/is-it-possible-to-use-np-arrays-as-indices-in-h5py-datasets
