H5PY Writes Very Slow

Submitted by £可爱£侵袭症+ on 2019-12-12 14:27:49

Question


I have an h5py dataset like the one below. I want to index the records by string instead of by numeric value, so that, for example, I could get the value of the first record with dset[dset.attrs['id1']].

I am trying to write the attributes with the code below, but it is extremely slow. If I run %timeit dset.attrs[rid] = idx inside the loop, a single write takes about 310 ms. The strings I am writing are 36 characters long, and I have about 100k records to write, which would take about 9 hours. Something must be terribly wrong. The CPU is also pegged the whole time.

import h5py

ids = ['id1', 'id2', 'id3']  # in practice, ~100k 36-character strings
h5 = h5py.File("/tmp/ds.h5", "w")
dset = h5.create_dataset("lds", (100000,), dtype='float32')

for idx, rid in enumerate(ids):  # loop takes forever
    dset.attrs[rid] = idx  # each write takes ~310 ms

EDIT

Minimal "working" example.

for idx, rid in enumerate(range(10)):
    %timeit dset.attrs[str(rid)] = idx

10 loops, best of 3: 470 ms per loop
10 loops, best of 3: 470 ms per loop
...

That is nearly 0.5 seconds for a single write.


Answer 1:


Pass libver='latest' when opening the file; attribute writes are a lot faster with it. For example:

h5py.File('ds.h5', 'w', libver='latest')

See here: https://github.com/h5py/h5py/issues/705
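
As a rough illustration of this suggestion, here is a minimal sketch that opens the file with libver='latest', writes string-keyed attributes, and reads one back to look up a record. It assumes h5py is installed; the temporary file path, the 100 generated IDs, and the timing code are placeholders and not part of the original post. How much faster the writes become will likely depend on the h5py and HDF5 versions in use.

```python
import os
import tempfile
import time

import h5py

# Hypothetical record IDs standing in for the 36-character strings in the question.
ids = [f"id{i}" for i in range(1, 101)]

# Temporary location chosen for illustration only.
path = os.path.join(tempfile.mkdtemp(), "ds.h5")

# libver='latest' allows HDF5 to use its newest file-format features.
with h5py.File(path, "w", libver="latest") as h5:
    dset = h5.create_dataset("lds", (100000,), dtype="float32")

    start = time.perf_counter()
    for idx, rid in enumerate(ids):
        dset.attrs[rid] = idx  # map the string ID to its row index
    elapsed = time.perf_counter() - start
    print(f"{len(ids)} attribute writes took {elapsed:.3f} s")

# Reopen the file and look a record up by its string ID.
with h5py.File(path, "r") as h5:
    dset = h5["lds"]
    first_index = int(dset.attrs["id1"])
    last_index = int(dset.attrs["id100"])
    first_value = float(dset[first_index])
```

Timing the loop in the same way with and without the libver argument on your own machine should show whether the change helps for your setup.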



Source: https://stackoverflow.com/questions/36879459/h5py-writes-very-slow
