Writing to compound dataset with variable length string via h5py (HDF5)

Posted by 社会主义新天地 on 2019-12-18 09:29:12

Question


I've been able to create a compound dataset consisting of an unsigned int and a variable-length string in my HDF5 file using h5py, but I can't write to it.

dt = h5py.special_dtype(vlen=str)
dset = fout.create_dataset(ver, (1,), dtype=np.dtype([("time", np.uint64),("value", dt)]))

I've written to other compound datasets fairly easily, by setting the specific column(s) of the compound dataset as equal to an existing numpy array.

Where I run into trouble is writing to the compound dataset with a variable-length string. NumPy does not support a variable-length string, so I can't create the NumPy array beforehand that would contain the value.

My next thought was to write the individual value to the column in question, and this works for the unsigned int. When I try to write a string to the variable-length string field in the compound dataset, though, I get:

    dset["value"] = str("blah")
  File "D:\Anaconda3\lib\site-packages\h5py\_hl\dataset.py", line 508, in __setitem__
    val = val.astype(numpy.dtype([(names[0], dtype)]))
ValueError: Setting void-array with object members using buffer.

Any guidance would be much appreciated.


Answer 1:


Following on from my earlier answer to "Inexplicable behavior when using vlen with h5py":

I ran this test (h5py version '2.2.1'):

In [4]: import h5py
In [5]: dt = h5py.special_dtype(vlen=str)
In [6]: f=h5py.File('foo.hdf5')
In [8]: ds1 = f.create_dataset('JustStrings',(10,), dtype=dt)
In [10]: ds1[0]='string'
In [11]: ds1[1]='a longer string'
In [13]: ds1[2:5]='one_string two_strings three'.split()

In [14]: ds1
Out[14]: <HDF5 dataset "JustStrings": shape (10,), type "|O4">

In [15]: ds1.value
Out[15]: 
array(['string', 'a longer string', 'one_string', 'two_strings', 'three',
       '', '', '', '', ''], dtype=object)

And for a mixed dtype like yours:

In [16]: ds2 = f.create_dataset('IntandStrings',(10,),
   dtype=np.dtype([("number",int),('astring',dt)]))
In [17]: ds2[0]=(1,'astring')
In [18]: ds2[1]=(10,'a longer string')
In [19]: ds2[2:4]=[(10,'a longer much string'),(0,'')]
In [20]: ds2.value
Out[20]: 
array([(1, 'astring'), (10, 'a longer string'),
       (10, 'a longer much string'), (0, ''), (0, ''), (0, ''), (0, ''),
       (0, ''), (0, ''), (0, '')], 
      dtype=[('number', '<i4'), ('astring', 'O')])

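Trying to set a field by itself does not seem to work (ds2['astring'] reads the field into a NumPy array, so the assignment only changes that in-memory copy):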

ds2['astring'][4]='one two three four'

Instead I have to set the whole record:

ds2[4]=(123,'one two three four')

Trying to set the whole field produces the same error:

ds2['astring']='astring'

I initialized this dataset to (10,), while yours is (1,). But I think it's the same problem.

I can, though, set the whole numeric field:

In [48]: ds2['number']=np.arange(10)
In [50]: ds2['number']
Out[50]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [51]: ds2.value
Out[51]: 
array([(0, 'astring'), (1, 'a longer string'), 
       (2, 'a longer much string'),
       (3, ''), (4, 'one two three four'), (5, ''), 
       (6, ''), (7, ''),
       (8, ''), (9, '')], 
      dtype=[('number', '<i4'), ('astring', 'O')])
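
The record-level writes above can be collected into one short script. This is only a minimal sketch, not a definitive recipe: the dataset names `records` and `bulk` and the helpers `as_text` and `rows_of` are made up for illustration, and the file uses h5py's in-memory `core` driver so nothing is written to disk. It assumes a reasonably recent h5py, where variable-length strings may come back as bytes, hence the decoding helper. The last part also suggests that a structured NumPy array with an object-typed string field can be prepared beforehand and written in one go.

```python
import h5py
import numpy as np

# Compound dtype from the question: uint64 plus a variable-length string.
str_dt = h5py.special_dtype(vlen=str)
rec_dt = np.dtype([("time", np.uint64), ("value", str_dt)])


def as_text(s):
    """Newer h5py releases may return bytes for vlen strings; normalize to str."""
    return s.decode("utf-8") if isinstance(s, bytes) else s


def rows_of(dset):
    """Read every record back as a plain (int, str) tuple."""
    return [(int(r["time"]), as_text(r["value"])) for r in dset[...]]


# In-memory file (core driver, no backing store) so nothing is left on disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    ds = f.create_dataset("records", (2,), dtype=rec_dt)

    # Write complete records, one tuple per row.
    ds[0] = (1, "astring")
    ds[1] = (10, "a longer string")

    # To change only the string in row 1, read the record, keep its
    # numeric field, and write the whole record back.
    old = ds[1]
    ds[1] = (int(old["time"]), "replaced string")
    updated = rows_of(ds)

    # A structured NumPy array with the same dtype holds the strings as
    # Python objects, so several rows can be prepared and written at once.
    batch = np.array([(7, "x"), (8, "yy"), (9, "zzz")], dtype=rec_dt)
    bulk = f.create_dataset("bulk", data=batch, dtype=rec_dt)
    written = rows_of(bulk)

print(updated)
print(written)
```

Writing the whole record sidesteps the single-field assignment that raises the ValueError; the cost is one extra read when only the string needs to change.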


Source: https://stackoverflow.com/questions/33247432/writing-to-compound-dataset-with-variable-length-string-via-h5py-hdf5
