How to create variable length columns in hdf5 file?

Submitted by 倖福魔咒の on 2019-12-11 17:09:00

Question


I am using the h5py package to create an HDF5 file for my training set.

I want the first column to have a variable length: for example, [1,2,3] as the 1st entry in the column, [1,2,3,4,5] as the 2nd entry, and so on. The other 5 columns in the same dataset should have an int data type with a fixed length of 1.

I tried the code below to handle this scenario:

dt = h5py.special_dtype(vlen=np.dtype('int32'))
dt1 = np.dtype('int32')  # assumed: dt1 was not defined in the original snippet
datatype = np.dtype([('FieldA', dt), ('FieldB', dt1), ('FieldC', dt1), ('FieldD', dt1), ('FieldE', dt1), ('FieldF', dt1)])

But in the output I got only an empty array for each of the columns in this dataset.

And when I tried the code below:

dt = h5py.special_dtype(vlen=np.dtype('int32'))
data = db.create_dataset("data1", (5000,), dtype=dt)

This gives me only one column with variable-length entries in the dataset, but I want all 6 columns in the same dataset, with the 1st column holding variable-length entries as described above.

I am totally confused about how to solve this. Any help would be highly appreciated.


Answer 1:


Do you want variable-length (ragged) columns, or just a column that can hold an array of data (up to the dtype limit)? The second is pretty straightforward. See the code below. (It's a simple example with 2 fields to demonstrate the method.)

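import h5py
import numpy as np

# FieldA holds a fixed-size array of 4 int32 values; FieldB holds a single int32
my_dt = np.dtype([('FieldA', 'int32', (4,)), ('FieldB', 'int32')])

with h5py.File('SO_57260167.h5', 'w') as h5f:
    data = h5f.create_dataset("testdata", (10,), dtype=my_dt)

    for cnt in range(10):
        arr = np.random.randint(1, 1000, size=4)
        print(arr)
        data[cnt, 'FieldA'] = arr      # write the 4-element array to row cnt
        data[cnt, 'FieldB'] = arr[0]   # write a scalar to row cnt
        print(data[cnt]['FieldB'])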

If you want a variable-length ("ragged") column, I'm 99% sure you are limited to a single column when using the special dtype in a dataset. Also, I don't think you can name the fields/columns. (I couldn't get it to work, and couldn't find any examples.)
The code below modifies the example above to put the variable-length column data in dataset vl_data and the rest of the integer data in dataset fx_data.

import h5py
import numpy as np

# Variable-length int32 dtype for the ragged column
vl_dt = h5py.special_dtype(vlen=np.dtype('int32'))
# Compound dtype for the five fixed-length integer columns
my_dt = np.dtype([('FieldB', 'int32'), ('FieldC', 'int32'), ('FieldD', 'int32'),
                  ('FieldE', 'int32'), ('FieldF', 'int32')])

with h5py.File('SO_57260167_vl.h5', 'w') as h5f:
    vl_data = h5f.create_dataset("testdata_vl", (10,), dtype=vl_dt)
    fx_data = h5f.create_dataset("testdata", (10,), dtype=my_dt)

    for cnt in range(10):
        # Row cnt gets an array of length cnt+2, so every row differs in length
        arr = np.random.randint(1, 1000, size=cnt + 2)
        vl_data[cnt] = arr
        print(vl_data[cnt])
        fx_data[cnt, 'FieldB'] = arr[0]
        fx_data[cnt, 'FieldF'] = arr[-1]
        print(fx_data[cnt])
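
Since the two datasets share the same row index, they can be read back and stitched together row by row. The following is only a minimal sketch of that idea, not part of the original answer: the file name readback_example.h5, the reduced two-field dtype fx_dt, and the records list are all hypothetical names chosen for illustration, and the data is deterministic so the result is easy to check.

```python
import os
import tempfile

import h5py
import numpy as np

vl_dt = h5py.special_dtype(vlen=np.dtype('int32'))
# Assumption: only two fixed-length fields, to keep the sketch short
fx_dt = np.dtype([('FieldB', 'int32'), ('FieldC', 'int32')])

# Hypothetical file name, written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'readback_example.h5')

n_rows = 3
fixed = np.zeros(n_rows, dtype=fx_dt)

with h5py.File(path, 'w') as h5f:
    vl_data = h5f.create_dataset('testdata_vl', (n_rows,), dtype=vl_dt)
    for cnt in range(n_rows):
        arr = np.arange(1, cnt + 3, dtype='int32')  # lengths 2, 3, 4
        vl_data[cnt] = arr
        fixed['FieldB'][cnt] = arr[0]
        fixed['FieldC'][cnt] = arr[-1]
    # Write the fixed-length columns in one call from the structured array
    h5f.create_dataset('testdata', data=fixed)

# Read both datasets back and pair them up by row index
with h5py.File(path, 'r') as h5f:
    vl_rows = h5f['testdata_vl'][:]  # object array, one int32 array per row
    fx_rows = h5f['testdata'][:]     # NumPy structured array

records = [
    {'FieldA': a.tolist(), 'FieldB': int(r['FieldB']), 'FieldC': int(r['FieldC'])}
    for a, r in zip(vl_rows, fx_rows)
]
print(records)
```

The pairing relies only on both datasets having the same number of rows in the same order, so whatever code writes them has to keep the two in step.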


Source: https://stackoverflow.com/questions/57260167/how-to-create-variable-length-columns-in-hdf5-file
