Most efficient way of saving a pandas dataframe or 2d numpy array into h5py, with each row a separate key, using a column

Question

This is a follow-up to this Stack Overflow question:

Column missing when trying to open hdf created by pandas in h5py

I am trying to save a large amount of data to disk (too large to fit into memory) and retrieve specific rows of the data using indices.

One of the solutions given in the linked post is to create a separate key for every row.

At the moment I can only think of iterating through each row, and setting the keys directly.

For example, if this is my data:

IndexID     Ids
1899317     [0, 47715, 1757, 9, 38994, 230, 12, 241, 12228...
22861131    [0, 48156, 154, 6304, 43611, 11, 9496, 8982, 1...
2163410     [0, 26039, 41156, 227, 860, 3320, 6673, 260, 1...
15760716    [0, 40883, 4086, 11, 5, 18559, 1923, 1494, 4, ...
12244098    [0, 45651, 4128, 227, 5, 10397, 995, 731, 9, 3...

I can go through my dataframe, say, and set each row like this:

f.create_dataset(str(row['IndexID']), data=row['Ids'])

I am wondering if there is a batch way to do this.
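
For reference, the row-by-row approach described above can be written out as a minimal, self-contained sketch. The sample values are shortened from the rows shown earlier, and the file name `rows_by_key.h5`, the temporary directory, and the `int64` dtype are assumptions made only for illustration. This is the baseline loop, not a batch method.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd

# Illustrative data only: the first few values of three of the rows above.
df = pd.DataFrame({
    "IndexID": [1899317, 22861131, 2163410],
    "Ids": [[0, 47715, 1757], [0, 48156, 154, 6304], [0, 26039]],
})

# Hypothetical output location, chosen so the sketch leaves nothing behind
# in the working directory.
path = os.path.join(tempfile.mkdtemp(), "rows_by_key.h5")

# Write one dataset per row, keyed by the string form of IndexID.
with h5py.File(path, "w") as f:
    for index_id, ids in zip(df["IndexID"], df["Ids"]):
        f.create_dataset(str(index_id), data=np.asarray(ids, dtype=np.int64))

# Read back a single row by key without loading the other rows.
with h5py.File(path, "r") as f:
    keys = list(f.keys())
    row = f["22861131"][:]
```

Iterating with `zip` over the two columns avoids building a Series for every row, which `DataFrame.iterrows()` would do, but it still issues one `create_dataset` call per row.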

Source: https://stackoverflow.com/questions/61706898/most-efficient-way-of-saving-a-pandas-dataframe-or-2d-numpy-array-into-h5py-wit

Tags

python

pandas

HDFS

h5py

hdf
