save multiple pd.DataFrames with hierarchy to hdf5

Submitted by ◇◆丶佛笑我妖孽 on 2021-02-07 09:48:30

Question


I have multiple pd.DataFrames that are organized hierarchically. Let's say I have:

day_temperature_london_df = pd.DataFrame(...)
night_temperature_london_df = pd.DataFrame(...)

day_temperature_paris_df = pd.DataFrame(...)
night_temperature_paris_df = pd.DataFrame(...)

I want to group them in an HDF5 file so that two of them go into the group 'london' and the other two go into 'paris'.

If I use h5py, I lose the DataFrame format: the index and the columns are gone.

import h5py

f = h5py.File("temperature.h5", "w")
grp_london = f.create_group("london")
day_lon_dset = grp_london.create_dataset("day", data=day_temperature_london_df)
print(day_lon_dset[...])

This gives me a NumPy array. Is there a way to store many DataFrames hierarchically the way .to_hdf does, so that all the properties of each DataFrame are kept?


Answer 1:


I'm more familiar with numpy and h5py than with pandas, but I was able to create a store with nested keys:

In [85]: store = pd.HDFStore('store.h5')
In [86]: store.root
Out[86]: 
/ (RootGroup) ''
  children := []
In [87]: store['df1']=df1
In [88]: store['group/df1']=df1
In [89]: store['group/df2']=df2

which can be reloaded and viewed:

In [95]: store
Out[95]: 
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df1                  frame        (shape->[3,4])
/group/df1            frame        (shape->[3,4])
/group/df2            frame        (shape->[5,6])

In [96]: store.root
Out[96]: 
/ (RootGroup) ''
  children := ['df1' (Group), 'group' (Group)]

store._handle shows the file structure in detail.

In a shell I can also look at the file with:

1431:~/mypy$ h5dump store.h5 |less

Following the question "how should i use h5py lib for storing time series data", the same file can be inspected with h5py:

In [4]: f1 = h5py.File('store.h5')
In [5]: list(f1.keys())
Out[5]: ['df1', 'group']
In [6]: list(f1['df1'].keys())
Out[6]: ['axis0', 'axis1', 'block0_items', 'block0_values']

In [10]: list(f1['group'].keys())
Out[10]: ['df1', 'df2']
In [11]: list(f1['group/df1'].keys())
Out[11]: ['axis0', 'axis1', 'block0_items', 'block0_values']
In [12]: list(f1['group/df2'].keys())
Out[12]: ['axis0', 'axis1', 'block0_items', 'block0_values']

So the 'group/df2' key is equivalent to a hierarchy of groups:

In [13]: gp = f1['group']
In [15]: gp['df2']['axis0']
Out[15]: <HDF5 dataset "axis0": shape (6,), type "<i8">
In [17]: f1['group/df2/axis0']
Out[17]: <HDF5 dataset "axis0": shape (6,), type "<i8">

We'd have to dig further into the docs or code of HDFStore or PyTables to see whether they have an equivalent of create_group.
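
On the create_group question: with pandas, writing to a key that contains slashes appears to be enough, because the intermediate groups are created implicitly. The following is a minimal sketch, assuming PyTables is installed and a reasonably recent pandas that provides HDFStore.walk. The file location and the two frames are made up for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the df1 and df2 used in the transcript.
df1 = pd.DataFrame(np.arange(12).reshape(3, 4))
df2 = pd.DataFrame(np.arange(30).reshape(5, 6))

path = os.path.join(tempfile.mkdtemp(), "store.h5")

with pd.HDFStore(path, mode="w") as store:
    # No explicit create_group call: the "group" node appears on first write.
    store["group/df1"] = df1
    store["group/df2"] = df2

    # walk() yields (group path, subgroup names, names of stored pandas objects).
    layout = {p: (sorted(groups), sorted(leaves))
              for p, groups, leaves in store.walk()}

print(layout)
```

Here walk() treats each stored DataFrame as a leaf, so the internal axis0/block0_values datasets seen through h5py stay hidden.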




Answer 2:


I am going to combine answers, comments and what I found on other pages into this answer.

So yes, h5py is indeed not required in my case. Groups can be created with:

import pandas as pd
s = pd.HDFStore('test.h5')
s['london/day'] = day_temperature_london_df
s['london/night'] = night_temperature_london_df

And each DataFrame can be accessed by:

pd.read_hdf('test.h5', 'london/day')

But then it is not clear how to read just one group. This can be done by looping through the children of one node:

s = pd.HDFStore('test.h5')
[s.select(node._v_pathname) for node in s.get_node('london')]

In this case each element of the list is a DataFrame stored under the node 'london'.
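
A list loses track of which frame is which, so it may be more convenient to collect the group into a dict keyed by child name. This is only a sketch: read_group is a hypothetical helper, not a pandas function, the data is made up, and it relies on the same PyTables node attributes as the list comprehension above.

```python
import os
import tempfile

import pandas as pd


def read_group(store, group):
    """Return {child name: DataFrame} for the direct children of `group`.

    Hypothetical helper, not part of pandas. It uses the PyTables node
    attributes _v_name and _v_pathname of each child node.
    """
    return {node._v_name: store.select(node._v_pathname)
            for node in store.get_node(group)}


# Made-up data for illustration.
day = pd.DataFrame({"temp": [11.0, 12.5]}, index=[10, 20])
night = pd.DataFrame({"temp": [4.0, 5.5]}, index=[10, 20])

path = os.path.join(tempfile.mkdtemp(), "test.h5")
with pd.HDFStore(path, mode="w") as s:
    s["london/day"] = day
    s["london/night"] = night
    s["paris/day"] = day
    london = read_group(s, "london")

print(sorted(london))
```

Only the direct children of 'london' are read; the frame under 'paris' is not touched.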

The structure of the file can be seen by evaluating s:

<class 'pandas.io.pytables.HDFStore'>
File path: store_5.h5
/london/day              frame        (shape->[100,2])
/london/night            frame        (shape->[200,1])

This way you should be able to create multiple levels of DataFrames and read them back without losing the columns, index, etc.
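
To check that claim, here is a small round-trip sketch, assuming PyTables is installed. The frames, the file name and the extra 'europe' level are invented for illustration; the extra level is there only to show nesting deeper than one group.

```python
import os
import tempfile

import pandas as pd

# Made-up frames; the "europe" level is added only to show deeper nesting.
frames = {
    "europe/london/day": pd.DataFrame({"temp": [11.0, 12.5, 13.0]}, index=[6, 12, 18]),
    "europe/london/night": pd.DataFrame({"temp": [4.0, 3.5]}, index=[0, 3]),
    "europe/paris/day": pd.DataFrame({"temp": [14.0, 15.5, 16.0]}, index=[6, 12, 18]),
    "europe/paris/night": pd.DataFrame({"temp": [7.0, 6.5]}, index=[0, 3]),
}

path = os.path.join(tempfile.mkdtemp(), "temperature.h5")

with pd.HDFStore(path, mode="w") as store:
    for key, frame in frames.items():
        store.put(key, frame)  # intermediate groups are created as needed

# Read each frame back by its full key once the store is closed.
restored = {key: pd.read_hdf(path, key) for key in frames}

for key, frame in frames.items():
    # Raises AssertionError if values, index or columns differ.
    pd.testing.assert_frame_equal(restored[key], frame)
```

If the loop finishes without an AssertionError, every frame came back with its original index and columns.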



Source: https://stackoverflow.com/questions/48172863/save-multiple-pd-dataframes-with-hierarchy-to-hdf5
