Python pandas: reading specific values from HDF5 files using read_hdf and HDFStore.select

Submitted by 我怕爱的太早我们不能终老 on 2019-12-06 12:13:07

Question


So I created an HDF5 file with a simple dataset that looks like this:

>>> pd.read_hdf('STORAGE2.h5', 'table')
   A  B
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4

I created it with this script:

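import pandas as pd

# Build a small DataFrame and append it to STORAGE2.h5 under the key 'table'
# (append=True stores it in the queryable "table" format).
df_tl = pd.DataFrame(dict(A=list(range(5)), B=list(range(5))))
df_tl.to_hdf('STORAGE2.h5', key='table', append=True)

# Open the same file as a store for the select() call below.
store = pd.HDFStore('STORAGE2.h5')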

I know I can select columns using

x = pd.read_hdf('STORAGE2.h5', 'table', columns=['A'])

or

x = store.select('table', where='columns=A')

How would I select all rows where column 'A' equals 3, or the rows (indices) where column 'A' holds a string such as 'foo'? With an in-memory DataFrame I would use df[df["A"] == 3] or df[df["A"] == 'foo'].
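For reference, here is a minimal sketch of both filters written as on-disk queries. It assumes pandas with the optional PyTables (`tables`) package installed. The temporary file path and the string column `label` are hypothetical. Both columns are written as data columns, which the answer below explains is required. A string value needs its own quotes inside the where string.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: integer column "A" plus a string column "label"
# standing in for values like 'foo' from the question.
df = pd.DataFrame({"A": list(range(5)),
                   "label": ["foo", "bar", "foo", "baz", "qux"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.h5")

    # format="table" plus data_columns lets "A" and "label" appear in where=.
    df.to_hdf(path, key="table", mode="w", format="table",
              data_columns=["A", "label"])

    # On-disk counterparts of df[df["A"] == 3] and df[df["label"] == "foo"].
    eq3 = pd.read_hdf(path, key="table", where="A == 3")
    foo_rows = pd.read_hdf(path, key="table", where='label == "foo"')

print(eq3.index.tolist())       # [3]
print(foo_rows.index.tolist())  # [0, 2]
```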

Also, does it make a difference in efficiency whether I use read_hdf() or store.select()?
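On the efficiency question: as far as I can tell from the pandas source, pd.read_hdf given a file path opens an HDFStore, calls its select() with the same where/columns arguments, and then closes the store. A single query should therefore cost roughly the same either way. The difference appears when you run many queries, because keeping one store open avoids reopening the file each time. This sketch assumes pandas with PyTables installed, and the temporary path and loop values are illustrative. It only shows that the two patterns return identical rows. It does not time them.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": list(range(5)), "B": list(range(5))})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "STORAGE2.h5")
    df.to_hdf(path, key="table", mode="w", format="table", data_columns=["A"])

    # Pattern 1: each read_hdf call opens the file, queries it, and closes it.
    via_read_hdf = [pd.read_hdf(path, key="table", where=f"A == {v}")
                    for v in range(3)]

    # Pattern 2: open the store once and reuse it for several queries.
    with pd.HDFStore(path, mode="r") as store:
        via_select = [store.select("table", where=f"A == {v}")
                      for v in range(3)]

for a, b in zip(via_read_hdf, via_select):
    print(a.equals(b))  # True
```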


Answer 1:


You need to specify data_columns= when writing the table (you can also pass data_columns=True to make all columns searchable).

(FYI, mode='w' starts the file over; it is only there for my example.)

In [50]: df_tl.to_hdf('STORAGE2.h5','table',append=True,mode='w',data_columns=['A'])

In [51]: pd.read_hdf('STORAGE2.h5','table',where='A>2')
Out[51]: 
   A  B
3  3  3
4  4  4
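To see why data_columns matters, here is a minimal sketch that assumes pandas with PyTables installed and uses an illustrative temporary file. If the table is written with append=True alone, only the index and the columns axis are queryable, so a where condition on A raises ValueError. With data_columns=True, both A and B can be used. The sketch also shows conditions combined with & and |, and where= paired with columns= in a single read.

```python
import os
import tempfile

import pandas as pd

df_tl = pd.DataFrame(dict(A=list(range(5)), B=list(range(5))))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "STORAGE2.h5")

    # Without data_columns, a condition on A is rejected.
    df_tl.to_hdf(path, key="table", mode="w", append=True)
    try:
        pd.read_hdf(path, key="table", where="A > 2")
        query_failed = False
    except ValueError:
        query_failed = True

    # data_columns=True makes every column (A and B) queryable.
    df_tl.to_hdf(path, key="table", mode="w", append=True, data_columns=True)

    # Conditions combine with & and |; parentheses keep the grouping explicit.
    both = pd.read_hdf(path, key="table", where="(A > 2) & (B < 4)")
    either = pd.read_hdf(path, key="table", where="(A == 0) | (B == 4)")

    # where= filters rows while columns= picks columns in the same read.
    only_a = pd.read_hdf(path, key="table", where="A >= 3", columns=["A"])

print(query_failed)           # True
print(both.index.tolist())    # [3]
print(either.index.tolist())  # [0, 4]
print(only_a)
```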


Source: https://stackoverflow.com/questions/26302480/python-pandas-reading-specific-values-from-hdf5-files-using-read-hdf-and-hdfstor
