pandas read_hdf with 'where' condition limitation?

Submitted by 大憨熊 on 2020-01-02 16:18:10

Question


I need to query an HDF5 file using a where clause with three conditions, one of which is a list of length 30:

myList = list(xrange(30))

h5DF   = pd.read_hdf(h5Filename, 'df', where='index=myList & date=dateString & time=timeString')

The query above raises ValueError: too many inputs, and the error is reproducible.

If I reduce the length of the list to 29 (keeping all three conditions), the query executes fine:

myList = list(xrange(29))

h5DF   = pd.read_hdf(h5Filename, 'df', where='index=myList & date=dateString & time=timeString')

It also executes fine if I keep the list length at 30 but reduce the number of conditions to two:

myList = list(xrange(30))

h5DF   = pd.read_hdf(h5Filename, 'df', where='index=myList & time=timeString')

Is this a known limitation? The pandas documentation at http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.pytables.read_hdf.html doesn't mention it, and after searching this forum it seems nobody has run into this limitation yet.

I'm using pandas 0.15.2. Any help is appreciated.


Answer 1:


This is answered here

This is a defect: numpy/numexpr cannot handle more than 31 operands in the expression tree. An expression like foo=[1,2,3,4] in the where clause of an HDFStore query generates an expression like (foo==1) | (foo==2) | ..., so list conditions are expanded, and if there are too many operands the query can fail.
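To make the expansion concrete, here is a simplified Python sketch. It is not pandas' actual code: the helper names (expand_list_condition, build_condition, count_comparisons) are hypothetical, and counting one operand per == comparison is an assumption. Under that assumption the counts line up with the question: 30 list values plus 2 other conditions give 32 comparisons (fails), while 29 + 2 and 30 + 1 both give 31 (works).

```python
# Simplified illustration, NOT pandas' implementation: how a list condition
# in an HDFStore `where` string expands into OR'd equality comparisons.
# Assumption: each `==` comparison counts as one operand against the limit.

MAX_OPERANDS = 31  # limit quoted in the answer


def expand_list_condition(field, values):
    """Turn field=[v1, v2, ...] into '(field==v1) | (field==v2) | ...'."""
    return " | ".join("({0}=={1!r})".format(field, v) for v in values)


def build_condition(list_field, values, other_conditions):
    """AND the expanded list condition with the remaining conditions."""
    parts = ["(" + expand_list_condition(list_field, values) + ")"]
    parts += ["({0})".format(c) for c in other_conditions]
    return " & ".join(parts)


def count_comparisons(condition):
    """Count equality comparisons in the generated expression."""
    return condition.count("==")


print(expand_list_condition("foo", [1, 2, 3, 4]))
# (foo==1) | (foo==2) | (foo==3) | (foo==4)

# The three cases from the question: (list length, other conditions).
cases = [
    (30, ["date==dateString", "time==timeString"]),
    (29, ["date==dateString", "time==timeString"]),
    (30, ["time==timeString"]),
]
for n_values, others in cases:
    n = count_comparisons(build_condition("index", range(n_values), others))
    print(n_values, len(others), n, n <= MAX_OPERANDS)
# 30 2 32 False
# 29 2 31 True
# 30 1 31 True
```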

HDFStore handles this for a single operand (IOW, if you just have foo=range(31), it is OK), but because you happen to have a nested sub-expression whose sub-nodes are themselves too long, it errors.

Generally, a better approach is to select a bigger range (e.g., using the end-points of the selection for each operand) and then filter in memory with .isin. This might even be faster because, IMHO, HDF5 tends to be more efficient at selecting larger ranges than many individual values, even though you bring more data into memory.
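A minimal sketch of this workaround, assuming the index is numeric (so its end-points define a range), that date and time are string-valued data columns as in the question, and that PyTables is installed for the actual read. The function names coarse_where and read_subset are hypothetical. The where string always contains four comparisons, however long the list is; the exact membership test then happens in memory with Index.isin.

```python
# Sketch of "select a bigger range, then filter in memory with .isin".
# Assumptions (not from the original post): the index is numeric/orderable,
# `date` and `time` are string data columns, and PyTables is installed.


def coarse_where(values, date_string, time_string):
    """Build a where string covering the end-points of `values`
    instead of listing every value (4 comparisons regardless of length)."""
    lo, hi = min(values), max(values)
    return "index>={0} & index<={1} & date={2!r} & time={3!r}".format(
        lo, hi, date_string, time_string)


def read_subset(h5_filename, key, values, date_string, time_string):
    """Read the coarse range from disk, then keep only the wanted index values."""
    import pandas as pd  # imported here so coarse_where() has no pandas dependency

    where = coarse_where(values, date_string, time_string)
    df = pd.read_hdf(h5_filename, key, where=where)
    # Exact membership test done in memory rather than in the HDF5 query.
    return df[df.index.isin(values)]
```

With the question's variables this would be called as read_subset(h5Filename, 'df', myList, dateString, timeString). If the list values are sparse across a wide range, the coarse read can pull in much more data than needed, so the trade-off is worth measuring.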



Source: https://stackoverflow.com/questions/28754265/pandas-read-hdf-with-where-condition-limitation
