What is the most memory efficient way to combine read_sorted and Expr in pytables?

Submitted by 白昼怎懂夜的黑 on 2020-01-05 11:03:09

Question


I am looking for the most memory-efficient way to combine reading a PyTables table (columns: x, y, z) in sorted order (the z column has a CSI) with evaluating an expression like

x+a*y+b*z

where a and b are constants. Until now my only solution has been to copy the entire table with the sortby='z' option and then evaluate the expression piece-wise on the copy.

Note: I want to keep the result x+a*y+b*z in memory to perform some reduction operations on it that are not available directly in PyTables, and then save it into a new PyTables table.
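For comparison, the workaround described above (a sorted copy, then piece-wise evaluation) might look roughly like the sketch below. It assumes tab is an open, writable Table whose z column has a CSI; the function name copy_sorted_then_eval, the '_by_z' suffix and the chunk size are hypothetical choices. The cost is a full on-disk copy of the table, while memory stays bounded by one chunk plus the output array.

```python
import numpy as np
import tables as tb


def copy_sorted_then_eval(tab: tb.Table, a: float, b: float,
                          chunk: int = 100_000) -> np.ndarray:
    """Baseline: write a z-sorted copy of tab, then evaluate it chunk by chunk."""
    # Table.copy accepts sortby/checkCSI; the copy lands next to tab in the file
    sorted_tab = tab.copy(newname=tab.name + '_by_z', sortby='z', checkCSI=True)
    n = sorted_tab.nrows
    out = np.empty(n, dtype=np.float64)  # in-memory result for later reductions
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        block = sorted_tab.read(start, stop)  # structured array with rows start..stop-1
        out[start:stop] = block['x'] + a * block['y'] + b * block['z']
    return out
```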


Answer 1:


There are two basic options, depending on whether or not you need to iterate in sorted order.

If you need to iterate over the table in sorted order, reading the data will be much more expensive than computing the expression. So read it efficiently with Table.read_sorted() and compute the expression in a list comprehension or similar:

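# Read the whole table in ascending-z order (requires a CSI on z)
sorted_rows = tab.read_sorted('z', checkCSI=True)
# Store the result under a new name so the constant a is not overwritten
res = [row['x'] + a*row['y'] + b*row['z'] for row in sorted_rows]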
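If even one sorted copy of the table is too large to hold, a lower-memory variant of the same idea is to stream rows with Table.itersorted(), which yields rows in z order without materializing the whole sorted table, and to collect the values straight into a NumPy array with np.fromiter instead of a Python list. This is a minimal sketch, assuming z carries a CSI and that a float64 result is acceptable; the function name eval_sorted_stream is hypothetical.

```python
import numpy as np
import tables as tb


def eval_sorted_stream(tab: tb.Table, a: float, b: float) -> np.ndarray:
    """Evaluate x + a*y + b*z row by row in ascending-z order."""
    rows = tab.itersorted('z', checkCSI=True)  # iterator over rows sorted by z
    values = (row['x'] + a * row['y'] + b * row['z'] for row in rows)
    # fromiter fills a preallocated float64 array instead of building a list
    return np.fromiter(values, dtype=np.float64, count=tab.nrows)
```

Python-level iteration is slower than a vectorized read, so this trades speed for a smaller peak footprint.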

If you don't need to iterate in sorted order (and it doesn't look like you do), evaluate the whole expression with the Expr class, read the sorted row order from the CSI on the z column, and use it to reorder the result. This would look something like:

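import tables as tb

# tab: the open table; a, b: the constants (all must be defined in this scope)
x = tab.cols.x
y = tab.cols.y
z = tab.cols.z
expr = tb.Expr('x+a*y+b*z')       # operands are looked up in the calling scope
unsorted_res = expr.eval()        # evaluated block by block, returned as a NumPy array
idx = z.index.read_indices()      # row numbers in ascending-z order, taken from the CSI
sorted_res = unsorted_res[idx]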
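The second option works because x+a*y+b*z is computed element by element, so evaluating it in the original row order and then applying the sort permutation gives the same array as sorting the inputs first. The NumPy-only sketch below checks this on synthetic data, with np.argsort(z, kind="stable") standing in for the row numbers the CSI provides (an assumption for illustration; no PyTables file is involved and the function name sorted_by_z is hypothetical).

```python
import numpy as np


def sorted_by_z(x, y, z, a, b):
    """Evaluate x + a*y + b*z in row order, then reorder by ascending z."""
    unsorted_res = x + a * y + b * z      # element-wise, original row order
    idx = np.argsort(z, kind="stable")    # stand-in for the row numbers from the CSI
    return unsorted_res[idx]


rng = np.random.default_rng(0)
x, y, z = rng.random((3, 1000))
a, b = 2.0, -0.5
order = np.argsort(z, kind="stable")
# Sorting the inputs first and evaluating afterwards yields the identical array
assert np.array_equal(sorted_by_z(x, y, z, a, b),
                      x[order] + a * y[order] + b * z[order])
```

Note that fancy indexing allocates a new array, so peak memory is roughly two result-sized arrays plus the index array.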


Source: https://stackoverflow.com/questions/21750690/what-is-the-most-memory-efficient-way-to-combine-read-sorted-and-expr-in-pytable
