What is the easiest way to load a filtered .tda file using pandas?


Question


Pandas has the excellent .read_table() function, but on huge files it raises a MemoryError.
Since I only need the lines that satisfy a certain condition, I'm looking for a way to load only those.

This could be done using a temporary file:

import pandas

with open(hugeTdaFile) as huge:
    with open(hugeTdaFile + ".partial.tmp", "w") as tmp:
        tmp.write(huge.readline())  # copy the header line
        for line in huge:
            if SomeCondition(line):  # keep only the lines of interest
                tmp.write(line)

t = pandas.read_table(tmp.name)

Is there a way to avoid such a use of a temp file?


Answer 1:


You can use the chunksize parameter of read_table to get an iterator of DataFrame chunks.

see this: http://pandas.pydata.org/pandas-docs/stable/io.html#iterating-through-files-chunk-by-chunk

  • filter each chunk however you want
  • append the filtered frames to a list
  • concat them at the end

(alternatively, you could write the filtered chunks out to new CSVs or to an HDFStore instead; a short sketch of the chunked approach follows below)



Source: https://stackoverflow.com/questions/15088190/what-is-the-easiest-way-to-load-a-filtered-tda-file-using-pandas
