Sorting in pandas for large datasets

故里飘歌 2020-12-08 11:21

I would like to sort my data by a given column, specifically p-values. However, the issue is that I am not able to load my entire data into memory. Thus, the following doesn

5 answers
  •  庸人自扰
    2020-12-08 11:34

    Blaze might be the tool for you: it can work with pandas and CSV files out of core. http://blaze.readthedocs.org/en/latest/ooc.html

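    import blaze

    # Wrap the CSV lazily; Blaze does not read the whole file into memory
    d = blaze.Data('my-large-file.csv')

    # Sort the whole table by P_VALUE (d.P_VALUE.sort() would sort only
    # that one column); Blaze evaluates this with chunked pandas
    d.sort('P_VALUE')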
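
    To illustrate what an out-of-core sort with chunked pandas involves, here is a minimal external merge sort sketch. It is not how Blaze implements sorting internally. It assumes the sort column is numeric (as p-values are) and that one chunk of `chunksize` rows fits in memory; `external_sort_csv` and its parameters are hypothetical names.

```python
import contextlib
import csv
import heapq
import os
import tempfile

import pandas as pd


def external_sort_csv(src, dst, column, chunksize=100_000):
    """Hypothetical helper: sort the CSV `src` by a numeric `column` into `dst`."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Pass 1: sort each chunk in memory and write it out as a sorted "run"
        run_paths = []
        for i, chunk in enumerate(pd.read_csv(src, chunksize=chunksize)):
            path = os.path.join(tmpdir, f"run_{i}.csv")
            chunk.sort_values(column).to_csv(path, index=False)
            run_paths.append(path)

        # Pass 2: k-way merge of the runs, holding one pending row per run
        with contextlib.ExitStack() as stack:
            readers = [csv.reader(stack.enter_context(open(p, newline="")))
                       for p in run_paths]
            header = [next(r) for r in readers][0]  # all runs share the header
            idx = header.index(column)

            def key(row):
                # Missing values are written as "" and sort last, as in pandas
                value = row[idx]
                return (value == "", float(value) if value else 0.0)

            with open(dst, "w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(heapq.merge(*readers, key=key))
```

    Only one chunk is in memory during the first pass, and the merge pass keeps just one row per run, so memory use is bounded by `chunksize` rather than by the file size.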
    

    For faster processing, first load the data into a database that Blaze can query. But if this is a one-off job and you have some time, the Blaze code above should do it.
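
    A minimal sketch of the database route, assuming SQLite through Python's built-in `sqlite3` module rather than a database managed by Blaze: the CSV is loaded chunk by chunk with `DataFrame.to_sql`, and SQLite performs the sort. The helper name, the table name `data`, the index name `idx_sort` and the `sort.db` path are hypothetical, and the column name is assumed not to contain double quotes.

```python
import sqlite3

import pandas as pd


def sort_csv_via_sqlite(src, dst, column, db_path="sort.db", chunksize=100_000):
    """Hypothetical helper: load `src` into SQLite in chunks, write it sorted to `dst`."""
    con = sqlite3.connect(db_path)
    try:
        # Start from an empty table so reruns do not append duplicate rows
        con.execute("DROP TABLE IF EXISTS data")
        # Load in chunks: only `chunksize` rows are held in memory at a time
        for chunk in pd.read_csv(src, chunksize=chunksize):
            chunk.to_sql("data", con, if_exists="append", index=False)
        # An index on the sort column lets SQLite read rows back in order
        con.execute(f'CREATE INDEX IF NOT EXISTS idx_sort ON data ("{column}")')
        con.commit()

        # Stream the sorted rows back out in chunks; in ascending order
        # SQLite places NULLs (missing values) first
        query = f'SELECT * FROM data ORDER BY "{column}"'
        chunks = pd.read_sql_query(query, con, chunksize=chunksize)
        for i, part in enumerate(chunks):
            part.to_csv(dst, mode="w" if i == 0 else "a",
                        header=(i == 0), index=False)
    finally:
        con.close()
```

    Once the data is in the database, sorting by a different column is just another `ORDER BY` query, which is why loading it first pays off if you will sort or filter it more than once.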
