Very large matrices using Python and NumPy

难免孤独 2020-11-22 13:51

NumPy is an extremely useful library, and from using it I've found that it's capable of handling matrices which are quite large (10000 x 10000) easily, but begins to strug

11 Answers
  •  暖寄归人
    2020-11-22 14:28

    PyTables and NumPy are the way to go.

    PyTables will store the data on disk in HDF5 format, with optional compression. My datasets often get 10x compression, which is handy when dealing with tens or hundreds of millions of rows. It's also very fast; my 5-year-old laptop can crunch through data doing SQL-like GROUP BY aggregation at 1,000,000 rows/second. Not bad for a Python-based solution!

    Accessing the data as a NumPy recarray again is as simple as:

    data = table[row_from:row_to]
    

    The HDF library takes care of reading in the relevant chunks of data and converting to NumPy.
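
    For a fuller picture, here is a minimal sketch of the round trip described above: write records to a compressed HDF5 file, read back one slice, and do a GROUP BY-style sum in chunks. The record layout (`key`, `value`), the node name `records`, the file name, and the chunk size are all made up for illustration, and recent PyTables versions return a NumPy structured array from the slice rather than a recarray.

```python
import os
import tempfile

import numpy as np
import tables  # PyTables

# Hypothetical record layout, chosen only for this example.
row_dtype = np.dtype([("key", np.int32), ("value", np.float64)])

n_rows = 100_000
rows = np.zeros(n_rows, dtype=row_dtype)
rows["key"] = np.arange(n_rows) % 10
rows["value"] = np.arange(n_rows, dtype=np.float64)

path = os.path.join(tempfile.mkdtemp(), "example.h5")

# Write the records to an HDF5 file with zlib compression enabled.
filters = tables.Filters(complevel=5, complib="zlib")
with tables.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "records", description=row_dtype,
                            filters=filters, expectedrows=n_rows)
    table.append(rows)
    table.flush()

# Reopen read-only; only the requested rows are read from disk.
with tables.open_file(path, mode="r") as h5:
    table = h5.root.records
    total_rows = table.nrows

    # The slice from the answer: one range of rows as a NumPy array.
    row_from, row_to = 1_000, 1_010
    data = table[row_from:row_to]

    # GROUP BY-style aggregation: sum of "value" per "key",
    # processed one chunk at a time so memory use stays bounded.
    sums = np.zeros(10)
    chunk = 20_000
    for start in range(0, total_rows, chunk):
        block = table[start:start + chunk]
        sums += np.bincount(block["key"], weights=block["value"],
                            minlength=10)
```

    Reading in fixed-size chunks is one simple way to keep memory bounded; the compression ratio and throughput you get will depend on your data and hardware.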
