Numpy loading csv TOO slow compared to Matlab

Front-end · unresolved · 5 answers · 1634 views

無奈伤痛 · 2020-12-01 05:08

I posted this question because I was wondering whether I did something terribly wrong to get this result.

I have a medium-sized csv file and I tried to use numpy to load it.

5 Answers
  •  盖世英雄少女心
    2020-12-01 05:54

    Yeah, reading csv files into numpy is pretty slow. There's a lot of pure Python along those code paths. These days, even when I'm using pure numpy I still use pandas for IO:

    >>> import numpy as np, pandas as pd
    >>> %time d = np.genfromtxt("./test.csv", delimiter=",")
    CPU times: user 14.5 s, sys: 396 ms, total: 14.9 s
    Wall time: 14.9 s
    >>> %time d = np.loadtxt("./test.csv", delimiter=",")
    CPU times: user 25.7 s, sys: 28 ms, total: 25.8 s
    Wall time: 25.8 s
    >>> %time d = pd.read_csv("./test.csv", delimiter=",").values
    CPU times: user 740 ms, sys: 36 ms, total: 776 ms
    Wall time: 780 ms
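
    One caveat with that last timing (my note, not from the original answer): read_csv treats the first line of the file as a header by default, so .values on a headerless numeric csv silently drops one row. Passing header=None keeps every row:

    ```python
    import pandas as pd

    # Tiny headerless csv for demonstration (hypothetical file name).
    with open("demo.csv", "w") as f:
        f.write("1,2,3\n4,5,6\n")

    # Default behavior: the first data row is consumed as the header.
    print(pd.read_csv("demo.csv").values.shape)               # (1, 3)

    # header=None: every row is treated as data.
    print(pd.read_csv("demo.csv", header=None).values.shape)  # (2, 3)
    ```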
    

    Alternatively, in a simple enough case like this one, you could use something like what Joe Kington wrote here:

    >>> %time data = iter_loadtxt("test.csv")
    CPU times: user 2.84 s, sys: 24 ms, total: 2.86 s
    Wall time: 2.86 s
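
    For reference, iter_loadtxt follows Joe Kington's trick: stream the file through a Python generator and let np.fromiter build one flat array, then reshape it. A sketch of the idea (parameter names are assumptions, not copied verbatim from his answer):

    ```python
    import numpy as np

    def iter_loadtxt(filename, delimiter=",", skiprows=0, dtype=float):
        """Parse a delimited text file via a generator and np.fromiter,
        avoiding much of the per-row overhead of np.loadtxt."""
        def iter_func():
            with open(filename, "r") as infile:
                for _ in range(skiprows):
                    next(infile)
                for line in infile:
                    for item in line.rstrip().split(delimiter):
                        yield dtype(item)
            # Remember the column count from the last line parsed.
            iter_loadtxt.rowlength = len(line.rstrip().split(delimiter))

        # Build a flat 1-D array, then fold it into rows.
        data = np.fromiter(iter_func(), dtype=dtype)
        return data.reshape((-1, iter_loadtxt.rowlength))
    ```

    This assumes a clean, rectangular file (no missing fields or comments), which is exactly the simple case where it beats the general-purpose loaders.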
    

    There's also Warren Weckesser's textreader library, in case pandas is too heavy a dependency:

    >>> import textreader
    >>> %time d = textreader.readrows("test.csv", float, ",")
    readrows: numrows = 1500000
    CPU times: user 1.3 s, sys: 40 ms, total: 1.34 s
    Wall time: 1.34 s
    
