I want to read a huge text file that contains a list of lists of integers. Right now I'm doing the following:
```python
G = []
with open("test.txt", 'r') as f:
    for li
```
pandas, which is built on NumPy, has a C-based file parser that is very fast:
```
# assumes: import numpy as np; import pandas as pd
# generate some integer data (5 M rows, two cols) and write it to file
In [24]: data = np.random.randint(1000, size=(5 * 10**6, 2))

In [25]: np.savetxt('testfile.txt', data, delimiter=' ', fmt='%d')

# your way (delimiter is a parameter so the space-separated test file can be read)
In [26]: def your_way(filename, delimiter=','):
    ...:     G = []
    ...:     with open(filename, 'r') as f:
    ...:         for line in f:
    ...:             G.append(list(map(int, line.split(delimiter))))
    ...:     return G
    ...:

In [26]: %timeit your_way('testfile.txt', ' ')
1 loops, best of 3: 16.2 s per loop

In [27]: %timeit pd.read_csv('testfile.txt', delimiter=' ', dtype=int)
1 loops, best of 3: 1.57 s per loop
```
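So pandas.read_csv takes about 1.6 seconds to read your data, roughly 10 times faster than your method (1.57 s vs. 16.2 s). Note that the generated file has no header row, so in real use you should also pass header=None; otherwise read_csv treats the first line as column names instead of data.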
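If you need the same list of lists G that the original loop builds, a minimal sketch could look like the following. It assumes every row holds the same number of space-separated integers; the helper name read_int_rows is hypothetical, and io.StringIO stands in for the real file.

```python
import io

import pandas as pd


def read_int_rows(source, delimiter=" "):
    """Hypothetical helper: read a delimited file of integers into a list of lists.

    Assumes every row has the same number of integers.
    """
    # engine="c" explicitly requests pandas' C parser (also the default here);
    # header=None keeps the first line as data rather than column names.
    df = pd.read_csv(source, sep=delimiter, header=None, dtype=int, engine="c")
    # to_numpy() yields a 2-D integer array; tolist() turns it into nested
    # Python lists of plain ints.
    return df.to_numpy().tolist()


G = read_int_rows(io.StringIO("10 20\n30 40\n"))
print(G)  # [[10, 20], [30, 40]]
```

With a real file you would call read_int_rows("test.txt"). If the rest of your code can work with the NumPy array (or the DataFrame) directly, skipping tolist() is likely faster, because building millions of Python int objects has its own cost.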