I generate a list of one dimensional numpy arrays in a loop and later convert this list to a 2d numpy array. I would\'ve preallocated a 2d numpy array if i knew the number o
Suppose you know that the final array arr
will never be larger than 5000x10.
Then you could pre-allocate an array of maximum size, populate it with data as
you go through the loop, and then use arr.resize
to cut it down to the
discovered size after exiting the loop.
The tests below suggest doing so will be slightly faster than constructing intermediate python lists no matter what the ultimate size of the array is.
Also, arr.resize
de-allocates the unused memory, so the final (though maybe not the intermediate) memory footprint is smaller than what is used by python_lists_to_array
.
This shows numpy_all_the_way
is faster:
% python -mtimeit -s"import test" "test.numpy_all_the_way(100)"
100 loops, best of 3: 1.78 msec per loop
% python -mtimeit -s"import test" "test.numpy_all_the_way(1000)"
100 loops, best of 3: 18.1 msec per loop
% python -mtimeit -s"import test" "test.numpy_all_the_way(5000)"
10 loops, best of 3: 90.4 msec per loop
% python -mtimeit -s"import test" "test.python_lists_to_array(100)"
1000 loops, best of 3: 1.97 msec per loop
% python -mtimeit -s"import test" "test.python_lists_to_array(1000)"
10 loops, best of 3: 20.3 msec per loop
% python -mtimeit -s"import test" "test.python_lists_to_array(5000)"
10 loops, best of 3: 101 msec per loop
This shows numpy_all_the_way
uses less memory:
% test.py
Initial memory usage: 19788
After python_lists_to_array: 20976
After numpy_all_the_way: 20348
test.py:
import numpy as np
import os
def memory_usage():
pid = os.getpid()
return next(line for line in open('/proc/%s/status' % pid).read().splitlines()
if line.startswith('VmSize')).split()[-2]
N, M = 5000, 10
def python_lists_to_array(k):
list_of_arrays = list(map(lambda x: x * np.ones(M), range(k)))
arr = np.array(list_of_arrays)
return arr
def numpy_all_the_way(k):
arr = np.empty((N, M))
for x in range(k):
arr[x] = x * np.ones(M)
arr.resize((k, M))
return arr
if __name__ == '__main__':
print('Initial memory usage: %s' % memory_usage())
arr = python_lists_to_array(5000)
print('After python_lists_to_array: %s' % memory_usage())
arr = numpy_all_the_way(5000)
print('After numpy_all_the_way: %s' % memory_usage())