Pythonic way to create a numpy array from a list of numpy arrays

执笔经年 2020-12-13 06:24

I generate a list of one-dimensional numpy arrays in a loop and later convert this list to a 2-D numpy array. I would've preallocated a 2-D numpy array if I knew the number o…
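For reference, here is roughly what the list-then-convert pattern described in the question looks like. The loop body (`range(7)` and `x * np.ones(M)`) is a hypothetical stand-in for the real data source, whose row count is not known in advance.

```python
import numpy as np

M = 10  # length of every 1-D row (assumed constant)

# Hypothetical loop producing 1-D rows; the real loop's length is unknown up front.
rows = []
for x in range(7):
    rows.append(x * np.ones(M))

# np.array copies the list of equal-length 1-D arrays into one 2-D array.
arr = np.array(rows)
# np.vstack(rows) (or np.stack(rows, axis=0)) builds the same 2-D array here.
also = np.vstack(rows)
print(arr.shape)  # (7, 10)
```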

6 Answers
  •  执念已碎
    2020-12-13 06:58

    Suppose you know that the final array arr will never be larger than 5000x10. Then you could pre-allocate an array of maximum size, populate it with data as you go through the loop, and then use arr.resize to cut it down to the discovered size after exiting the loop.
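    The same idea works when the loop's length is only discovered while iterating: keep a counter of filled rows and trim at the end. A minimal sketch, assuming rows arrive from some iterable and that a hard upper bound `max_rows` is known (`collect_rows` and its parameters are hypothetical names, not from the answer):

```python
import numpy as np


def collect_rows(rows, max_rows, m):
    """Copy an iterable of 1-D length-m rows into a 2-D array.

    Assumes at most max_rows rows; raises ValueError if that bound is exceeded.
    """
    arr = np.empty((max_rows, m))  # pre-allocate the maximum size
    n = 0                          # number of rows filled so far
    for row in rows:
        if n == max_rows:
            raise ValueError("more than %d rows" % max_rows)
        arr[n] = row
        n += 1
    # Shrink in place to the n rows actually filled. This only succeeds when
    # arr owns its data and nothing else references it (refcheck=True default).
    arr.resize((n, m))
    return arr


# A generator whose length is not known up front (hypothetical data source).
data = (np.full(10, i) for i in range(1234))
result = collect_rows(data, max_rows=5000, m=10)
print(result.shape)  # (1234, 10)
```

    Raising on overflow, rather than silently growing, keeps the "known maximum" assumption explicit.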

    The tests below suggest that doing so is slightly faster (roughly 10%) than constructing an intermediate Python list, at each of the sizes tested (100, 1000 and 5000 rows).

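    Also, arr.resize deallocates the unused memory, so the final (though perhaps not the intermediate) memory footprint is smaller than that of python_lists_to_array. Note that an in-place resize only works when arr owns its data and no other array or variable references it; otherwise it raises ValueError.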
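    One reason to trim with resize (or a copy) instead of simply returning a slice such as arr[:k] is that a basic slice is a view: it keeps the whole pre-allocated buffer alive through its .base attribute. A small sketch contrasting the three options; the sizes are arbitrary:

```python
import numpy as np

N, M, k = 5000, 10, 1200

big = np.empty((N, M))
view = big[:k]               # a view: shares big's full N x M buffer
print(view.base is big)      # True, so the whole buffer stays allocated

copied = big[:k].copy()      # an independent k x M array; big can be freed later
print(copied.base is None)   # True

trimmed = np.empty((N, M))
trimmed.resize((k, M))       # shrinks this same object's buffer in place
print(trimmed.shape, trimmed.base is None)  # (1200, 10) True
```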

    This shows numpy_all_the_way is faster:

    % python -mtimeit -s"import test" "test.numpy_all_the_way(100)"
    100 loops, best of 3: 1.78 msec per loop
    % python -mtimeit -s"import test" "test.numpy_all_the_way(1000)"
    100 loops, best of 3: 18.1 msec per loop
    % python -mtimeit -s"import test" "test.numpy_all_the_way(5000)"
    10 loops, best of 3: 90.4 msec per loop
    
    % python -mtimeit -s"import test" "test.python_lists_to_array(100)"
    1000 loops, best of 3: 1.97 msec per loop
    % python -mtimeit -s"import test" "test.python_lists_to_array(1000)"
    10 loops, best of 3: 20.3 msec per loop
    % python -mtimeit -s"import test" "test.python_lists_to_array(5000)"
    10 loops, best of 3: 101 msec per loop
    

    This shows that numpy_all_the_way leaves a smaller final memory footprint (VmSize from /proc/<pid>/status, in kB):

    % test.py
    Initial memory usage: 19788
    After python_lists_to_array: 20976
    After numpy_all_the_way: 20348
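    VmSize is Linux-only and reflects the whole process, so it mostly captures the final footprint. To see the intermediate cost as well, the standard tracemalloc module can report the peak traced allocation during a call; NumPy reports its array buffers to tracemalloc. A minimal sketch (peak_bytes is a hypothetical helper, not part of the answer):

```python
import tracemalloc

import numpy as np


def peak_bytes(func, *args):
    """Return the peak number of bytes traced by tracemalloc during func(*args)."""
    tracemalloc.start()
    try:
        func(*args)  # the result is discarded; only the peak matters here
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# The float64 buffer alone is 5000 * 10 * 8 = 400,000 bytes.
print(peak_bytes(np.ones, (5000, 10)))
```

    With test.py importable, comparing peak_bytes(test.python_lists_to_array, 5000) against peak_bytes(test.numpy_all_the_way, 5000) would measure the intermediate (peak) allocation of each approach rather than only the final footprint.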
    

    test.py:

    import numpy as np
    import os
    
    
    def memory_usage():
        """Return this process's virtual memory size (VmSize) in kB (Linux only)."""
        with open('/proc/%d/status' % os.getpid()) as f:
            for line in f:
                if line.startswith('VmSize'):
                    # e.g. "VmSize:    19788 kB" -> "19788"
                    return line.split()[-2]
    
    N, M = 5000, 10
    
    
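    def python_lists_to_array(k):
        # Build a Python list of k 1-D arrays, then copy them into one 2-D array.
        list_of_arrays = list(map(lambda x: x * np.ones(M), range(k)))
        arr = np.array(list_of_arrays)
        return arr
    
    
    def numpy_all_the_way(k):
        # Pre-allocate the largest possible (N, M) array and fill rows in place.
        arr = np.empty((N, M))
        for x in range(k):
            arr[x] = x * np.ones(M)
        # Shrink in place to the k rows actually used (requires k <= N).
        arr.resize((k, M))
        return arr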
    
    if __name__ == '__main__':
        print('Initial memory usage: %s' % memory_usage())
        arr = python_lists_to_array(5000)
        print('After python_lists_to_array: %s' % memory_usage())
        arr = numpy_all_the_way(5000)
        print('After numpy_all_the_way: %s' % memory_usage())
    
