Optimal way to append to numpy array

我在风中等你 2020-12-31 19:11

I have a numpy array and I can simply append an item to it using append, like this:

numpy.append(myarray, 1)

In this case I just appended t

2 Answers
  • 2020-12-31 19:15

    Appending to NumPy arrays one element at a time is very inefficient. np.append does not grow the array in place: every call allocates a new array and copies all existing elements into it. Depending on the application, there are much better strategies.
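
As a small illustration of that copying behaviour (the variable names here are only for the example), np.append returns a new array and leaves its input untouched:

```python
import numpy as np

original = np.array([1, 2, 3])
appended = np.append(original, 4)  # returns a new array; `original` is not modified

print(original.tolist())   # [1, 2, 3]
print(appended.tolist())   # [1, 2, 3, 4]

# The result does not share a buffer with the input, so the data was copied.
print(np.shares_memory(original, appended))  # False
```

This is why the return value has to be assigned back when np.append is used in a loop, and why each iteration costs time proportional to the current array length.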

    If you know the length in advance, it is best to pre-allocate the array using a function like np.ones, np.zeros, or np.empty.

    import numpy as np

    desired_length = 500
    results = np.empty(desired_length)   # uninitialized; every slot is filled below
    for i in range(desired_length):
        results[i] = i**2
    

    If you don't know the length, it's probably more efficient to keep your results in a regular list and convert it to an array afterwards.

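    results = []
    while condition:                 # `condition` and `do_stuff` are placeholders
        a = do_stuff()
        results.append(a)            # appending to a list is cheap
    results = np.array(results)      # convert once, at the end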
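
A runnable sketch of the same pattern, using a Collatz sequence purely as a stand-in for a loop whose length is not known up front (the function name collatz_sequence is an assumption for this example):

```python
import numpy as np

def collatz_sequence(start):
    """Collect the Collatz sequence from `start` down to 1 in a list."""
    results = []
    value = start
    while value != 1:              # number of iterations is not known in advance
        results.append(value)
        value = value // 2 if value % 2 == 0 else 3 * value + 1
    results.append(1)
    return np.array(results)       # single conversion to an array at the end

sequence = collatz_sequence(6)
print(sequence.tolist())  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```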
    

    Here are some timings on my computer.

    def pre_allocate():
        results = np.empty(5000)
        for i in range(5000):
            results[i] = i**2
        return results
    
    def list_append():
        results = []
        for i in range(5000):
            results.append(i**2)
        return np.array(results)
    
    def numpy_append():
        results = np.array([])
        for i in range(5000):
            # np.append returns a new array, so the result must be reassigned
            results = np.append(results, i**2)
        return results
    
    %timeit pre_allocate()
    # 100 loops, best of 3: 2.42 ms per loop
    
    %timeit list_append()
    # 100 loops, best of 3: 2.5 ms per loop
    
    %timeit numpy_append()
    # 10 loops, best of 3: 48.4 ms per loop
    

    So you can see that pre-allocating and building a list then converting take about the same time, and both are roughly 20 times faster than calling np.append in a loop.
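
The %timeit lines above rely on IPython. Outside IPython, a comparable measurement can be sketched with the standard-library timeit module. The function bodies mirror the three strategies above; the size parameter n and the repeat count are assumptions for this sketch, and the numbers will differ from machine to machine:

```python
import timeit
import numpy as np

N = 5000  # same problem size as in the timings above

def pre_allocate(n=N):
    results = np.empty(n)
    for i in range(n):
        results[i] = i**2
    return results

def list_append(n=N):
    results = []
    for i in range(n):
        results.append(i**2)
    return np.array(results, dtype=float)

def numpy_append(n=N):
    results = np.array([])
    for i in range(n):
        results = np.append(results, i**2)  # reallocates and copies every time
    return results

REPEATS = 3  # assumed small repeat count to keep the run short
for func in (pre_allocate, list_append, numpy_append):
    total = timeit.timeit(func, number=REPEATS)
    print(f"{func.__name__}: {1000 * total / REPEATS:.2f} ms per call")
```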

  • 2020-12-31 19:21

    If you know the final size of the array in advance, it is much faster to pre-allocate an array of that size and then set the values. If you do need to append on the fly, avoid doing it one element at a time: append as few times as possible, since every append produces another full copy of the data. You might also want to profile the difference in timings between np.append, np.hstack, np.concatenate, etc.
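
One way to append as few times as possible, sketched under the assumption that the data arrives in chunks (make_chunk is a hypothetical stand-in for the real data source), is to keep the chunks in a list and join them with a single np.concatenate call:

```python
import numpy as np

def make_chunk(k):
    """Hypothetical data source: chunk k holds k + 1 copies of the value k."""
    return np.full(k + 1, k, dtype=float)

# Collect the chunks in a plain list; nothing is copied yet.
chunks = [make_chunk(k) for k in range(4)]

# One concatenation at the end copies each element exactly once.
combined = np.concatenate(chunks)
print(combined.tolist())
# [0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0]
```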
