Memory growth with broadcast operations in NumPy

独厮守ぢ 2020-12-29 11:47

I am using NumPy to handle some large data matrices (around 50 GB in size). The machine where I am running this code has 128 GB of RAM, so doing simple linear operations of

2 Answers
  •  臣服心动
    2020-12-29 12:36

    Well, your array a already takes 1192953 * 192 * 32 * 8 bytes / 1e9 ≈ 58.6 GB of memory.
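The 58 GB figure is just the element count times 8 bytes per float64; a quick sketch of the same arithmetic, with the shapes taken from the question:

```python
import numpy as np

# Shapes from the question: b is (1192953, 192) and c is (192, 32),
# so the broadcast result a has shape (1192953, 192, 32).
shape = (1192953, 192, 32)
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per element

# Total bytes one array of this shape occupies (use int64 to avoid overflow).
n_bytes = int(np.prod(shape, dtype=np.int64)) * itemsize
print(n_bytes / 1e9)  # ≈ 58.6 GB
```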

    Broadcasting itself does not allocate additional memory for the input arrays, but the result of

    b[:, :, np.newaxis] - c[np.newaxis, :, :]
    

    is still materialized in a temporary array. Therefore, at this line you have allocated at least two arrays with the shape of a, for a total memory use of more than 116 GB.
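Another way to avoid that temporary entirely (an alternative sketch, not part of the original answer) is to pre-allocate the output once and let the ufunc write into it through its `out=` parameter, shown here on small stand-in arrays:

```python
import numpy as np

# Small stand-in shapes; in the question b is (1192953, 192), c is (192, 32).
rng = np.random.default_rng(0)
b = rng.standard_normal((4, 3))
c = rng.standard_normal((3, 2))

# Pre-allocate the result array once ...
a = np.empty((b.shape[0], b.shape[1], c.shape[1]))

# ... then subtract directly into it: np.subtract with out= writes the
# broadcast result into `a` instead of allocating a second full-size array.
np.subtract(b[:, :, np.newaxis], c[np.newaxis, :, :], out=a)

print(np.allclose(a, b[:, :, np.newaxis] - c[np.newaxis, :, :]))  # True
```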

    You can avoid this issue by operating on a smaller subset of the array at a time:

    CHUNK_SIZE = 100000
    # Step the range directly so the final partial chunk is not skipped
    # (the original range(b.shape[0]/CHUNK_SIZE) also fails on Python 3,
    # where / is float division).
    for start in range(0, b.shape[0], CHUNK_SIZE):
        sl = slice(start, start + CHUNK_SIZE)
        a[sl] = b[sl, :, np.newaxis] - c[np.newaxis, :, :]
    

    This will be marginally slower but uses much less memory.
