Memory alignment for fast FFT in Python using shared arrays

China☆狼群 提交于 2019-12-06 03:25:05

问题


I write an image processing app that needs to do multiple things and it has to do them as much real-time as possible. Acquisition of the data and their processing runs in separate processes (mainly for performance reasons). The data itself is quite large (2MPix 16-bit grayscale images).

I can share arrays between processes as it is described in this post: How do I pass large numpy arrays between python subprocesses without saving to disk? (I use the shmarray script from the numpy-shared package). I can perform the supplied Numpy FFT on those data without problem, but it is quite slow.

Calling FFTW would probably be much faster, but in order to fully benefit from it, I am supposed to run my operations on arrays that are memory aligned.

The question: Is there a way how to create and share Numpy-like arrays between processes, that are, at the same time, guaranteed to be memory aligned?


回答1:


The simplest standard trick to get correctly aligned memory is to allocate a bit more than needed and skip the first few bytes if the alignment is wrong. If I remember correctly, NumPy arrays will always be 8-byte aligned, and FFTW requires 16-byte aligment to perform best. So you would simply allocate 8 bytes more than needed, and skip the first 8 bytes if necessary.

Edit: This is rather easy to implement. The pointer to the data is available as an integer in the ctypes.data attribute of a NumPy array. Using the shifted block can be achieved by slicing, viewing as a different data type and reshaping -- all these won't copy the data, but rather reuse the same buf.

To allocate an 16-byte aligned 1000x1000 array of 64-bit floating point numbers, we could use this code:

m = n = 1000
dtype = numpy.dtype(numpy.float64)
nbytes = m * n * dtype.itemsize
buf = numpy.empty(nbytes + 16, dtype=numpy.uint8)
start_index = -buf.ctypes.data % 16
a = buf[start_index:start_index + nbytes].view(dtype).reshape(m, n)

Now, a is an array with the desired properties, as can be verified by checking that a.ctypes.data % 16 is indeed 0.




回答2:


Generalizing on Sven's answer, this function will return an aligned copy (if needed) of any numpy array:

import numpy as np
def aligned(a, alignment=16):
    if (a.ctypes.data % alignment) == 0:
        return a

    extra = alignment / a.itemsize
    buf = np.empty(a.size + extra, dtype=a.dtype)
    ofs = (-buf.ctypes.data % alignment) / a.itemsize
    aa = buf[ofs:ofs+a.size].reshape(a.shape)
    np.copyto(aa, a)
    assert (aa.ctypes.data % alignment) == 0
    return aa


来源:https://stackoverflow.com/questions/9895787/memory-alignment-for-fast-fft-in-python-using-shared-arrays

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!