Comparatively slow python numpy 3D Fourier Transformation

Anonymous (unverified), submitted 2019-12-03 02:30:02

Question:

For my work I need to perform discrete Fourier transforms (DFTs) on large images. In the current case I need a 3D FFT of a 1921 × 512 × 512 image (along with 2D FFTs of 512 × 512 images). Right now I am using the numpy package and its function np.fft.fftn(). The snippet below compares 2D and 3D FFT times on an equal-sized 2D grid and a slightly smaller 3D grid, both filled with random numbers:

```python
import time

import numpy as np

tas = time.time()
a = np.random.rand(512, 512)
tab = time.time()
b = np.random.rand(100, 512, 512)
tbfa = time.time()

fa = np.fft.fft2(a)
tfafb = time.time()
fb = np.fft.fftn(b)
tfbe = time.time()

print("initializing 512 x 512 grid:", tab - tas)
print("initializing 100 x 512 x 512 grid:", tbfa - tab)
print("2D FFT on 512 x 512 grid:", tfafb - tbfa)
print("3D FFT on 100 x 512 x 512 grid:", tfbe - tfafb)
```

Output:

```
initializing 512 x 512 grid: 0.00305700302124
initializing 100 x 512 x 512 grid: 0.301637887955
2D FFT on 512 x 512 grid: 0.0122730731964
3D FFT on 100 x 512 x 512 grid: 3.88418793678
```

The problem is that I will need this operation quite often, so the time spent per image should be short. Testing on my own computer (mid-range laptop, 2 GB RAM allocated to a virtual machine, hence the smaller test grid), the 3D FFT takes ~5 s (order of magnitude), as shown above. At work the machines are much better (cluster/grid-architecture systems) and the FFTs run much faster. In both cases the 2D transforms finish quasi-instantaneously.

However, for 1921 × 512 × 512, np.fft.fftn() takes ~5 min. Since I guess scipy's implementation is not much faster, and considering that MATLAB finishes FFTs of same-sized grids within ~5 s, my question is whether there is a way to speed the process up to, or almost to, MATLAB times. My knowledge of FFTs is limited, but apparently MATLAB uses the FFTW algorithm, which numpy does not. Is there a reasonable chance that with some pyFFTW package I get similar times? Also, 1921 seems an unlucky choice, having only two prime factors (17 × 113), so I assume this also plays a role; 512, on the other hand, is a well-suited power of two. Are MATLAB-like times achievable, ideally without zero-padding up to 2048?
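As an aside on the "unlucky" length 1921: FFT libraries are fastest on lengths whose prime factors are all small (2, 3, 5, for which radix-optimized butterflies exist), so full padding to a power of two is often unnecessary. The helper below is a hypothetical sketch, not part of any FFT library; it finds the next such 5-smooth length and shows that 1921 would only need padding to 1944 = 2³ · 3⁵, far less than 2048:

```python
def next_fast_len5(n):
    """Smallest integer m >= n whose prime factors are all in {2, 3, 5}.

    Hypothetical helper for illustration; FFT libraries handle such
    "5-smooth" lengths efficiently with radix-2/3/5 stages.
    """
    m = n
    while True:
        k = m
        for p in (2, 3, 5):
            while k % p == 0:
                k //= p
        if k == 1:          # every prime factor was 2, 3 or 5
            return m
        m += 1

print(next_fast_len5(1921))  # 1944 = 2**3 * 3**5
print(next_fast_len5(512))   # 512, already a power of two
```

Note that transforming a grid padded to 1944 (or 2048) is still a different transform than the 1921-point one; the padding question in the answer below applies here as well.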

I'm asking because I'll have to use FFTs a lot (to an extent where such differences will have a huge influence!), and if there is no way to reduce computation times in Python, I'd have to switch to other, faster implementations.

Answer 1:

Yes, there is a chance that using FFTW through the pyfftw interface will reduce your computation time compared to numpy.fft or scipy.fftpack. The performance of these DFT implementations can be compared in benchmarks such as this one; some interesting results are reported in Improving FFT performance in Python.

I suggest the following code for a test:

```python
import time

import numpy
import scipy.fftpack
import pyfftw
import pyfftw.interfaces.numpy_fft

f = pyfftw.empty_aligned((127, 512, 512), dtype='complex128', n=16)
f[:] = numpy.random.randn(*f.shape)

# The first call takes extra time for plan creation.  By default, pyfftw
# uses FFTW_MEASURE for the plan creation, which means that many 3D DFTs
# are computed so as to choose the fastest algorithm.
fftf = pyfftw.interfaces.numpy_fft.fftn(f)

tas = time.time()
fftf = pyfftw.interfaces.numpy_fft.fftn(f)  # here the plan is applied, nothing else
tas = time.time() - tas
print("3D FFT, pyfftw:", tas)

f = pyfftw.empty_aligned((127, 512, 512), dtype='complex128', n=16)
f[:] = numpy.random.randn(*f.shape)

tas = time.time()
fftf = numpy.fft.fftn(f)
tas = time.time() - tas
print("3D FFT, numpy:", tas)

tas = time.time()
fftf = scipy.fftpack.fftn(f)
tas = time.time() - tas
print("3D FFT, scipy/fftpack:", tas)

# Padded to a power of two along the first axis; again the first call
# includes the FFTW_MEASURE plan creation.
f = pyfftw.empty_aligned((128, 512, 512), dtype='complex128', n=16)
fftf = pyfftw.interfaces.numpy_fft.fftn(f)

tas = time.time()
fftf = pyfftw.interfaces.numpy_fft.fftn(f)  # here the plan is applied, nothing else
tas = time.time() - tas
print("3D padded FFT, pyfftw:", tas)
```

For a size of 127*512*512, on my modest computer, I got:

```
3D FFT, pyfftw: 3.94130897522
3D FFT, numpy: 16.0487070084
3D FFT, scipy/fftpack: 19.001199007
3D padded FFT, pyfftw: 2.55221295357
```

So pyfftw is significantly faster than numpy.fft and scipy.fftpack. Using padding is even faster, but what is computed is different: zero-padding the first axis from 127 to 128 changes the transform length, and hence the resulting spectrum.
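That difference can be seen on a toy grid with numpy alone, using the documented s parameter of np.fft.fftn, which zero-pads (or truncates) each axis to the requested length; the tiny sizes here are placeholders for the real ones:

```python
import numpy as np

x = np.random.rand(5, 4, 4)

fx = np.fft.fftn(x)               # transform of the original 5 x 4 x 4 grid
fp = np.fft.fftn(x, s=(8, 4, 4))  # first axis zero-padded from 5 to 8

print(fx.shape)  # (5, 4, 4)
print(fp.shape)  # (8, 4, 4) -- a different transform with more frequency bins

# The DC component (the sum of all samples) is unaffected by zero-padding,
# but the remaining bins sample the spectrum at different frequencies.
assert np.isclose(fx[0, 0, 0], x.sum())
assert np.isclose(fp[0, 0, 0], x.sum())
```

So the padded timing above is a fair speed comparison only if the padded spectrum is acceptable for the application (or the result is post-processed accordingly).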

Lastly, pyfftw may seem slower on the first run because, according to the documentation, it uses the FFTW_MEASURE planner flag by default. That expensive planning pays off if and only if many DFTs of the same size are computed successively.


