I have some Haskell code that computes a Fast Fourier Transform, and I want to apply data parallelism to it. However, every strategy that I use generates too many sparks and most
First off: there's a lot of optimisation to be done here before I'd start to think about parallelism:
Lists rock, but their non-contiguous memory layout means they can't be traversed nearly as fast as tight arrays such as Data.Vector, because you inevitably end up with lots of cache misses. Indeed, I've seldom seen a list-based algorithm gain much from parallelisation, because such algorithms are bound by memory performance rather than CPU performance.
Your twiddle factors are computed over and over again; you can obviously gain a lot through memoisation here.
You keep on calling length, but on lists that's an extremely wasteful function (O(n) for something that could be O(1)). Use a container that properly handles length; lists aren't meant to (we like to keep their ability to be infinite).
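As a rough sketch of the memoisation point, the twiddle factors can be tabulated once in an unboxed vector and then only indexed. The names `twiddles` and `twiddleAt` are hypothetical, and the sign convention (a forward transform using exp(-2πik/n)) is an assumption, since the question's code may use the opposite sign.

```haskell
import Data.Complex (Complex, cis)
import qualified Data.Vector.Unboxed as UBV

-- | Table of the n/2 twiddle factors exp(-2*pi*i*k/n) for k = 0 .. n/2 - 1.
-- Built once for the full transform size and afterwards only indexed.
twiddles :: Int -> UBV.Vector (Complex Float)
twiddles n = UBV.generate (n `div` 2) $ \k ->
  cis (-2 * pi * fromIntegral k / fromIntegral n)

-- | Twiddle factor exp(-2*pi*i*k/m) for a sub-transform of size m,
-- where m divides n and k < m/2, read from the size-n table with a
-- stride instead of being recomputed.
twiddleAt :: UBV.Vector (Complex Float) -> Int -> Int -> Int -> Complex Float
twiddleAt table n m k = table UBV.! (k * (n `div` m))
```

Note that `UBV.length` on such a table is O(1), so size checks cost nothing, unlike `length` on a list.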
The parallelisation itself will be pretty simple. I'd check the length as suggested by John L; indeed, I'd require a fairly large size before creating a spark, at least something like 256. Since performance probably becomes crucial only at sizes of several thousand, this should still produce enough sparks to keep your cores busy.
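import Control.Parallel.Strategies
import Data.Complex (Complex)
import qualified Data.Vector.Unboxed as UBV

type ℂ = Complex Float

-- interleave and byflyS stand for vector versions of the question's helpers;
-- the base case for short inputs is omitted here.
fft' :: UBV.Vector ℂ -> UBV.Vector ℂ
fft' aₓs = interleave lᵥs rᵥs
 where (lᵥs, rᵥs) = (fft' lₓs, fft' rₓs)
                      `using` if UBV.length aₓs > 256 then parTuple2 rseq rseq else r0
       (lₓs, rₓs) = byflyS aₓs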
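
For something that compiles on its own, here is a minimal sketch of the same idea. It is not the question's algorithm verbatim: the even/odd split and the butterfly combination are written inline instead of going through `byflyS` and `interleave`, the name `fftPar` is hypothetical, the input length is assumed to be a power of two, and the forward sign convention exp(-2πik/n) is assumed. It needs the `vector` and `parallel` packages.

```haskell
import Control.Parallel.Strategies (parTuple2, r0, rseq, using)
import Data.Complex (Complex, cis)
import qualified Data.Vector.Unboxed as UBV

type ℂ = Complex Float

-- | Radix-2 decimation-in-time FFT (assumes a power-of-two input length).
fftPar :: UBV.Vector ℂ -> UBV.Vector ℂ
fftPar xs
  | n <= 1    = xs
  | otherwise = UBV.zipWith (+) es ts UBV.++ UBV.zipWith (-) es ts
  where
    n    = UBV.length xs               -- O(1) on vectors
    half = n `div` 2
    -- Split into even- and odd-indexed samples.
    evens = UBV.generate half (\k -> xs UBV.! (2 * k))
    odds  = UBV.generate half (\k -> xs UBV.! (2 * k + 1))
    -- Spark the two sub-transforms only for large inputs; below the
    -- threshold r0 leaves the pair to ordinary lazy evaluation.
    (es, os) = (fftPar evens, fftPar odds)
                 `using` (if n > 256 then parTuple2 rseq rseq else r0)
    -- Odd half multiplied by the twiddle factors exp(-2*pi*i*k/n).
    ts = UBV.imap (\k o -> cis (-2 * pi * fromIntegral k / fromIntegral n) * o) os
```

`rseq` should be enough here because an unboxed vector evaluated to weak head normal form already has all of its elements computed. The twiddle factors are still recomputed at every level in this sketch; a precomputed table would remove that cost. Compile with `-threaded` and run with `+RTS -N` for the sparks to be picked up by more than one core.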