In high-performance computing, sums, products, etc. are often calculated using a "parallel reduction" that takes n elements and, given enough processors, completes in O(log n) time.
This seems like a good start:
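import Control.Parallel.Strategies (parList, rseq, using)

parFold :: (a -> a -> a) -> [a] -> a
parFold f = go
  where
    strategy = parList rseq
    go []  = error "parFold: empty list"  -- otherwise go [] would loop forever
    go [x] = x
    go xs  = go (reduce xs `using` strategy)  -- one parallel pass halves the list
    reduce (x:y:xs) = f x y : reduce xs
    reduce list     = list  -- empty or singleton list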
It works, but the parallel speedup is poor. Replacing parList with something like parListChunk 1000 helps a bit, but the speedup is still under 1.5x on an 8-core machine.
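For reference, the chunked variant mentioned above might look roughly like this. It is only a sketch: it assumes the function meant is `parListChunk` from `Control.Parallel.Strategies` (applied to `rseq`), the name `parFoldChunked` is made up for illustration, and the empty-list case is an added guard rather than part of the original code.

```haskell
import Control.Parallel.Strategies (parListChunk, rseq, using)

-- Hypothetical name. Same pairwise reduction as parFold, but each spark
-- evaluates a chunk of 1000 elements instead of a single element.
parFoldChunked :: (a -> a -> a) -> [a] -> a
parFoldChunked f = go
  where
    strategy = parListChunk 1000 rseq
    go []  = error "parFoldChunked: empty list"  -- added guard (assumption)
    go [x] = x
    go xs  = go (reduce xs `using` strategy)     -- one pass halves the list
    reduce (x:y:rest) = f x y : reduce rest
    reduce list       = list                     -- empty or singleton list
```

To see any parallelism at all, the program has to be built with `-threaded` and run with something like `+RTS -N8`. As a sanity check, `parFoldChunked (+) [1 .. 1000000 :: Int]` should evaluate to 500000500000.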