I am working on a program which manipulates images of different sizes. Many of these manipulations read pixel data from an input and write to a separate output (e.g. blur).
As a bit of a left-field idea...
What systems are you running this on? Have you thought of using the GPU in your PCs?
Nvidia have the CUDA APIs for this sort of thing