I am working on a program that manipulates images of different sizes. Many of these manipulations (e.g. blur) read pixel data from an input and write to a separate output.
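To optimize simple image transformations, you are usually far better off using SIMD vector instructions than trying to multi-thread your program: a single SIMD instruction operates on several pixels at once, and it carries none of the thread creation or synchronization overhead.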
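As a rough illustration of what that can look like, here is a minimal sketch of a brightness adjustment that reads from one buffer and writes to another. It assumes 8-bit samples and an x86 target with SSE2; the function name `brighten_u8` and its parameters are made up for this example, and on targets without SSE2 the code simply falls back to the scalar loop. A blur would need neighbouring pixels and is more involved, but the load/compute/store pattern is the same.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: out[i] = min(in[i] + delta, 255) for n 8-bit samples.
 * The input and output buffers are separate, as in the question. */
void brighten_u8(const uint8_t *in, uint8_t *out, size_t n, uint8_t delta)
{
    size_t i = 0;

#if defined(__SSE2__)
    /* Broadcast delta into all 16 byte lanes of a 128-bit register. */
    const __m128i vdelta = _mm_set1_epi8((char)delta);

    /* Process 16 samples per iteration using unaligned loads and stores. */
    for (; i + 16 <= n; i += 16) {
        __m128i px = _mm_loadu_si128((const __m128i *)(in + i));
        px = _mm_adds_epu8(px, vdelta); /* unsigned saturating add */
        _mm_storeu_si128((__m128i *)(out + i), px);
    }
#endif

    /* Scalar loop: handles the tail, or everything when SSE2 is unavailable. */
    for (; i < n; ++i) {
        unsigned sum = (unsigned)in[i] + delta;
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Calling `brighten_u8(src, dst, width * height * channels, 40)` on an interleaved 8-bit image would brighten every sample by 40 with clamping at 255. Whether this beats the compiler's own auto-vectorization of the scalar loop depends on the compiler and flags, so it is worth measuring both.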