for (int x = 0; x < blockCountX; x++)
{
for (int y = 0; y < blockCountY; y++)
Why do you believe that breaking up the large image into smaller chunks will be more efficient? Is the large image too large to fit into system memory? 4 million pixels x 8 bpp (1 byte per pixel) = 4 megabytes. That was a lot of memory 20 years ago. Today it's chump change.
Creating multiple 256x256 sub-images will require copying the pixel data into new images in memory, plus the image header/descriptor overhead for each new image, plus alignment padding per scanline. You will more than double your memory use, which can itself create performance problems (paging to disk).
You are also spinning up a new thread for each image block. Allocating a thread is very expensive, and may take more time than the work you want the thread to do. Consider at least using ThreadPool.QueueUserWorkItem to make use of already available system worker threads. Using .NET 4.0's Task class would be even better, IMO.
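To illustrate the Task suggestion, here is a minimal sketch that hands 64 blocks of a pixel buffer to pooled worker threads instead of creating 64 threads. The names BlockTasks, SumInBlocks and ProcessBlock are made up for this example, and summing the bytes is only a stand-in for whatever per-block work you actually do.

```csharp
using System;
using System.Threading.Tasks;

public static class BlockTasks
{
    // Hypothetical per-block work: sums the bytes in [start, start + length).
    static long ProcessBlock(byte[] pixels, int start, int length)
    {
        long sum = 0;
        for (int i = start; i < start + length; i++)
            sum += pixels[i];
        return sum;
    }

    // Splits the buffer into blockCount contiguous ranges and processes each
    // range as a Task, which runs on a thread-pool worker thread.
    public static long SumInBlocks(byte[] pixels, int blockCount)
    {
        int blockLength = (pixels.Length + blockCount - 1) / blockCount;
        var tasks = new Task<long>[blockCount];

        for (int b = 0; b < blockCount; b++)
        {
            // Locals declared inside the loop, so each lambda captures its own values.
            int start = b * blockLength;
            int length = Math.Max(0, Math.Min(blockLength, pixels.Length - start));
            tasks[b] = Task.Factory.StartNew(() => ProcessBlock(pixels, start, length));
        }

        Task.WaitAll(tasks);

        long total = 0;
        foreach (var t in tasks)
            total += t.Result;
        return total;
    }

    public static void Main()
    {
        byte[] pixels = new byte[2048 * 2048]; // 8 bpp: one byte per pixel
        for (int i = 0; i < pixels.Length; i++)
            pixels[i] = 1;

        Console.WriteLine(SumInBlocks(pixels, 64));
    }
}
```

Note that no pixel data is copied: every task reads its own range of the one shared buffer.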
Forget .GetPixel(). It is far slower than accessing the pixel memory directly.
If you want to distribute the pixel processing across multiple CPU cores, consider assigning each scanline, or group of scanlines, to a different task or worker thread.
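A rough sketch of the scanline idea, assuming the pixels are already in a byte array (8 bpp) and that the per-pixel work is a simple invert. ScanlineWork and Invert are illustrative names, not part of any library. Parallel.For (also .NET 4.0) splits the row range across worker threads for you.

```csharp
using System;
using System.Threading.Tasks;

public static class ScanlineWork
{
    // Inverts an 8 bpp image in place. 'stride' is the number of bytes per
    // scanline, which may be larger than 'width' because of alignment padding.
    public static void Invert(byte[] pixels, int width, int height, int stride)
    {
        // Each iteration touches exactly one scanline, so no locking is needed.
        Parallel.For(0, height, y =>
        {
            int rowStart = y * stride;
            for (int x = 0; x < width; x++)
                pixels[rowStart + x] = (byte)(255 - pixels[rowStart + x]);
        });
    }

    public static void Main()
    {
        const int width = 2048, height = 2048, stride = 2048;
        byte[] pixels = new byte[stride * height];

        Invert(pixels, width, height, stride);
        Console.WriteLine(pixels[0]);
    }
}
```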
May I recommend a few things?
Forget about image.GetPixel(), which is horribly slow; work directly with the bitmap data, and the performance of your algorithm will improve so much that you will not need to run parallel threads to improve its efficiency. See MSDN: http://msdn.microsoft.com/en-us/library/system.drawing.imaging.bitmapdata.aspx
If you insist on parallel threads, make use of the thread pool instead of spawning 64 threads. (See MSDN: http://msdn.microsoft.com/en-us/library/3dasc8as(v=vs.80).aspx)
If you insist on spawning many threads, do NOT spawn more threads than the cores of your CPU. I do not suppose you have 64 cores on your machine, do you?
If you insist on spawning many threads, you will, of course, need to pass the location of each tile to the thread, so that you know exactly where that tile should be placed when you reconstruct the big picture. That is not "less than optimal"; it is necessary.
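For the first recommendation, working directly with the bitmap data usually looks something like the sketch below. It assumes an 8 bpp indexed bitmap with a positive stride, and the invert operation and the DirectPixels name are only placeholders. It uses Bitmap.LockBits with Marshal.Copy, so no unsafe code is required.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

public static class DirectPixels
{
    // Inverts the index values of an 8 bpp indexed bitmap without GetPixel/SetPixel.
    public static void Invert(Bitmap image)
    {
        if (image.PixelFormat != PixelFormat.Format8bppIndexed)
            throw new ArgumentException("This sketch assumes an 8bpp indexed bitmap.");

        var rect = new Rectangle(0, 0, image.Width, image.Height);
        BitmapData data = image.LockBits(rect, ImageLockMode.ReadWrite, image.PixelFormat);
        try
        {
            // Assumption: top-down bitmap, so Scan0 is the lowest address of the pixel data.
            if (data.Stride < 0)
                throw new NotSupportedException("Bottom-up bitmaps are not handled here.");

            // Copy the whole pixel buffer (including per-scanline padding) into managed memory.
            byte[] buffer = new byte[data.Stride * data.Height];
            Marshal.Copy(data.Scan0, buffer, 0, buffer.Length);

            for (int y = 0; y < data.Height; y++)
            {
                int rowStart = y * data.Stride;
                for (int x = 0; x < data.Width; x++)
                    buffer[rowStart + x] = (byte)(255 - buffer[rowStart + x]);
            }

            // Write the modified pixels back into the bitmap.
            Marshal.Copy(buffer, 0, data.Scan0, buffer.Length);
        }
        finally
        {
            image.UnlockBits(data);
        }
    }
}
```

Once the pixels are in a plain byte array like this, any of the threading options above can work on ranges of that array instead of on separate sub-images.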
Access the pixels through image.Scan0 if you don't have any restriction about unsafe operations. Use Parallel.ForEach for such usages. If you can't use it, you can use thread pools. I guess your computer does not have a (2048 x 2048) / (256 x 256) = 64-core CPU.
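As a sketch of the Parallel.ForEach approach: the tiles below are just rectangles over one shared buffer, so nothing is copied and nothing has to be reassembled afterwards, because each tile carries its own location. TileWork, Tile, EnumerateTiles and ThresholdTiles are invented names, and thresholding stands in for the real per-tile work. MaxDegreeOfParallelism caps the number of concurrent workers at the core count.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class TileWork
{
    // A tile is only a location and size inside the big image, not a separate image.
    public struct Tile { public int X, Y, Width, Height; }

    public static IEnumerable<Tile> EnumerateTiles(int width, int height, int tileSize)
    {
        for (int y = 0; y < height; y += tileSize)
            for (int x = 0; x < width; x += tileSize)
                yield return new Tile
                {
                    X = x,
                    Y = y,
                    Width = Math.Min(tileSize, width - x),
                    Height = Math.Min(tileSize, height - y)
                };
    }

    // Thresholds an 8 bpp image in place, one tile per work item.
    public static void ThresholdTiles(byte[] pixels, int width, int height,
                                      int stride, int tileSize, byte threshold)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

        Parallel.ForEach(EnumerateTiles(width, height, tileSize), options, tile =>
        {
            // Tiles do not overlap, so workers never write to the same byte.
            for (int y = tile.Y; y < tile.Y + tile.Height; y++)
            {
                int rowStart = y * stride;
                for (int x = tile.X; x < tile.X + tile.Width; x++)
                    pixels[rowStart + x] = pixels[rowStart + x] >= threshold ? (byte)255 : (byte)0;
            }
        });
    }

    public static void Main()
    {
        byte[] pixels = new byte[2048 * 2048];
        ThresholdTiles(pixels, 2048, 2048, 2048, 256, 128);
        Console.WriteLine("done");
    }
}
```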