I've been doing some investigation into how we can create a multithreaded application that runs through a tree, to find how this can best be implemented.
The Parallel.For and Parallel.ForEach methods are implemented internally as roughly equivalent to running the iterations in Tasks, i.e. a loop like:
Parallel.For(0, N, i =>
{
    DoWork(i);
});
is equivalent to:
var tasks = new List<Task>(N);
for (int i = 0; i < N; i++)
{
    tasks.Add(Task.Factory.StartNew(state => DoWork((int)state), i));
}
Task.WaitAll(tasks.ToArray());
From the perspective that every iteration can potentially run in parallel with every other iteration, this is an acceptable mental model, but it is not what happens in reality. Parallel, in fact, does not necessarily use one Task per iteration, as that adds significantly more overhead than necessary. Parallel.ForEach tries to use the minimum number of tasks needed to complete the loop as quickly as possible. It spins up tasks as threads become available to run them, and each of those tasks participates in a management scheme called chunking: a task asks for a batch of iterations, processes that batch, and then goes back for more. The chunk sizes vary with the number of tasks participating, the load on the machine, and so on.
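You can make that batching visible yourself with Partitioner.Create, which hands each worker a whole index range at a time. This is only a rough sketch: the chunk sizes Parallel picks internally are adaptive, so a fixed range partitioner merely approximates the idea:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ChunkingSketch
{
    static void Main()
    {
        // Split 0..99 into ranges; each task grabs a whole range (a
        // "chunk") at a time instead of a single index.
        var ranges = Partitioner.Create(0, 100);
        Parallel.ForEach(ranges, range =>
        {
            Console.WriteLine($"Task {Task.CurrentId} got [{range.Item1}, {range.Item2})");
            for (int i = range.Item1; i < range.Item2; i++)
                DoWork(i);
        });
    }

    static void DoWork(int i) { /* CPU-bound work per index */ }
}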
PLINQ's .AsParallel() has a different implementation, but it 'can' still similarly fetch multiple iterations into a temporary store, do the calculations on a worker thread (but not as a Task), and put the query results into a small buffer. (You get something based on ParallelQuery<T>, and any further query operators you chain on bind to an alternative set of extension methods that provide parallel implementations.)
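As a small sketch of that rebinding (Compute here is just a stand-in for whatever per-element work you do):

using System.Linq;

class PlinqBindingSketch
{
    static int Compute(int i) => i * i;   // stand-in for real work

    static void Main()
    {
        var results = Enumerable.Range(0, 1000)
            .AsParallel()                  // now a ParallelQuery<int>
            .WithDegreeOfParallelism(4)    // upper bound on worker threads
            .Select(Compute)               // binds to ParallelEnumerable.Select
            .ToList();                     // merges the per-worker buffers
    }
}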
So now that we have a rough idea of how these two mechanisms work, I will try to answer your original question:
So why is .AsParallel() slower than Parallel.ForEach? The reason stems from the following: Tasks (or their equivalent implementation here) do NOT block on I/O-like calls. They 'await' the operation and free up the CPU to do something else. But (quoting the C# in a Nutshell book): "PLINQ cannot perform I/O-bound work without blocking threads". The calls are synchronous. They were written with the intention that you increase the degree of parallelism if (and ONLY if) you are doing things such as downloading web pages per task, which do not hog CPU time.
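To make that concrete, here is a hedged sketch of the two styles side by side (HttpClient and the urls parameter are my assumptions, not from the original question); what matters is where threads sit blocked:

using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

static class IoBoundSketch
{
    static readonly HttpClient client = new HttpClient();

    // PLINQ style: every worker thread blocks inside .Result while its
    // download is in flight, so a higher degree of parallelism "helps"
    // only because more threads sit around blocked.
    public static string[] DownloadWithPlinq(string[] urls) =>
        urls.AsParallel()
            .WithDegreeOfParallelism(16)
            .Select(u => client.GetStringAsync(u).Result)
            .ToArray();

    // Task style: no thread is held while a download is pending; the
    // continuations run on pool threads only once data has arrived.
    public static Task<string[]> DownloadWithTasks(string[] urls) =>
        Task.WhenAll(urls.Select(u => client.GetStringAsync(u)));
}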
And the reason why your function calls are exactly analogous to I/O-bound calls is this: one of your threads (call it T) blocks and does nothing until all of its child threads have finished, which can be a slow process here. T itself is not CPU-intensive while it waits for the children to unblock; it is doing nothing but waiting. Hence it is identical to a typical I/O-bound function call.
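Here is a minimal sketch of the pattern I mean, assuming a hypothetical Node type with Children and some per-node Process work; the point is the parked thread at Task.WaitAll, not the exact API:

using System.Linq;
using System.Threading.Tasks;

class Node
{
    public Node[] Children = System.Array.Empty<Node>();
}

static class TreeWalk
{
    public static void Walk(Node node)
    {
        Process(node);

        // Fan out over the children, then block until every subtree is done.
        var childTasks = node.Children
            .Select(c => Task.Run(() => Walk(c)))
            .ToArray();
        Task.WaitAll(childTasks);   // thread T parks here doing no CPU work
    }

    static void Process(Node node) { /* per-node work goes here */ }
}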