Question
This code is from the Microsoft article http://msdn.microsoft.com/en-us/library/dd460703.aspx, with small changes:
const int size = 10000000;
int[] nums = new int[size];
Parallel.For(0, size, i => { nums[i] = 1; });
long total = 0;
Parallel.For<long>(
    0, size, () => 0,
    (j, loop, subtotal) =>
    {
        return subtotal + nums[j];
    },
    (x) => Interlocked.Add(ref total, x)
);
if (total != size)
{
    Console.WriteLine("Error");
}
The non-parallel version of the loop is:
for (int i = 0; i < size; ++i)
{
    total += nums[i];
}
When I measure the loop execution time using the Stopwatch class, I see that the parallel version is 10-20% slower. Testing was done on Windows 7 64-bit, Intel i5-2400 CPU, 4 cores, 4 GB RAM, in the Release configuration, of course.
In my real program I am trying to compute an image histogram, and the parallel version runs 10 times slower. Can this kind of computation, where every loop iteration is very fast, be successfully parallelized with the TPL?
Edit.
I finally managed to shave more than 50% off the histogram calculation time with Parallel.For by dividing the whole image into a number of chunks. Every loop body invocation now handles a whole chunk rather than a single pixel.
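The chunking described in the edit could look roughly like the sketch below. It is not the original program: the class name ChunkedHistogram, the method Compute, the 8-bit grayscale input and the lock-based merge are all assumptions made for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class ChunkedHistogram
{
    // Hypothetical helper: counts how often each byte value occurs in pixels,
    // splitting the data into chunkCount contiguous chunks.
    public static long[] Compute(byte[] pixels, int chunkCount)
    {
        if (pixels == null) throw new ArgumentNullException("pixels");
        if (chunkCount < 1) throw new ArgumentOutOfRangeException("chunkCount");

        long[] histogram = new long[256];
        object gate = new object();

        // Ceiling division so the last chunk picks up the remainder.
        int chunkSize = (pixels.Length + chunkCount - 1) / chunkCount;

        // One delegate invocation per chunk instead of one per pixel.
        Parallel.For(0, chunkCount, chunk =>
        {
            int start = chunk * chunkSize;
            int end = Math.Min(start + chunkSize, pixels.Length);

            // Private bins: the inner loop touches no shared state.
            long[] local = new long[256];
            for (int i = start; i < end; ++i)
            {
                local[pixels[i]]++;
            }

            // Merge once per chunk, so the lock is taken chunkCount times in total.
            lock (gate)
            {
                for (int bin = 0; bin < local.Length; ++bin)
                {
                    histogram[bin] += local[bin];
                }
            }
        });

        return histogram;
    }
}
```

A call such as ChunkedHistogram.Compute(pixels, Environment.ProcessorCount * 4) would be one plausible starting point; the best chunk count depends on the image size and the machine, so it is worth measuring.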
Answer 1:
Because Parallel.For should be used for work that is a little heavier, not for summing simple numbers! Just the use of the delegate (j, loop, subtotal) => is probably more than enough to account for 10-20% more time, and that is before we even consider the threading overhead. It would be interesting to benchmark against a delegate-based summer in a plain for loop, and to look not only at the "real world" (wall-clock) time but also at the CPU time.
I have added a comparison with a "simple" delegate that does the same thing as the Parallel.For<> delegate.
Hmm... Now I have some numbers from my PC (an AMD six-core), first at 32 bits:
32 bits
Parallel: Ticks: 74581, Total ProcessTime: 2496016
Base : Ticks: 90395, Total ProcessTime: 312002
Func : Ticks: 147037, Total ProcessTime: 468003
The parallel version is a little faster in wall-clock time, but 8x slower in processor time :-)
But at 64 bits:
64 bits
Parallel: Ticks: 104326, Total ProcessTime: 2652017
Base : Ticks: 51664, Total ProcessTime: 156001
Func : Ticks: 77861, Total ProcessTime: 312002
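At 64 bits the parallel version is about 2x slower in wall-clock time and about 17x slower in processor time.
Modified code: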
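// Requires: using System; using System.Diagnostics; using System.Threading; using System.Threading.Tasks;
Console.WriteLine("{0} bits", IntPtr.Size == 4 ? 32 : 64);
var cp = Process.GetCurrentProcess();
cp.PriorityClass = ProcessPriorityClass.High;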
const int size = 10000000;
int[] nums = new int[size];
Parallel.For(0, size, i => { nums[i] = 1; });
GC.Collect();
GC.WaitForPendingFinalizers();
long total = 0;
{
    TimeSpan start = cp.TotalProcessorTime;
    Stopwatch sw = Stopwatch.StartNew();
    Parallel.For<long>(
        0, size, () => 0,
        (j, loop, subtotal) =>
        {
            return subtotal + nums[j];
        },
        (x) => Interlocked.Add(ref total, x)
    );
    sw.Stop();
    TimeSpan end = cp.TotalProcessorTime;
    Console.WriteLine("Parallel: Ticks: {0,10}, Total ProcessTime: {1,10}", sw.ElapsedTicks, (end - start).Ticks);
}
if (total != size)
{
    Console.WriteLine("Error");
}
GC.Collect();
GC.WaitForPendingFinalizers();
total = 0;
{
    TimeSpan start = cp.TotalProcessorTime;
    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < size; ++i)
    {
        total += nums[i];
    }
    sw.Stop();
    TimeSpan end = cp.TotalProcessorTime;
    Console.WriteLine("Base : Ticks: {0,10}, Total ProcessTime: {1,10}", sw.ElapsedTicks, (end - start).Ticks);
}
if (total != size)
{
    Console.WriteLine("Error");
}
GC.Collect();
GC.WaitForPendingFinalizers();
total = 0;
Func<int, int, long, long> adder = (j, loop, subtotal) =>
{
    return subtotal + nums[j];
};
{
    TimeSpan start = cp.TotalProcessorTime;
    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < size; ++i)
    {
        total = adder(i, 0, total);
    }
    sw.Stop();
    TimeSpan end = cp.TotalProcessorTime;
    Console.WriteLine("Func : Ticks: {0,10}, Total ProcessTime: {1,10}", sw.ElapsedTicks, (end - start).Ticks);
}
if (total != size)
{
    Console.WriteLine("Error");
}
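If the per-element delegate call is the main cost, as the numbers above suggest, one possible way to amortize it is to hand each delegate invocation a whole index range. The sketch below does that with Partitioner.Create and the thread-local overload of Parallel.ForEach. It was not part of the benchmark above and has not been timed; the names RangeSum and Sum are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RangeSum
{
    // Hypothetical helper: sums nums in parallel with one delegate call per index range.
    public static long Sum(int[] nums)
    {
        if (nums == null) throw new ArgumentNullException("nums");
        // Partitioner.Create rejects an empty range, so handle it up front.
        if (nums.Length == 0) return 0;

        long total = 0;

        // Partitioner.Create(from, to) yields Tuple<int, int> ranges [Item1, Item2).
        Parallel.ForEach<Tuple<int, int>, long>(
            Partitioner.Create(0, nums.Length),
            () => 0L,                                  // per-task subtotal
            (range, loopState, subtotal) =>
            {
                // Plain inner loop: no delegate call per element.
                for (int i = range.Item1; i < range.Item2; ++i)
                {
                    subtotal += nums[i];
                }
                return subtotal;
            },
            subtotal => Interlocked.Add(ref total, subtotal));

        return total;
    }
}
```

In the benchmark above, the Parallel.For<long> block could be swapped for total = RangeSum.Sum(nums); to compare the two approaches with the same timing code.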
Source: https://stackoverflow.com/questions/18203388/parallel-for-performance