Parallel.For performance

Submitted by 别说谁变了你拦得住时间么 on 2020-01-04 13:41:32

Question


This code is from the Microsoft article http://msdn.microsoft.com/en-us/library/dd460703.aspx, with small changes:

        const int size = 10000000;
        int[] nums = new int[size];
        Parallel.For(0, size, i => {nums[i] = 1;});
        long total = 0;

        Parallel.For<long>(
            0, size, () => 0,
            (j, loop, subtotal) =>
            {
                return subtotal + nums[j];
            },
            (x) => Interlocked.Add(ref total, x) 
        );

        if (total != size)
        {
            Console.WriteLine("Error");
        }

The non-parallel loop version is:

        for (int i = 0; i < size; ++i)
        {
            total += nums[i];
        }

When I measure the loop execution time with the Stopwatch class, the parallel version is 10-20% slower. Testing was done on Windows 7 64-bit, with an Intel i5-2400 CPU (4 cores) and 4 GB RAM, in the Release configuration, of course.

In my real program I am trying to compute an image histogram, and the parallel version runs 10 times slower. Can this kind of computation, where every loop invocation is very fast, be successfully parallelized with the TPL?

Edit.

I finally managed to shave more than 50% off the histogram calculation time with Parallel.For by dividing the whole image into a number of chunks. Each loop body invocation now handles a whole chunk rather than a single pixel.
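
The chunking idea described in the edit might look roughly like the sketch below. It is not the poster's actual code: the class name ChunkedHistogram, the 8-bit grayscale byte[] input and the chunkCount parameter are all assumptions made for illustration. Each loop body invocation processes one contiguous chunk into a private histogram and merges it under a lock once per chunk, so the delegate and synchronization costs are paid per chunk instead of per pixel.

```csharp
using System;
using System.Threading.Tasks;

public static class ChunkedHistogram
{
    // Hypothetical helper: 256-bin histogram of 8-bit pixel values.
    // chunkCount is an assumed tuning knob, not a value from the original post.
    public static long[] Compute(byte[] pixels, int chunkCount)
    {
        if (pixels == null) throw new ArgumentNullException(nameof(pixels));
        if (chunkCount < 1) throw new ArgumentOutOfRangeException(nameof(chunkCount));

        long[] histogram = new long[256];
        object gate = new object();

        // One delegate invocation per chunk, not per pixel.
        Parallel.For(0, chunkCount, chunk =>
        {
            // Chunk boundaries computed in 64-bit arithmetic to avoid int overflow.
            int start = (int)((long)chunk * pixels.Length / chunkCount);
            int end = (int)((long)(chunk + 1) * pixels.Length / chunkCount);

            // Private histogram: no sharing while counting.
            long[] local = new long[256];
            for (int i = start; i < end; i++)
            {
                local[pixels[i]]++;
            }

            // Merge once per chunk; the lock is taken chunkCount times in total.
            lock (gate)
            {
                for (int bin = 0; bin < 256; bin++)
                {
                    histogram[bin] += local[bin];
                }
            }
        });

        return histogram;
    }
}
```

A call such as ChunkedHistogram.Compute(pixels, Environment.ProcessorCount * 4) would be one plausible starting point; the best chunk count depends on the image size and the machine, and has to be measured.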


Answer 1:


Because Parallel.For should be used for work that is a little heavy, not for summing simple numbers! Just the use of the delegate (j, loop, subtotal) => is probably more than enough to add 10-20% to the time. And we aren't even speaking of the threading overhead. It would be interesting to benchmark against a delegate-based summer in a plain for loop, and to look not only at the "real world" time but also at the CPU time.

I have even added a comparison to a "simple" delegate that does the same thing as the Parallel.For<> delegate.

Mmmh... Now I have some numbers at 32 bits, on my PC (an AMD six core)

32 bits
Parallel: Ticks:      74581, Total ProcessTime:    2496016
Base    : Ticks:      90395, Total ProcessTime:     312002
Func    : Ticks:     147037, Total ProcessTime:     468003

The parallel version is a little faster in wall time, but uses 8x more processor time :-)

But at 64 bits:

64 bits
Parallel: Ticks:     104326, Total ProcessTime:    2652017
Base    : Ticks:      51664, Total ProcessTime:     156001
Func    : Ticks:      77861, Total ProcessTime:     312002

Modified code:

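// Body of Main. Requires: using System; using System.Diagnostics;
// using System.Threading; using System.Threading.Tasks;
Console.WriteLine("{0} bits", IntPtr.Size == 4 ? 32 : 64);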

var cp = Process.GetCurrentProcess();
cp.PriorityClass = ProcessPriorityClass.High;

const int size = 10000000;
int[] nums = new int[size];
Parallel.For(0, size, i => { nums[i] = 1; });

GC.Collect();
GC.WaitForPendingFinalizers();

long total = 0;

{
    TimeSpan start = cp.TotalProcessorTime;
    Stopwatch sw = Stopwatch.StartNew();

    Parallel.For<long>(
        0, size, () => 0,
        (j, loop, subtotal) =>
        {
            return subtotal + nums[j];
        },
        (x) => Interlocked.Add(ref total, x)
    );

    sw.Stop();
    TimeSpan end = cp.TotalProcessorTime;

    Console.WriteLine("Parallel: Ticks: {0,10}, Total ProcessTime: {1,10}", sw.ElapsedTicks, (end - start).Ticks);
}

if (total != size)
{
    Console.WriteLine("Error");
}

GC.Collect();
GC.WaitForPendingFinalizers();

total = 0;

{
    TimeSpan start = cp.TotalProcessorTime;
    Stopwatch sw = Stopwatch.StartNew();

    for (int i = 0; i < size; ++i)
    {
        total += nums[i];
    }

    sw.Stop();
    TimeSpan end = cp.TotalProcessorTime;

    Console.WriteLine("Base    : Ticks: {0,10}, Total ProcessTime: {1,10}", sw.ElapsedTicks, (end - start).Ticks);
}

if (total != size)
{
    Console.WriteLine("Error");
}

GC.Collect();
GC.WaitForPendingFinalizers();

total = 0;

Func<int, int, long, long> adder = (j, loop, subtotal) =>
{
    return subtotal + nums[j];
};

{
    TimeSpan start = cp.TotalProcessorTime;
    Stopwatch sw = Stopwatch.StartNew();

    for (int i = 0; i < size; ++i)
    {
        total = adder(i, 0, total);
    }

    sw.Stop();
    TimeSpan end = cp.TotalProcessorTime;

    Console.WriteLine("Func    : Ticks: {0,10}, Total ProcessTime: {1,10}", sw.ElapsedTicks, (end - start).Ticks);
}

if (total != size)
{
    Console.WriteLine("Error");
}
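
One way to cut the per-element delegate cost measured above, shown here only as a sketch and not benchmarked as part of this answer, is to let each delegate invocation process a whole index range. Partitioner.Create(int, int) from System.Collections.Concurrent splits the index space into Tuple<int, int> ranges, and the inner loop over each range is an ordinary for loop. The class name RangeSum is hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RangeSum
{
    // Hypothetical helper: sums an int array using range partitioning.
    public static long Sum(int[] nums)
    {
        if (nums == null) throw new ArgumentNullException(nameof(nums));
        // Partitioner.Create(from, to) rejects an empty range, so handle it here.
        if (nums.Length == 0) return 0;

        long total = 0;

        Parallel.ForEach(
            Partitioner.Create(0, nums.Length),   // ranges [Item1, Item2)
            () => 0L,                             // per-task subtotal
            (range, loopState, subtotal) =>
            {
                // Plain loop: the delegate is invoked once per range, not per element.
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    subtotal += nums[i];
                }
                return subtotal;
            },
            subtotal => Interlocked.Add(ref total, subtotal));

        return total;
    }
}
```

Dropping RangeSum.Sum(nums) into the timing harness above in place of the Parallel.For<long> call would show whether the range-based version closes the gap on a given machine.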


Source: https://stackoverflow.com/questions/18203388/parallel-for-performance
