C# Parallel Vs. Threaded code performance

余生颓废 提交于 2021-02-06 16:10:33

问题


I've been testing the performance of System.Threading.Parallel vs a Threading and I'm surprised to see Parallel taking longer to finish tasks than threading. I'm sure it's due to my limited knowledge of Parallel, which I just started reading up on.

I thought i'll share few snippets and if anyone can point out to me paralle code is running slower vs threaded code. Also tried to run the same comparison for finding prime numbers and found parallel code finishing much later than threaded code.

public class ThreadFactory
{
    int workersCount;
    private List<Thread> threads = new List<Thread>();

    public ThreadFactory(int threadCount, int workCount, Action<int, int, string> action)
    {
        workersCount = threadCount;

        int totalWorkLoad = workCount;
        int workLoad = totalWorkLoad / workersCount;
        int extraLoad = totalWorkLoad % workersCount;

        for (int i = 0; i < workersCount; i++)
        {
            int min, max;
            if (i < (workersCount - 1))
            {
                min = (i * workLoad);
                max = ((i * workLoad) + workLoad - 1);
            }
            else
            {
                min = (i * workLoad);
                max = (i * workLoad) + (workLoad - 1 + extraLoad);
            }
            string name = "Working Thread#" + i; 

            Thread worker = new Thread(() => { action(min, max, name); });
            worker.Name = name;
            threads.Add(worker);
        }
    }

    public void StartWorking()
    {
        foreach (Thread thread in threads)
        {
            thread.Start();
        }

        foreach (Thread thread in threads)
        {
            thread.Join();
        }
    }
}

Here is the program:

Stopwatch watch = new Stopwatch();
watch.Start();
int path = 1;

List<int> numbers = new List<int>(Enumerable.Range(0, 10000));

if (path == 1)
{
    Parallel.ForEach(numbers, x =>
    {
        Console.WriteLine(x);
        Thread.Sleep(1);

    });
}
else
{
    ThreadFactory workers = new ThreadFactory(10, numbers.Count, (min, max, text) => {

        for (int i = min; i <= max; i++)
        {
            Console.WriteLine(numbers[i]);
            Thread.Sleep(1);
        }
    });

    workers.StartWorking();
}

watch.Stop();
Console.WriteLine(watch.Elapsed.TotalSeconds.ToString());

Console.ReadLine();

Update:

Taking Locking into consideration: I tried the following snippet. Again the same results, Parallel seems to finish much slower.

path = 1; cieling = 10000000;

    List<int> numbers = new List<int>();

    if (path == 1)
    {
        Parallel.For(0, cieling, x =>
        {
            lock (numbers)
            {
                numbers.Add(x);    
            }

        });
    }

    else
    {
        ThreadFactory workers = new ThreadFactory(10, cieling, (min, max, text) =>
        {

            for (int i = min; i <= max; i++)
            {
                lock (numbers)
                {
                    numbers.Add(i);    
                }                       

            }
        });

        workers.StartWorking();
    }

Update 2: Just a quick update that my machine has Quad Core Processor. So Parallel have 4 cores available.


回答1:


Refering to a blog post by Reed Copsey Jr:

Parallel.ForEach is a bit more complicated, however. When working with a generic IEnumerable, the number of items required for processing is not known in advance, and must be discovered at runtime. In addition, since we don’t have direct access to each element, the scheduler must enumerate the collection to process it. Since IEnumerable is not thread safe, it must lock on elements as it enumerates, create temporary collections for each chunk to process, and schedule this out.

The locking and copying could make Parallel.ForEach take longer. Also partitioning and the scheduler of ForEach could impact and give overhead. I tested your code and increased the sleep of each task, and then the results are closer, but still ForEach is slower.

[Edit - more research]

I added the following to the execution loops:

if (Thread.CurrentThread.ManagedThreadId > maxThreadId)
   maxThreadId = Thread.CurrentThread.ManagedThreadId;

What this shows on my machine is that it uses 10 threads less with ForEach, compared to the other one with the current settings. If you want more threads out of ForEach, you would have to fiddle around with ParallelOptions and the Scheduler.

See Does Parallel.ForEach limits the number of active threads?




回答2:


I think I can answer your question. First of all, you didn't write how many cores your system has. if you are running a dual-core, only 4 thread will work using the Parallel.For while you are working with 10 threads in your Thread example. More threads will work better as the task you are running (Printing + Short sleep) is a very short task for threading and the thread overhead is very large compared to the task, I'm almost sure that if you write the same code without threads it will work faster.

Both your methods works pretty much the same but if you create all the threads in advance you save a lot as the Parallel.For uses the Task pool which adds some move overhead.




回答3:


The comparison is not very fair in regard to Threading.Parallel. You tell your custom thread pool that it'll need 10 threads. Threading.Parallel does not know how much threads it will need so it tries to adapt at run-time taking into account such things as current CPU load and other things. Since the number of iterations in the test is small enough you can this number of threads adaption penalty. Providing the same hint for Threading.Parallel will make it run much faster:


int workerThreads;
int completionPortThreads;
ThreadPool.GetMinThreads(out workerThreads, out completionPortThreads);
ThreadPool.SetMinThreads(10, completionPortThreads);




回答4:


It's logical :-)

That would be the first time in history that addition of one (or two) layers of code improved performance. When you use convenience libraries you should expect to pay the price. BTW you haven't posted the numbers. Got to publish results :-)

To make things a bit more failr (or biased :-) for the Parallel-s, convert the list into array.

Then to make them totally unfair, split the work on your own, make an array of just 10 items and totally spoon feed actions to Parallel. You are of course doing the job that Parallel-s promised to do for you at this point but it's bound to be an interesting number :-)

BTW I just read that Reed's blog. The partitioning used in this question is what he calls the most simple and naive partitioning. Which makes it a very good elimination test indeed. You still need to check the zero work case just to know if it's totally hosed.



来源:https://stackoverflow.com/questions/3572554/c-sharp-parallel-vs-threaded-code-performance

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!