Why is ConcurrentBag<T> so slow in .NET 4.0? Am I doing it wrong?


Let me ask you this: how realistic is it that you'd have an application which is constantly adding to a collection and never reading from it? What's the use of such a collection? (This is not a purely rhetorical question. I could imagine there being uses where, e.g., you only read from the collection on shutdown (for logging) or when requested by the user. I believe these scenarios are fairly rare, though.)

This is what your code is simulating. Calling List<T>.Add is going to be lightning-fast in all but the occasional case where the list has to resize its internal array; but this is smoothed out by all the other adds that happen quite quickly. So you're not likely to see a significant amount of contention in this context, especially when testing on a personal PC with, say, 8 cores (as you stated you have in a comment somewhere). You might see more contention on something like a 24-core machine, where many cores can be trying to add to the list literally at the same time.

Contention is much more likely to creep in where you read from your collection, especially in foreach loops (or LINQ queries, which amount to foreach loops under the hood), which require holding a lock for the entire operation so that you aren't modifying the collection while iterating over it.
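
For example, safely iterating a shared List<T> typically means holding a lock for the whole loop; a minimal sketch (the class and field names here are illustrative):

using System.Collections.Generic;

class SafeReader
{
    // _sync guards all access to _items (illustrative names).
    private static readonly object _sync = new object();
    private static readonly List<double> _items = new List<double>();

    public static double SumItems()
    {
        // The lock must span the whole foreach: a concurrent Add would
        // otherwise invalidate the enumerator mid-iteration.
        lock (_sync)
        {
            double sum = 0;
            foreach (double item in _items)
                sum += item;
            return sum;
        }
    }
}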

If you can realistically reproduce this scenario, I believe you will see ConcurrentBag<T> scale much better than your current test is showing.


Update: Here is a program I wrote to compare these collections in the scenario I described above (multiple writers, many readers). Running 25 trials with a collection size of 10000 and 8 reader threads, I got the following results:

Took 529.0095 ms to add 10000 elements to a List<double> with 8 reader threads.
Took 39.5237 ms to add 10000 elements to a ConcurrentBag<double> with 8 reader threads.
Took 309.4475 ms to add 10000 elements to a List<double> with 8 reader threads.
Took 81.1967 ms to add 10000 elements to a ConcurrentBag<double> with 8 reader threads.
Took 228.7669 ms to add 10000 elements to a List<double> with 8 reader threads.
Took 164.8376 ms to add 10000 elements to a ConcurrentBag<double> with 8 reader threads.
[ ... ]
Average list time: 176.072456 ms.
Average bag time: 59.603656 ms.

So clearly it depends on exactly what you're doing with these collections.
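
The exact program isn't reproduced here, but a minimal sketch of that kind of benchmark (one timed writer with several reader threads repeatedly enumerating the collection; all names are illustrative) might look like this:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

class ReadWriteBenchmark
{
    private volatile bool _done;

    // Times 'add' being called 'count' times while 'readerThreads'
    // background threads repeatedly enumerate the same collection.
    public double Run(Action<int> add, Action enumerateAll, int count, int readerThreads)
    {
        _done = false;
        var readers = Enumerable.Range(0, readerThreads)
            .Select(_ => new Thread(() => { while (!_done) enumerateAll(); }))
            .ToList();
        readers.ForEach(t => t.Start());

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            add(i);
        sw.Stop();

        _done = true;
        readers.ForEach(t => t.Join());
        return sw.Elapsed.TotalMilliseconds;
    }
}

For the List<double> case, both delegates would take a shared lock (around the Add and around the whole enumeration); for the ConcurrentBag<double> case, they would call bag.Add and foreach over the bag directly, since the bag's enumerator works over a snapshot.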

There seems to be a bug in .NET Framework 4 that Microsoft fixed in 4.5; apparently they didn't expect ConcurrentBag to be used heavily.

See the following Ayende post for more info

http://ayende.com/blog/156097/the-high-cost-of-concurrentbag-in-net-4-0

Looking at the program with Microsoft's contention visualizer shows that ConcurrentBag<T> has a much higher cost associated with parallel insertion than simply locking on a List<T>. One thing I noticed is that there appears to be a cost associated with spinning up the six threads (used on my machine) to begin the first ConcurrentBag<T> run (cold run). Five or six threads are then reused for the List<T> code, which is faster (warm run). Adding another ConcurrentBag<T> run after the list shows it takes less time than the first (warm run).

From what I'm seeing in the contention data, a lot of time is spent in the ConcurrentBag<T> implementation allocating memory. Removing the explicit capacity from the List<T> constructor slows it down, but not enough to make a difference.

EDIT: it appears that ConcurrentBag<T> internally keeps a list per thread (keyed on Thread.CurrentThread), takes two to four locks depending on whether it is running on a new thread, and performs at least one Interlocked.Exchange. As noted in MSDN, it is "optimized for scenarios where the same thread will be both producing and consuming data stored in the bag." This is the most likely explanation for the performance decrease versus a raw list.
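
That design favors code where each thread both adds to and takes from the bag, so most operations hit the thread-local list without cross-thread stealing; a minimal sketch of that pattern:

using System.Collections.Concurrent;
using System.Threading.Tasks;

var bag = new ConcurrentBag<int>();
Parallel.For(0, 4, worker =>
{
    // Adds land on this thread's local list...
    for (int i = 0; i < 1000; i++)
        bag.Add(i);

    // ...and TryTake is usually served from the same local list,
    // avoiding the cross-thread "steal" path and its locks.
    int item;
    while (bag.TryTake(out item))
    {
        // process item
    }
});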

As a general answer:

  • Concurrent collections that use locking can be very fast if there is little or no contention for their locks. This is because such collection classes are often built on very inexpensive locking primitives, which are especially cheap when uncontended.
  • Lock-free collections can be slower because of the tricks used to avoid locks, and because of other bottlenecks such as false sharing and the complexity required to implement their lock-free nature, which can lead to cache misses.

To summarize, which approach is faster depends heavily on the data structures employed and on the amount of contention for the locks, among other factors (e.g., the number of readers vs. writers in a shared/exclusive arrangement).

Your particular example has a very high degree of contention, so I must say I am surprised by the behavior. On the other hand, the amount of work done while the lock is held is very small, so maybe there is little contention for the lock itself after all. There could also be deficiencies in ConcurrentBag's concurrency handling that make your particular example (with frequent inserts and no reads) a bad use case for it.

This is already resolved in .NET 4.5. The underlying issue was that ThreadLocal, which ConcurrentBag uses, didn't expect to have a lot of instances. That has been fixed, and ConcurrentBag can now run fairly fast.

Source: The High Cost of ConcurrentBag in .NET 4.0 (the Ayende post linked above).

As @Darin-Dimitrov said, I suspect that your Parallel.For isn't actually spawning the same number of threads in each of the two tests. Try manually creating N threads to ensure that you are actually seeing thread contention in both cases.

You basically have very few concurrent writes and no contention (Parallel.For doesn't necessarily mean many threads). Try parallelizing the writes and you will observe different results:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    private static readonly object list1_lock = new object();
    private const int collSize = 1000;

    static void Main()
    {
        ConcurrentBagTest();
        LockCollTest();
    }

    private static void ConcurrentBagTest()
    {
        var bag1 = new ConcurrentBag<int>();
        var stopWatch = Stopwatch.StartNew();
        // One task per element; the Sleep simulates per-item work and
        // keeps many tasks in flight at once.
        Task.WaitAll(Enumerable.Range(1, collSize).Select(x => Task.Factory.StartNew(() =>
        {
            Thread.Sleep(5);
            bag1.Add(x);   // no explicit lock needed
        })).ToArray());
        stopWatch.Stop();
        Console.WriteLine("Elapsed Time = {0}", stopWatch.Elapsed.TotalSeconds);
    }

    private static void LockCollTest()
    {
        var lst1 = new List<int>(collSize);
        var stopWatch = Stopwatch.StartNew();
        Task.WaitAll(Enumerable.Range(1, collSize).Select(x => Task.Factory.StartNew(() =>
        {
            // The Sleep is deliberately inside the lock, so the writers
            // are fully serialized and contention is maximized.
            lock (list1_lock)
            {
                Thread.Sleep(5);
                lst1.Add(x);
            }
        })).ToArray());
        stopWatch.Stop();
        Console.WriteLine("Elapsed = {0}", stopWatch.Elapsed.TotalSeconds);
    }
}

My guess is that the locks don't experience much contention. I would recommend reading the following article: Java theory and practice: Anatomy of a flawed microbenchmark. It discusses a lock microbenchmark, and as it points out, there are a lot of things to take into consideration in this kind of situation.

It would be interesting to see how the two of them scale.

Two questions

1) How fast is the bag vs. the list for reading? (Remember to put a lock on the list.)

2) How fast is the bag vs. the list for reading while another thread is writing? (See the sketch of the read side below.)
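
A minimal sketch of test (2)'s read side, assuming a shared lock object guards the list (all names illustrative):

using System.Collections.Concurrent;
using System.Collections.Generic;

// Every enumeration of the list must hold the lock for its full
// duration; the bag can be enumerated directly, because its
// enumerator iterates over a moment-in-time snapshot.
static long CountListWithLock(List<int> list, object gate)
{
    long n = 0;
    lock (gate)
    {
        foreach (int x in list) n++;
    }
    return n;
}

static long CountBag(ConcurrentBag<int> bag)
{
    long n = 0;
    foreach (int x in bag) n++;   // safe during concurrent Adds
    return n;
}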

Because the loop body is small, you could try using the Partitioner class's Create method...

which enables you to provide a sequential loop for the delegate body, so that the delegate is invoked only once per partition, instead of once per iteration

How to: Speed Up Small Loop Bodies
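
A minimal sketch of that approach applied to the add loop from the question (the collection name and size here are assumptions):

using System.Collections.Concurrent;
using System.Threading.Tasks;

const int collSize = 10000;
var bag = new ConcurrentBag<int>();

// Partitioner.Create(0, collSize) hands each worker a contiguous
// [Item1, Item2) range, so the delegate runs once per range rather
// than once per element, amortizing the per-invocation overhead.
Parallel.ForEach(Partitioner.Create(0, collSize), range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
        bag.Add(i);
});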

It appears that ConcurrentBag is just slower than the other concurrent collections.

I think it's an implementation problem: ANTS Profiler shows that it gets bogged down in a couple of places, including an array copy.

Using ConcurrentDictionary instead is thousands of times faster.
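
A sketch of the kind of substitution suggested here, using the loop index as a unique key (an assumption; any unique key works):

using System.Collections.Concurrent;
using System.Threading.Tasks;

var dict = new ConcurrentDictionary<int, double>();
Parallel.For(0, 10000, i =>
{
    // TryAdd takes a fine-grained, striped lock on one bucket rather
    // than maintaining per-thread lists, which can contend far less
    // in a write-only workload.
    dict.TryAdd(i, i * 0.5);
});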
