Garbage Collection and Parallel.ForEach Issue After VS2015 Upgrade

Submitted by 柔情痞子 on 2019-11-27 13:27:53
Hans Passant

This does indeed perform very poorly; the background GC is not doing you any favors here. The first thing I noticed is that Parallel.ForEach() uses too many tasks. The threadpool manager misinterprets the thread behavior as "bogged down by I/O" and starts extra threads, which makes the problem worse. The workaround for that is:

// Cap the number of concurrent tasks at the number of cores
var options = new ParallelOptions();
options.MaxDegreeOfParallelism = Environment.ProcessorCount;

Parallel.ForEach(dataFrame, options, dr => {
    // etc..
});

This gives better insight into what ails the program from the new diagnostics hub in VS2015. It doesn't take long before only a single core is doing any work, which is easy to tell from the CPU usage. There are occasional spikes that don't last very long, each coinciding with an orange GC mark. When you take a closer look at the GC mark, you see it is a gen #1 collection that takes a very long time, about 6 seconds on my machine.

A gen #1 collection does not take that long, of course. What you see happening here is the gen #1 collection waiting for the background GC to finish its job. In other words, it is actually the background GC that takes 6 seconds. Background GC can only be effective if the space in the gen #0 and gen #1 segments is large enough to not require a gen #2 collection while the background GC is trundling along. That is not how this app works; it eats memory at a very high rate. The little spike you see is multiple tasks getting unblocked and being able to allocate arrays again. They quickly grind to a halt when a gen #1 collection has to wait for the background GC again.
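If you want to confirm the collection pattern without the diagnostics hub, you can count collections per generation around the workload. This is only a minimal sketch: GcProbe and Measure are hypothetical names, and the allocation loop in Main is a stand-in, not the code from the question. GC.CollectionCount and GC.MaxGeneration are standard framework members.

```csharp
using System;
using System.Diagnostics;

static class GcProbe
{
    // Hypothetical helper (not part of the original code): runs the workload
    // and returns how many collections each generation saw while it ran.
    public static int[] Measure(Action workload)
    {
        int generations = GC.MaxGeneration + 1;
        var before = new int[generations];
        for (int gen = 0; gen < generations; gen++)
            before[gen] = GC.CollectionCount(gen);

        var watch = Stopwatch.StartNew();
        workload();
        watch.Stop();

        var delta = new int[generations];
        for (int gen = 0; gen < generations; gen++)
        {
            delta[gen] = GC.CollectionCount(gen) - before[gen];
            Console.WriteLine("gen #{0} collections: {1}", gen, delta[gen]);
        }
        Console.WriteLine("elapsed: {0} ms", watch.ElapsedMilliseconds);
        return delta;
    }
}

static class Program
{
    static void Main()
    {
        // Stand-in workload: allocate many short-lived arrays.
        GcProbe.Measure(() =>
        {
            for (int i = 0; i < 100000; i++)
            {
                var scratch = new double[1000];
                scratch[0] = i;
            }
        });
    }
}
```

Wrapping the Parallel.ForEach call in Measure before and after a change gives a rough comparison of collection counts and wall-clock time. It does not show how long an individual collection took; for that you still need the profiler.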

Notably, the allocation pattern of this code is very unfriendly to the GC. It interleaves long-lived arrays (dr.DerivedValues) with short-lived arrays (tempArray). That gives the GC lots of work when it compacts the heap, since every single allocated array is going to end up getting moved.

The apparent flaw in the .NET 4.6 GC is that the background collection never seems to effectively compact the heap. It looks like it does the job over and over again, as though the previous collection didn't compact at all. Whether this is by design or a bug is hard to tell; I don't have a clean 4.5 machine anymore. I'm certainly leaning towards a bug. You should report this problem at connect.microsoft.com to have Microsoft take a look at it.


A workaround is very easy to come by: all you have to do is prevent the awkward interleaving of long- and short-lived objects. You do that by pre-allocating the arrays:

    for (int i = 0; i < numRows; i++) dataFrame.Add(new MyDataRow { 
        Id = i, Value = r.NextDouble(), 
        DerivedValues = new double[tempArraySize] });

    ...
    Parallel.ForEach(dataFrame, options, dr => {
        var array = dr.DerivedValues;
        for (int j = 0; j < array.Length; j++) array[j] = Math.Pow(dr.Value, j);
        dr.DerivedValuesSum = array.Sum();
    });
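For reference, here is the same workaround as a self-contained sketch you can compile and run. The shape of MyDataRow, the Workaround.Run wrapper, the fixed random seed and the sizes passed in Main are all assumptions made for illustration; the original post does not show them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Assumed shape of the row type; the original post does not show it.
class MyDataRow
{
    public int Id { get; set; }
    public double Value { get; set; }
    public double[] DerivedValues { get; set; }
    public double DerivedValuesSum { get; set; }
}

static class Workaround
{
    // Hypothetical wrapper around the snippet above.
    public static List<MyDataRow> Run(int numRows, int tempArraySize)
    {
        var r = new Random(42);
        var dataFrame = new List<MyDataRow>(numRows);

        // All long-lived arrays are allocated up front, back to back.
        for (int i = 0; i < numRows; i++) dataFrame.Add(new MyDataRow {
            Id = i, Value = r.NextDouble(),
            DerivedValues = new double[tempArraySize] });

        var options = new ParallelOptions();
        options.MaxDegreeOfParallelism = Environment.ProcessorCount;

        // The loop body fills the existing array and allocates no new arrays.
        Parallel.ForEach(dataFrame, options, dr => {
            var array = dr.DerivedValues;
            for (int j = 0; j < array.Length; j++) array[j] = Math.Pow(dr.Value, j);
            dr.DerivedValuesSum = array.Sum();
        });
        return dataFrame;
    }
}

static class Program
{
    static void Main()
    {
        // Sizes are arbitrary placeholders.
        var rows = Workaround.Run(1000, 500);
        Console.WriteLine("rows processed: {0}", rows.Count);
    }
}
```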

Disabling background GC completely is, of course, another workaround.


UPDATE: GC bug confirmed in this blog post. Fix coming soon.


UPDATE: a hotfix was released.


UPDATE: fixed in .NET 4.6.1

We (and other users) have encountered a similar problem. We worked around it by disabling background GC in the application's app.config. See the discussion in the comments of https://connect.microsoft.com/VisualStudio/Feedback/Details/1594775.

app.config for gcConcurrent (non-concurrent workstation GC)

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <startup> 
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5.1" />
    </startup>
    <runtime>
        <gcConcurrent enabled="false" />
    </runtime>
</configuration>

You can also switch to the server GC, although this approach seems to use more memory (on an unsaturated machine?).

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <startup> 
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5.1" />
    </startup>
<runtime>
    <gcServer enabled="true" />
</runtime>
</configuration>
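
To check that a config change actually took effect, you can print the GC settings at startup. A minimal sketch, where GcModeReport is a hypothetical helper name; GCSettings.IsServerGC and GCSettings.LatencyMode are standard members of System.Runtime. As far as I know, LatencyMode reports Batch by default when concurrent (background) GC is disabled and Interactive when it is enabled, but treat that reading as an assumption to verify on your own runtime.

```csharp
using System;
using System.Runtime;

static class GcModeReport
{
    // Hypothetical helper: summarizes the GC settings the process is running with.
    public static string Describe()
    {
        string flavor = GCSettings.IsServerGC ? "server" : "workstation";
        return string.Format("GC flavor: {0}, LatencyMode: {1}",
            flavor, GCSettings.LatencyMode);
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(GcModeReport.Describe());
    }
}
```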