String sorting performance degradation in VS2010 vs. VS2008

前端 未结 1 788
小鲜肉
小鲜肉 2020-12-10 01:44

The following C# code seems to run slower when built with VS2010 than with VS2008: on a Core i5 Win7 x64 8 GB RAM PC, the VS2008 built version sorts strings in abou

相关标签:
1条回答
  • 2020-12-10 02:29

    Here is a brief outline of sorting algorithms used in .NET versions. It's helpful to remember that List<T>.Sort() internally uses Array<T>.Sort()

    • In .NET 2.0-4.0, a quick sort algorithm is used to sort an Array. There have been minor changes to the code, but for the most part, the code remains the same.
    • In .NET 4.5, the array sorting algorithm changed from quick sort to an introspective sort. This is a larger change than from before, one that, at least in my tests, shows considerable performance improvements.

    Did the sorting algorithm change in VS2010?

    Yes, but the changes were minor, and doesn't affect performance. Consider a sort against 20 million shuffled integers1:

    List<int>.Sort() (20 million)
    
    .NET 3.5       .NET 4.0       .NET 4.5
    ---------      ---------      ---------
     2.564s         2.565s         2.337s
    

    There's no change between v3.5 and v4.0 in terms of performance. There is a noticeable increase in speed for v4.5. It's clear that it's not the actual sorting algorithm that is making the difference.

    Before we jump into your next question, let me share my results of running your actual code on my machine:

    List<string>.Sort() (1.6 million)
    
    .NET 3.5       .NET 4.0       .NET 4.5
    ---------      ---------      ---------
     7.953s         11.267s        10.092s
    

    I get similar results, as you do. These results are a good lead-in to your next question:

    Is there anything different in the underlying CLR that makes the performance worse?

    Without a doubt. So, what is the difference? The difference is in string comparison implementation. In each step of the sorting algorithm it needs to compare the two strings, and it happens to do it differently between the v2.0 and v4.0 runtime. (See extra notes below)

    The easiest way to prove this is to force sorting by ordinal position, instead of culture dependence. Replace lines.Sort(); with lines.Sort(StringComparer.Ordinal);. Here is what I measured:

    List<string>.Sort(StringComparer.Ordinal) (1.6 million)
    
    .NET 3.5       .NET 4.0       .NET 4.5
    ---------      ---------      ---------
     4.088s         3.76s          3.454s
    

    Now, that looks better! It's more or less what I expected; a steady increase in speed for each version of the framework released. MSDN Suggests that if you're ever doing a non-linguistic comparison on a string, you should use an ordinal comparison.

    However, that only solves the problem if your comparison or sorting isn't culture-sensitive. If you need culture-sensitive sorting, it seems you won't be able to get rid of the slower execution time unless you want to revert to the .NET 3.5 framework.


    Extra notes

    When you don't pass a comparer to List<T>.Sort() or Array.Sort, it will use the default comparer. Default comparers for .NET strings uses the comparer from the Thread's current culture. From there, it calls some internal functions in the .NET runtime native libraries.

    In v2.0-3.5, it calls COMNlsInfo::Compare and COMNlsInfo::CompareFast. Here's what the call stack (kinda) looks like:

    String.CompareTo(string)
    +--System.Globalization.CompareInfo.Compare(string,string,CompareOptions)
       +--mscorwks.dll!COMNlsInfo::Compare
          +--mscorwks.dll!COMNlsInfo::CompareFast
    

    Similar source for these functions is visible in the shared source implementation of the CLI (SSCLI). It's located in sscli\clr\src\classlibnative\nls\comnlsinfo.cpp on lines 1034 and 893, respectively.

    In v4.0, however, that call tree changed fairly significantly:

    String.CompareTo(string)
    +--System.Globalization.CompareInfo.Compare(string,string,CompareOptions)
       +--clr.dll!COMNlsInfo::InternalCompareString
          +--clr.dll!SortVersioning::SortDllCompareString
             +--nlssorting.dll!_SortCompareString
                +--nlssorting.dll!_AsciiCompareString
    

    I wish I could tell you why one is slower than the other, but I have no clue at all and there is no SSCLI for .NET 4.0 to compare against. The major changes to string handling in .NET 4.0 weren't without problems. There have been performance issues related to strings in .NET 4.0, however they don't really apply here.


    1All tests were run in a virtual machine. Win 2008R2 x64 w/ 4GB RAM and a virtual quad-core processor. Host machine is Win7 x64 w/ 24GB RAM and a Xeon W3540 (2.93ghz) quad-core (8 Logical processors). Results are an average of 5 runs with the best and worst times removed.

    0 讨论(0)
提交回复
热议问题