Generic vs not-generic performance in C#

后端 未结 8 1569
旧巷少年郎
旧巷少年郎 2020-12-25 13:21

I\'ve written two equivalent methods:

static bool F(T a, T b) where T : class
{
    return a == b;
}

static bool F2(A a, A b)
{
    return a == b;
         


        
相关标签:
8条回答
  • 2020-12-25 13:26

    It seems more fair, no?:D

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    
    namespace ConsoleApplication58
    {
        internal class Program
        {
            private class A
            {
    
            }
    
            private static bool F<T>(T a, T b) where T : class
            {
                return a == b;
            }
    
            private static bool F2(A a, A b)
            {
                return a == b;
            }
    
            private static void Main()
            {
                const int rounds = 100, n = 10000000;
                var a = new A();
                var fList = new List<TimeSpan>();
                var f2List = new List<TimeSpan>();
                for (int i = 0; i < rounds; i++)
                {
                    //test generic
                    GCClear();
                    bool res;
                    var sw = new Stopwatch();
                    sw.Start();
                    for (int j = 0; j < n; j++)
                    {
                        res = F(a, a);
                    }
                    sw.Stop();
                    fList.Add(sw.Elapsed);
    
                    //test not-generic
                    GCClear();
                    bool res2;
                    var sw2 = new Stopwatch();
                    sw2.Start();
                    for (int j = 0; j < n; j++)
                    {
                        res2 = F2(a, a);
                    }
                    sw2.Stop();
                    f2List.Add(sw2.Elapsed);
                }
                double f1AverageTicks = fList.Average(ts => ts.Ticks);
                Console.WriteLine("Elapsed for F = {0} \t ticks = {1}", fList.Average(ts => ts.TotalMilliseconds),
                                  f1AverageTicks);
                double f2AverageTicks = f2List.Average(ts => ts.Ticks);
                Console.WriteLine("Elapsed for F2 = {0} \t ticks = {1}", f2List.Average(ts => ts.TotalMilliseconds),
                      f2AverageTicks);
                Console.WriteLine("Not-generic method is {0} times faster, or on {1}%", f1AverageTicks/f2AverageTicks,
                                  (f1AverageTicks/f2AverageTicks - 1)*100);
                Console.ReadKey();
            }
    
            private static void GCClear()
            {
                GC.Collect();
                GC.WaitForPendingFinalizers();
                GC.Collect();
            }
        }
    }
    

    On my laptop i7-3615qm, generic is faster than not-generic.

    See http://ideone.com/Y1GIJK.

    0 讨论(0)
  • 2020-12-25 13:35

    Your testing method is flawed. There are a few big problems with how you did it.

    First, you did not provide a "warm-up". In .NET the first time you access something it will be slower than subsequent calls so it can load up any needed assemblies. If you are going to perform tests like this you must do each function at least once or the first test to run will suffer a large penalty. Go ahead and swap the order, you will likely see the opposite results.

    Second DateTime is only accurate to 16ms, so when comparing two times you have a +/- error of 32 ms. The difference between the two results are 21 ms, well within the experimental error. You must use a more accurate timer like the Stopwatch class.

    Lastly, don't do artificial tests like this. They don't show you any useful information other than bragging rights for one class or another. Instead learn to use a Code Profiler. This will show you what is slowing down your code and you can make informed decisions on how to solve the problem instead of "guessing" that not using a templated class will make your code faster.

    Here is a example code that shows how it "should" be done:

    using System;
    using System.Diagnostics;
    
    namespace Sandbox_Console
    {
        class A
        {
        }
    
        internal static class Program
        {
            static bool F<T>(T a, T b) where T : class
            {
                return a == b;
            }
    
            static bool F2(A a, A b)
            {
                return a == b;
            }
    
            private static void Main()
            {
                var a = new A();
                Stopwatch st = new Stopwatch();
    
                Console.WriteLine("warmup");
                st.Start();
                for (int i = 0; i < 100000000; i++)
                    F<A>(a, a);
                Console.WriteLine(st.Elapsed);
    
                st.Restart();
                for (int i = 0; i < 100000000; i++)
                    F2(a, a);
                Console.WriteLine(st.Elapsed);
    
                Console.WriteLine("real");
                st.Restart();
                for (int i = 0; i < 100000000; i++)
                    F<A>(a, a);
                Console.WriteLine(st.Elapsed);
    
                st.Restart();
                for (int i = 0; i < 100000000; i++)
                    F2(a, a);
                Console.WriteLine(st.Elapsed);
    
                Console.WriteLine("Done");
                Console.ReadLine();
            }
    
        }
    }
    

    And here are the results:

    warmup
    00:00:00.0297904
    00:00:00.0298949
    real
    00:00:00.0296838
    00:00:00.0297823
    Done
    

    Swapping the order of the last two the first one is always shorter, so effectively they are the "same time" as it is within the experimental error.

    0 讨论(0)
  • 2020-12-25 13:37

    I've done performance analysis in a professional capacity several times in my career, and have a couple of observations.

    • First, the test is way too short to be valid. My rule of thumb is that a performance test should run for 30 minutes or so.
    • Second, it's important to run the test many times, to get a range of timings.
    • Third, I'm surprised the compiler didn't optimize away the loops, since the function results are not used and the functions being called have no side effects.
    • Fourth, micro benchmarks are often misleading.

    I once worked on a compiler team that had a big audacious performance goal. One build introduced an optimization that eliminated several instructions for a particular sequence. It should have improved performance, but instead the performance of one benchmark fell dramatically. We were running on hardware with a direct mapped cache. It turns out that the code for the loop and the function called in the inner loop occupied the same cache line with the new optimization in place, but did not with the prior generated code. In other words, that benchmark was really a memory benchmark, and entirely dependent on memory cache hits and misses, whereas the authors thought they had written a computational benchmark.

    0 讨论(0)
  • 2020-12-25 13:44

    I did make some changes to your code to measure perf correctly.

    1. Use Stopwatch
    2. Execute Release Mode
    3. Prevent Inlining.
    4. Use GetHashCode() to do some real work
    5. Look at the generated Assembly code

    Here is the code:

    class A
    {
    }
    
    [MethodImpl(MethodImplOptions.NoInlining)]
    static bool F<T>(T a, T b) where T : class
    {
        return a.GetHashCode() == b.GetHashCode();
    }
    
    [MethodImpl(MethodImplOptions.NoInlining)]
    static bool F2(A a, A b)
    {
        return a.GetHashCode() == b.GetHashCode();
    }
    
    static int Main(string[] args)
    {
        const int Runs = 100 * 1000 * 1000;
        var a = new A();
        bool lret = F<A>(a, a);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < Runs; i++)
        {
            F<A>(a, a);
        }
        sw.Stop();
        Console.WriteLine("Generic: {0:F2}s", sw.Elapsed.TotalSeconds);
    
        lret = F2(a, a);
        sw = Stopwatch.StartNew();
        for (int i = 0; i < Runs; i++)
        {
            F2(a, a);
        }
        sw.Stop();
        Console.WriteLine("Non Generic: {0:F2}s", sw.Elapsed.TotalSeconds);
    
        return lret ? 1 : 0;
    }
    

    During my tests the non generic version was slightly faster (.NET 4.5 x32 Windows 7). But there is practically no measurable difference in speed. I would say the are both equal. For completeness here is the assembly code of the generic version: I got the assembly code via the debugger in Release mode with JIT optimizations enabled.The default is to disable JIT optimizations during debugging to make setting breakpoints and variables inspection easier.

    Generic

    static bool F<T>(T a, T b) where T : class
    {
            return a.GetHashCode() == b.GetHashCode();
    }
    
    push        ebp 
    mov         ebp,esp 
    push        ebx 
    sub         esp,8 // reserve stack for two locals 
    mov         dword ptr [ebp-8],ecx // store first arg on stack
    mov         dword ptr [ebp-0Ch],edx // store second arg on stack
    mov         ecx,dword ptr [ebp-8] // get first arg from stack --> stupid!
    mov         eax,dword ptr [ecx]   // load MT pointer from a instance
    mov         eax,dword ptr [eax+28h] // Locate method table start
    call        dword ptr [eax+8] //GetHashCode // call GetHashCode function pointer which is the second method starting from the method table
    mov         ebx,eax           // store result in ebx
    mov         ecx,dword ptr [ebp-0Ch] // get second arg
    mov         eax,dword ptr [ecx]     // call method as usual ...
    mov         eax,dword ptr [eax+28h] 
    call        dword ptr [eax+8] //GetHashCode
    cmp         ebx,eax 
    sete        al 
    movzx       eax,al 
    lea         esp,[ebp-4] 
    pop         ebx 
    pop         ebp 
    ret         4 
    

    Non Generic

    static bool F2(A a, A b)
    {
      return a.GetHashCode() == b.GetHashCode();
    }
    
    push        ebp 
    mov         ebp,esp 
    push        esi 
    push        ebx 
    mov         esi,edx 
    mov         eax,dword ptr [ecx] 
    mov         eax,dword ptr [eax+28h] 
    call        dword ptr [eax+8] //GetHashCode
    mov         ebx,eax 
    mov         ecx,esi 
    mov         eax,dword ptr [ecx] 
    mov         eax,dword ptr [eax+28h] 
    call        dword ptr [eax+8] //GetHashCode
    cmp         ebx,eax 
    sete        al 
    movzx       eax,al 
    pop         ebx 
    pop         esi 
    pop         ebp 
    ret 
    

    As you can see the generic version looks slightly more inefficient due to more stack memoy operations which are not perfect but in reality the difference is not measurable since all is fitting into the L1 cache of the processor which makes the memory operations less costly compared to the pure register operations of the non generic version. I would suspect that the non generic version should perform a little better in real world if you need to pay for real memory access not coming from any CPU cache.

    For all practical purposes these both methods are identical. You should look at some other place for real world performance gains. I would first look at the data access patterns and used data structures. Algorithmic changes tend to bring much more perf gain than such low level stuff.

    Edit1: If you want to use == then you will find

    00000000  push        ebp 
    00000001  mov         ebp,esp 
    00000003  cmp         ecx,edx // Check for reference equality 
    00000005  sete        al 
    00000008  movzx       eax,al 
    0000000b  pop         ebp 
    0000000c  ret         4 
    

    both methods produce exactly the same machine code. Any difference you did measure are your measurement errors.

    0 讨论(0)
  • 2020-12-25 13:44

    Two things:

    1. You are benchmarking with DateTime.Now. Use Stopwatch instead.
    2. You are running code that is not under normal circumstances. JIT is most likely affecting the first run, making your first method slower.

    If you switch the order of your tests (i.e. test the non-generic method first), does your result reverse? I would suspect so. When I plugged your code into LINQPad, and then copied it so that it ran both tests twice, the execution times for the second iteration were within a few hundred ticks of each other.

    So, in answer to your question: yes, someone knows why. It's because your benchmark is inaccurate!

    0 讨论(0)
  • 2020-12-25 13:45

    I think it's the answer for my question, but what magic can I use to deny boxing?

    If your goal is only to compare, you can do this:

        public class A : IEquatable<A> {
            public bool Equals( A other ) { return this == other; }
        }
        static bool F<T>( IEquatable<T> a, IEquatable<T> b ) where T : IEquatable<T> {
            return a==b;
        }
    

    This will avoid the boxing.

    As for the major timing deviation, I think everyone already established there was a problem with how you set up the stopwatch. I use a different technique where if I want to remove the loop itself from the time result, I take an empty baseline and just subtract that from the difference of time. It's not perfect, but it produces a fair result and doesn't slow from starting and stopping the timer over and over.

    0 讨论(0)
提交回复
热议问题