What is the effect of ordering if…else if statements by probability?

花落未央 2020-12-07 13:39

Specifically, if I have a series of if...else if statements, and I somehow know beforehand the relative probability that each statement will evalua

10 answers
  •  情书的邮戳
    2020-12-07 14:16

    Just my 5 cents. It seems the effect of ordering if statements should depend on:

    1. Probability of each if statement.

    2. Number of iterations, so the branch predictor could kick in.

    3. Likely/unlikely compiler hints, i.e. code layout.

    To explore those factors, I benchmarked the following functions:

    ordered_ifs()

    for (i = 0; i < data_sz * 1024; i++) {
        if (data[i] < check_point) // highly likely
            s += 3;
        else if (data[i] > check_point) // somewhat likely
            s += 2;
        else if (data[i] == check_point) // very unlikely
            s += 1;
    }
    

    reversed_ifs()

    for (i = 0; i < data_sz * 1024; i++) {
        if (data[i] == check_point) // very unlikely
            s += 1;
        else if (data[i] > check_point) // somewhat likely
            s += 2;
        else if (data[i] < check_point) // highly likely
            s += 3;
    }
    

    ordered_ifs_with_hints()

    for (i = 0; i < data_sz * 1024; i++) {
        if (likely(data[i] < check_point)) // highly likely
            s += 3;
        else if (data[i] > check_point) // somewhat likely
            s += 2;
        else if (unlikely(data[i] == check_point)) // very unlikely
            s += 1;
    }
    

    reversed_ifs_with_hints()

    for (i = 0; i < data_sz * 1024; i++) {
        if (unlikely(data[i] == check_point)) // very unlikely
            s += 1;
        else if (data[i] > check_point) // somewhat likely
            s += 2;
        else if (likely(data[i] < check_point)) // highly likely
            s += 3;
    }
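
    The two hinted variants call likely() and unlikely() without showing how they are defined. A minimal sketch, assuming GCC or Clang, where such macros are commonly written as wrappers around __builtin_expect. The function name sum_ordered_with_hints and its parameters are hypothetical and only wrap the loop so it can be called on its own:

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: GCC or Clang. __builtin_expect(expr, expected) evaluates to expr
// and tells the compiler which value is expected, which can affect code layout.
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

// Hypothetical wrapper around the ordered_ifs_with_hints() loop.
long sum_ordered_with_hints(const std::uint8_t* data, std::size_t n, int check_point)
{
    long s = 0;
    for (std::size_t i = 0; i < n; i++) {
        if (likely(data[i] < check_point))          // highly likely
            s += 3;
        else if (data[i] > check_point)             // somewhat likely
            s += 2;
        else if (unlikely(data[i] == check_point))  // very unlikely
            s += 1;
    }
    return s;
}
```

    The hints never change the computed sum. They only give the compiler a suggestion about which path to favor when it lays out the generated code.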
    

    data

    The data array contains random numbers between 0 and 100:

    const int RANGE_MAX = 100;
    uint8_t data[DATA_MAX * 1024];
    
    static void data_init(int data_sz)
    {
        int i;
        srand(0);
        for (i = 0; i < data_sz * 1024; i++)
            data[i] = rand() % RANGE_MAX;
    }
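
    Because the values are spread over 0..99, check_point doubles as the approximate percentage of elements that take the data[i] < check_point branch. A small sketch to check that on generated data; make_data and percent_below are hypothetical helpers, not part of the benchmark:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: fills a buffer the same way data_init() does.
std::vector<std::uint8_t> make_data(std::size_t n)
{
    std::vector<std::uint8_t> data(n);
    std::srand(0);
    for (std::size_t i = 0; i < n; i++)
        data[i] = static_cast<std::uint8_t>(std::rand() % 100);  // values 0..99
    return data;
}

// Hypothetical helper: percentage of elements strictly below check_point,
// i.e. how often the first condition of ordered_ifs() is true.
double percent_below(const std::vector<std::uint8_t>& data, int check_point)
{
    if (data.empty())
        return 0.0;
    std::size_t hits = 0;
    for (std::uint8_t v : data)
        if (v < check_point)
            hits++;
    return 100.0 * static_cast<double>(hits) / static_cast<double>(data.size());
}
```

    Note that with values in 0..99 a check_point of 100 makes data[i] < check_point true for every element, while a check_point of 50 leaves roughly half of the elements for the other two branches.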
    

    The Results

    The following results are for an Intel i5 @ 3.2 GHz and G++ 6.3.0. The first argument is check_point (i.e. the probability, in %, of the highly likely if statement); the second argument is data_sz (i.e. the number of iterations, in units of 1024).

    ---------------------------------------------------------------------
    Benchmark                              Time           CPU Iterations
    ---------------------------------------------------------------------
    ordered_ifs/50/4                    4660 ns       4658 ns     150948
    ordered_ifs/50/8                   25636 ns      25635 ns      27852
    ordered_ifs/75/4                    4326 ns       4325 ns     162613
    ordered_ifs/75/8                   18242 ns      18242 ns      37931
    ordered_ifs/100/4                   1673 ns       1673 ns     417073
    ordered_ifs/100/8                   3381 ns       3381 ns     207612
    reversed_ifs/50/4                   5342 ns       5341 ns     126800
    reversed_ifs/50/8                  26050 ns      26050 ns      26894
    reversed_ifs/75/4                   3616 ns       3616 ns     193130
    reversed_ifs/75/8                  15697 ns      15696 ns      44618
    reversed_ifs/100/4                  3738 ns       3738 ns     188087
    reversed_ifs/100/8                  7476 ns       7476 ns      93752
    ordered_ifs_with_hints/50/4         5551 ns       5551 ns     125160
    ordered_ifs_with_hints/50/8        23191 ns      23190 ns      30028
    ordered_ifs_with_hints/75/4         3165 ns       3165 ns     218492
    ordered_ifs_with_hints/75/8        13785 ns      13785 ns      50574
    ordered_ifs_with_hints/100/4        1575 ns       1575 ns     437687
    ordered_ifs_with_hints/100/8        3130 ns       3130 ns     221205
    reversed_ifs_with_hints/50/4        6573 ns       6572 ns     105629
    reversed_ifs_with_hints/50/8       27351 ns      27351 ns      25568
    reversed_ifs_with_hints/75/4        3537 ns       3537 ns     197470
    reversed_ifs_with_hints/75/8       16130 ns      16130 ns      43279
    reversed_ifs_with_hints/100/4       3737 ns       3737 ns     187583
    reversed_ifs_with_hints/100/8       7446 ns       7446 ns      93782
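
    The harness that produced the table is not shown here. To try the comparison yourself, a minimal sketch using std::chrono could look like this. It is not the setup used for the numbers above, the names ordered_sum, reversed_sum and time_ns are hypothetical, and a real benchmark framework guards against compiler optimizations far more carefully:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same chain as ordered_ifs(): most likely condition first.
long ordered_sum(const std::vector<std::uint8_t>& data, int check_point)
{
    long s = 0;
    for (std::uint8_t v : data) {
        if (v < check_point)       s += 3;
        else if (v > check_point)  s += 2;
        else if (v == check_point) s += 1;
    }
    return s;
}

// Same chain as reversed_ifs(): least likely condition first.
long reversed_sum(const std::vector<std::uint8_t>& data, int check_point)
{
    long s = 0;
    for (std::uint8_t v : data) {
        if (v == check_point)      s += 1;
        else if (v > check_point)  s += 2;
        else if (v < check_point)  s += 3;
    }
    return s;
}

// Hypothetical helper: average wall-clock nanoseconds per call of fn.
template <typename Fn>
double time_ns(Fn fn, const std::vector<std::uint8_t>& data, int check_point, int repeats)
{
    volatile long sink = 0;  // the result is stored so the calls are not dead code
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; r++)
        sink = sink + fn(data, check_point);
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

    Both orderings compute the same sum, so any difference in time_ns(ordered_sum, ...) versus time_ns(reversed_sum, ...) comes from how the branches are evaluated, not from the result. Absolute numbers will differ from the table depending on CPU, compiler and flags.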
    

    Analysis

    1. The Ordering Does Matter

    For 4K iterations and (almost) 100% probability of the highly likely statement the difference is huge: reversed_ifs takes about 223% of the time of ordered_ifs, i.e. it runs about 2.2 times as long:

    ---------------------------------------------------------------------
    Benchmark                              Time           CPU Iterations
    ---------------------------------------------------------------------
    ordered_ifs/100/4                   1673 ns       1673 ns     417073
    reversed_ifs/100/4                  3738 ns       3738 ns     188087
    

    For 4K iterations and 50% probability of the highly likely statement the difference is about 15%:

    ---------------------------------------------------------------------
    Benchmark                              Time           CPU Iterations
    ---------------------------------------------------------------------
    ordered_ifs/50/4                    4660 ns       4658 ns     150948
    reversed_ifs/50/4                   5342 ns       5341 ns     126800
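
    The differences quoted in this section are plain ratios of the Time column. A tiny sketch of the arithmetic, where slowdown is a hypothetical helper and the timings are copied from the rows above:

```cpp
// Hypothetical helper: how many times as long slow_ns is compared to fast_ns.
constexpr double slowdown(double fast_ns, double slow_ns)
{
    return slow_ns / fast_ns;
}

// ordered_ifs/100/4 vs reversed_ifs/100/4
constexpr double ratio_100 = slowdown(1673.0, 3738.0);  // about 2.23
// ordered_ifs/50/4 vs reversed_ifs/50/4
constexpr double ratio_50 = slowdown(4660.0, 5342.0);   // about 1.15
```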
    

    2. Number of Iterations Does Matter

    The difference between 4K and 8K iterations for (almost) 100% probability of the highly likely statement is about a factor of two (as expected):

    ---------------------------------------------------------------------
    Benchmark                              Time           CPU Iterations
    ---------------------------------------------------------------------
    ordered_ifs/100/4                   1673 ns       1673 ns     417073
    ordered_ifs/100/8                   3381 ns       3381 ns     207612
    

    But the difference between 4K and 8K iterations for 50% probability of the highly likely statement is a factor of 5.5:

    ---------------------------------------------------------------------
    Benchmark                              Time           CPU Iterations
    ---------------------------------------------------------------------
    ordered_ifs/50/4                    4660 ns       4658 ns     150948
    ordered_ifs/50/8                   25636 ns      25635 ns      27852
    

    Why is that? Because of branch predictor misses. Here are the branch-miss rates for each of the cases mentioned above:

    ordered_ifs/100/4    0.01% of branch-misses
    ordered_ifs/100/8    0.01% of branch-misses
    ordered_ifs/50/4     3.18% of branch-misses
    ordered_ifs/50/8     15.22% of branch-misses
    

    So on my i5 the branch predictor fails spectacularly for not-so-likely branches and large data sets.
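
    To get a feel for why an almost-always-taken branch is cheap and an unpredictable one is not, here is a toy model: a single 2-bit saturating counter. This is only an illustration. It is an assumption for the sake of the example, not how the i5's predictor works (real predictors use branch history and are far more sophisticated), and it does not reproduce the data-set-size effect measured above. The names miss_rate_percent and first_branch_outcomes are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: one 2-bit saturating counter.
// States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
double miss_rate_percent(const std::vector<bool>& outcomes)
{
    if (outcomes.empty())
        return 0.0;
    int state = 0;  // start at "strongly not taken"
    std::size_t misses = 0;
    for (bool taken : outcomes) {
        bool predicted_taken = state >= 2;
        if (predicted_taken != taken)
            misses++;
        // Move the counter one step toward the actual outcome.
        if (taken && state < 3)
            state++;
        else if (!taken && state > 0)
            state--;
    }
    return 100.0 * static_cast<double>(misses) / static_cast<double>(outcomes.size());
}

// Hypothetical helper: outcomes of the `data[i] < check_point` branch.
std::vector<bool> first_branch_outcomes(const std::vector<std::uint8_t>& data, int check_point)
{
    std::vector<bool> out;
    out.reserve(data.size());
    for (std::uint8_t v : data)
        out.push_back(v < check_point);
    return out;
}
```

    In this model a branch that is always taken is mispredicted only during the first two iterations, while a branch whose outcome flips unpredictably keeps being mispredicted no matter how long the loop runs.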

    3. Hints Help a Bit

    For 4K iterations the results with hints are somewhat worse for 50% probability and somewhat better for close to 100% probability:

    ---------------------------------------------------------------------
    Benchmark                              Time           CPU Iterations
    ---------------------------------------------------------------------
    ordered_ifs/50/4                    4660 ns       4658 ns     150948
    ordered_ifs/100/4                   1673 ns       1673 ns     417073
    ordered_ifs_with_hints/50/4         5551 ns       5551 ns     125160
    ordered_ifs_with_hints/100/4        1575 ns       1575 ns     437687
    

    But for 8K iterations the results with hints are always a bit better:

    ---------------------------------------------------------------------
    Benchmark                              Time           CPU Iterations
    ---------------------------------------------------------------------
    ordered_ifs/50/8                   25636 ns      25635 ns      27852
    ordered_ifs/100/8                   3381 ns       3381 ns     207612
    ordered_ifs_with_hints/50/8        23191 ns      23190 ns      30028
    ordered_ifs_with_hints/100/8        3130 ns       3130 ns     221205
    

    So, the hints also help, but just a tiny bit.

    The overall conclusion: always benchmark your code, because the results may surprise you.

    Hope that helps.
