Why is my Python NumPy code faster than C++?

隐瞒了意图╮ 2021-01-01 03:54

Why is this Python NumPy code,

import numpy as np
import time

k_max = 40000
N = 10000

data = np.zeros((2,N))
coefs = np.zeros((k_max,2),dtype=float)

t1 = t         


        
3 answers
  •  [愿得一人]
    2021-01-01 04:39

    I found this question interesting, because every time I encountered a similar topic about the speed of NumPy (compared to C/C++), there were always answers like "it's a thin wrapper, its core is written in C, so it's fast", but this doesn't explain why C should be slower than C with an additional layer (even a thin one).

    The answer is: your C++ code is not slower than your Python code when properly compiled.

    I've done some benchmarks, and at first it seemed that NumPy is surprisingly faster. But I forgot about optimizing the compilation with GCC.

    I've computed everything again and also compared results with a pure C version of your code. I am using GCC version 4.9.2, and Python 2.7.9 (compiled from the source with the same GCC). To compile your C++ code I used g++ -O3 main.cpp -o main, to compile my C code I used gcc -O3 main.c -lm -o main. In all examples I filled data variables with some numbers (0.1, 0.4), as it changes results. I also changed np.arrays to use doubles (dtype=np.float64), because there are doubles in C++ example. My pure C version of your code (it's similar):

    #include <stdio.h>
    #include <math.h>
    #include <time.h>
    
    const int k_max = 100000;
    const int N = 10000;
    
    int main(void)
    {
        clock_t t_start, t_end;
        double data1[N], data2[N], coefs1[k_max], coefs2[k_max], seconds;
        int z;
        for( z = 0; z < N; z++ )
        {
            data1[z] = 0.1;
            data2[z] = 0.4;
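        }
        /* Accumulators must start at zero before the += in the timed loop. */
        for( z = 0; z < k_max; z++ )
        {
            coefs1[z] = 0.0;
            coefs2[z] = 0.0;
        }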
    
        int i, j;
        t_start = clock();
        for( i = 0; i < k_max; i++ )
        {
            for( j = 0; j < N-1; j++ )
            {
                coefs1[i] += data2[j] * (cos((i+1) * data1[j]) - cos((i+1) * data1[j+1]));
                coefs2[i] += data2[j] * (sin((i+1) * data1[j]) - sin((i+1) * data1[j+1]));
            }
        }
        t_end = clock();
    
        seconds = (double)(t_end - t_start) / CLOCKS_PER_SEC;
        printf("Time: %f s\n", seconds);
        return coefs1[0];
    }
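
    For comparison, here is a minimal sketch of how the same computation might look in NumPy. It is inferred from the C loop above, not taken from the asker's script, so the function names and the row layout (data[0] playing the role of data1, data[1] the role of data2) are assumptions. The inner loop over j becomes whole-array operations, while the outer loop of length k_max stays in Python. A nested-loop reference is included only to check that both give the same numbers.

```python
import math
import numpy as np


def coefs_vectorized(data, k_max):
    """Inner loop over j done by NumPy; outer loop over k stays in Python."""
    x_left, x_right = data[0, :-1], data[0, 1:]   # data1[j] and data1[j+1]
    weights = data[1, :-1]                        # data2[j]
    coefs = np.zeros((k_max, 2), dtype=np.float64)
    for i in range(k_max):
        k = i + 1
        coefs[i, 0] = np.sum(weights * (np.cos(k * x_left) - np.cos(k * x_right)))
        coefs[i, 1] = np.sum(weights * (np.sin(k * x_left) - np.sin(k * x_right)))
    return coefs


def coefs_nested_loops(data, k_max):
    """Element-by-element reference that mirrors the two nested C loops."""
    n = data.shape[1]
    coefs = np.zeros((k_max, 2), dtype=np.float64)
    for i in range(k_max):
        k = i + 1
        for j in range(n - 1):
            coefs[i, 0] += data[1, j] * (math.cos(k * data[0, j]) - math.cos(k * data[0, j + 1]))
            coefs[i, 1] += data[1, j] * (math.sin(k * data[0, j]) - math.sin(k * data[0, j + 1]))
    return coefs


# Tiny, made-up sizes so the pure-Python reference finishes quickly.
data = np.vstack([np.linspace(0.0, 1.0, 50), np.full(50, 0.4)])
fast = coefs_vectorized(data, k_max=5)
slow = coefs_nested_loops(data, k_max=5)
print(np.allclose(fast, slow))
```

    The vectorized version still pays Python interpreter overhead once per outer iteration, which is why the size of k_max relative to N matters in the timings.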
    

    For k_max = 100000, N = 10000 the results were as follows:

    • Python 70.284362 s
    • C++ 69.133199 s
    • C 61.638186 s

    Python and C++ take basically the same time here, but note that the Python version has a Python-level loop of length k_max, which should be much slower than the C/C++ one. And it is.

    For k_max = 1000000, N = 1000 we have:

    • Python 115.42766 s
    • C++ 70.781380 s

    For k_max = 1000000, N = 100:

    • Python 52.86826 s
    • C++ 7.050597 s

    So the difference grows with the ratio k_max/N, but Python is not faster even when N is much bigger than k_max, e.g. k_max = 100, N = 100000:

    • Python 0.651587 s
    • C++ 0.568518 s
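
    To put the four runs side by side, the Python/C++ ratio can be computed directly from the timings listed above. This is only arithmetic on the reported numbers, not a new measurement, and the variable names are mine.

```python
# Timings in seconds copied from the lists above: (k_max, N, python_s, cpp_s)
runs = [
    (100000, 10000, 70.284362, 69.133199),
    (1000000, 1000, 115.42766, 70.781380),
    (1000000, 100, 52.86826, 7.050597),
    (100, 100000, 0.651587, 0.568518),
]

ratios = {}
for k_max, n, python_s, cpp_s in runs:
    ratio = python_s / cpp_s
    ratios[(k_max, n)] = ratio
    print("k_max/N = {:>8g}  Python/C++ = {:.2f}".format(k_max / n, ratio))
```

    The ratio stays above 1 in every run and is by far the largest for k_max = 1000000, N = 100, where the Python-level loop does the most iterations per unit of array work.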

    Obviously, the main speed difference between C/C++ and Python is in the for loop. But I wanted to find out the difference between simple operations on arrays in NumPy and in C. The advantage of using NumPy in your code is that it performs three operations on whole arrays instead of on every single item separately: 1. multiplying the whole array by a number, 2. calculating sin/cos of the whole array, 3. summing all elements of the array. So I prepared two scripts to compare only these operations.

    Python script:

    import numpy as np
    from time import time
    
    N = 10000
    x_len = 100000
    
    def main():
        x = np.ones(x_len, dtype=np.float64) * 1.2345
    
        start = time()
        for i in xrange(N):
            y1 = np.cos(x, dtype=np.float64)
        end = time()
        print('cos: {} s'.format(end-start))
    
        start = time()
        for i in xrange(N):
            y2 = x * 7.9463
        end = time()
        print('multi: {} s'.format(end-start))
    
        start = time()
        for i in xrange(N):
            res = np.sum(x, dtype=np.float64)
        end = time()
        print('sum: {} s'.format(end-start))
    
        return y1, y2, res
    
    if __name__ == '__main__':
        main()
    
    # results
    # cos: 22.7199969292 s
    # multi: 0.841291189194 s
    # sum: 1.15971088409 s
    

    C script:

    #include <stdio.h>
    #include <math.h>
    #include <time.h>
    
    const int N = 10000;
    const int x_len = 100000;
    
    int main()
    {
        clock_t t_start, t_end;
        double x[x_len], y1[x_len], y2[x_len], res, time;
        int i, j;
        for( i = 0; i < x_len; i++ )
        {
            x[i] = 1.2345;
        }
    
        t_start = clock();
        for( j = 0; j < N; j++ )
        {
            for( i = 0; i < x_len; i++ )
            {
                y1[i] = cos(x[i]);
            }
        }
        t_end = clock();
        time = (double)(t_end - t_start) / CLOCKS_PER_SEC;
        printf("cos: %f s\n", time);
    
        t_start = clock();
        for( j = 0; j < N; j++ )
        {
            for( i = 0; i < x_len; i++ )
            {
                y2[i] = x[i] * 7.9463;
            }
        }
        t_end = clock();
        time = (double)(t_end - t_start) / CLOCKS_PER_SEC;
        printf("multi: %f s\n", time);
    
        t_start = clock();
        for( j = 0; j < N; j++ )
        {
            res = 0.0;
            for( i = 0; i < x_len; i++ )
            {
                res += x[i];
            }
        }
        t_end = clock();
        time = (double)(t_end - t_start) / CLOCKS_PER_SEC;
        printf("sum: %f s\n", time);
    
        return y1[0], y2[0], res;
    }
    
    // results
    // cos: 20.910590 s
    // multi: 0.633281 s
    // sum: 1.153001 s
    

    Python results:

    • cos: 22.7199969292 s
    • multi: 0.841291189194 s
    • sum: 1.15971088409 s

    C results:

    • cos: 20.910590 s
    • multi: 0.633281 s
    • sum: 1.153001 s

    As you can see, NumPy is incredibly fast, but in these tests it was always a bit slower than pure C.
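
    As a rough way to quantify "a bit slower", the relative gap can be computed from the two result lists above. Again, this is plain arithmetic on the reported timings, with names chosen by me for illustration.

```python
# Timings in seconds copied from the results above: operation -> (numpy_s, c_s)
timings = {
    "cos": (22.7199969292, 20.910590),
    "multi": (0.841291189194, 0.633281),
    "sum": (1.15971088409, 1.153001),
}

overhead_percent = {}
for name, (numpy_s, c_s) in timings.items():
    # Extra time NumPy took, relative to the C timing for the same operation.
    overhead_percent[name] = 100.0 * (numpy_s - c_s) / c_s
    print("{:>5}: NumPy is {:.1f}% slower than C".format(name, overhead_percent[name]))
```

    On these numbers the gap is under 1% for the sum, roughly 9% for cos, and roughly 33% for the multiplication.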
