Why is my Python NumPy code faster than C++?

后端未结

关注

 3  951

隐瞒了意图╮ 2021-01-01 03:54

Why is this Python NumPy code,

import numpy as np
import time

k_max = 40000
N = 10000

data = np.zeros((2,N))
coefs = np.zeros((k_max,2),dtype=float)

t1 = t


      
      
        
          3条回答        

        
                    
            
            
                         
                
              
              
                
                   [愿得一人]
                                             
                
                
                (楼主)
            
              
              
                2021-01-01 04:39
              

            
            
                        
I found this question interesting, because every time I encountered similar topic about the speed of NumPy (compared to C/C++) there was always answers like "it's a thin wrapper, its core is written in C, so it's fats", but this doesn't explain why C should be slower than C with additional layer (even a thin one).
The answer is: your C++ code is not slower than your Python code when properly compiled.
I've done some benchmarks, and at first it seemed that NumPy is surprisingly faster. But I forgot about optimizing the compilation with GCC.
I've computed everything again and also compared results with a pure C version of your code. I am using GCC version 4.9.2, and Python 2.7.9 (compiled from the source with the same GCC). To compile your C++ code I used g++ -O3 main.cpp -o main, to compile my C code I used gcc -O3 main.c -lm -o main. In all examples I filled data variables with some numbers (0.1, 0.4), as it changes results. I also changed np.arrays to use doubles (dtype=np.float64), because there are doubles in C++ example. My pure C version of your code (it's similar):
#include 
#include 
#include 

const int k_max = 100000;
const int N = 10000;

int main(void)
{
    clock_t t_start, t_end;
    double data1[N], data2[N], coefs1[k_max], coefs2[k_max], seconds;
    int z;
    for( z = 0; z < N; z++ )
    {
        data1[z] = 0.1;
        data2[z] = 0.4;
    }

    int i, j;
    t_start = clock();
    for( i = 0; i < k_max; i++ )
    {
        for( j = 0; j < N-1; j++ )
        {
            coefs1[i] += data2[j] * (cos((i+1) * data1[j]) - cos((i+1) * data1[j+1]));
            coefs2[i] += data2[j] * (sin((i+1) * data1[j]) - sin((i+1) * data1[j+1]));
        }
    }
    t_end = clock();

    seconds = (double)(t_end - t_start) / CLOCKS_PER_SEC;
    printf("Time: %f s\n", seconds);
    return coefs1[0];
}

For k_max = 100000, N = 10000 results where following:

Python 70.284362 s
C++    69.133199 s
C      61.638186 s

Python and C++ have basically the same time, but note that there is a Python loop of length k_max, which should be much slower compared to C/C++ one. And it is.
For k_max = 1000000, N = 1000 we have:

Python 115.42766 s
C++    70.781380 s

For k_max = 1000000, N = 100:

Python 52.86826 s
C++    7.050597 s

So the difference increases with fraction k_max/N, but python is not faster even for N much bigger than k_max, e. g. k_max = 100, N = 100000:

Python 0.651587 s
C++    0.568518 s

Obviously, the main speed difference between C/C++ and Python is in the for loop. But I wanted to find out the difference between simple operations on arrays in NumPy and in C. Advantages of using NumPy in your code consists of: 1. multiplying the whole array by a number, 2. calculating sin/cos of the whole array, 3. summing all elements of the array, instead of doing those operations on every single item separately. So I prepared two scripts to compare only these operations.
Python script:
import numpy as np
from time import time

N = 10000
x_len = 100000

def main():
    x = np.ones(x_len, dtype=np.float64) * 1.2345

    start = time()
    for i in xrange(N):
        y1 = np.cos(x, dtype=np.float64)
    end = time()
    print('cos: {} s'.format(end-start))

    start = time()
    for i in xrange(N):
        y2 = x * 7.9463
    end = time()
    print('multi: {} s'.format(end-start))

    start = time()
    for i in xrange(N):
        res = np.sum(x, dtype=np.float64)
    end = time()
    print('sum: {} s'.format(end-start))

    return y1, y2, res

if __name__ == '__main__':
    main()

# results
# cos: 22.7199969292 s
# multi: 0.841291189194 s
# sum: 1.15971088409 s

C script:
#include 
#include 
#include 

const int N = 10000;
const int x_len = 100000;

int main()
{
    clock_t t_start, t_end;
    double x[x_len], y1[x_len], y2[x_len], res, time;
    int i, j;
    for( i = 0; i < x_len; i++ )
    {
        x[i] = 1.2345;
    }

    t_start = clock();
    for( j = 0; j < N; j++ )
    {
        for( i = 0; i < x_len; i++ )
        {
            y1[i] = cos(x[i]);
        }
    }
    t_end = clock();
    time = (double)(t_end - t_start) / CLOCKS_PER_SEC;
    printf("cos: %f s\n", time);

    t_start = clock();
    for( j = 0; j < N; j++ )
    {
        for( i = 0; i < x_len; i++ )
        {
            y2[i] = x[i] * 7.9463;
        }
    }
    t_end = clock();
    time = (double)(t_end - t_start) / CLOCKS_PER_SEC;
    printf("multi: %f s\n", time);

    t_start = clock();
    for( j = 0; j < N; j++ )
    {
        res = 0.0;
        for( i = 0; i < x_len; i++ )
        {
            res += x[i];
        }
    }
    t_end = clock();
    time = (double)(t_end - t_start) / CLOCKS_PER_SEC;
    printf("sum: %f s\n", time);

    return y1[0], y2[0], res;
}

// results
// cos: 20.910590 s
// multi: 0.633281 s
// sum: 1.153001 s

Python results:

cos: 22.7199969292 s
multi: 0.841291189194 s
sum: 1.15971088409 s

C results:

cos: 20.910590 s
multi: 0.633281 s
sum: 1.153001 s

As you can see NumPy is incredibly fast, but always a bit slower than pure C.
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它3个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复