Performance: Matlab vs Python

长情又很酷 2020-11-29 13:06

I recently switched from MATLAB to Python. While converting one of my lengthy codes, I was surprised to find Python very slow.

5 answers
  • 2020-11-29 13:22

    You want to get rid of those for loops. Try this:

    import numpy as np

    def exampleKernelA(M, x, N, y):
        """Example kernel function A"""
        # x has M rows, y has N rows, so the kernel matrix is (M, N)
        i, j = np.indices((M, N))
        # Define the custom kernel function here
        kernel = np.sqrt((x[i, 0] - y[j, 0]) ** 2 + (x[i, 1] - y[j, 1]) ** 2)
        return kernel
    

    You can also do it with broadcasting, which may be even faster, but a little less intuitive coming from MATLAB.
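    As a sketch of what that broadcasting version could look like (the function name is illustrative, not from the original answer):

    ```python
    import numpy as np

    def exampleKernelBroadcast(M, x, N, y):
        """Pairwise Euclidean distances via broadcasting (illustrative sketch)."""
        # x[:, None, :] has shape (M, 1, 2); y[None, :, :] has shape (1, N, 2).
        # Subtraction broadcasts them to a (M, N, 2) array of differences.
        diff = x[:, None, :] - y[None, :, :]
        # Sum the squared differences over the coordinate axis, then take the root.
        return np.sqrt((diff ** 2).sum(axis=-1))
    ```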

  • 2020-11-29 13:22

    I got ~5x speed improvement over the meshgrid solution using only broadcasting:

    import numpy as np

    def exampleKernelD(M, x, N, y):
        # x[:, :1] is (M, 1) and y[:, :1].T is (1, N); they broadcast to (M, N)
        return np.sqrt((x[:, 1:] - y[:, 1:].T) ** 2 + (x[:, :1] - y[:, :1].T) ** 2)
    
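    A quick sanity check (a sketch, not part of the original answer) that this broadcast version matches the explicit double loop from the question:

    ```python
    import numpy as np

    def exampleKernelD(M, x, N, y):
        # Broadcasting version: (M, 1) minus (1, N) gives an (M, N) result
        return np.sqrt((x[:, 1:] - y[:, 1:].T) ** 2 + (x[:, :1] - y[:, :1].T) ** 2)

    def exampleKernelLoop(M, x, N, y):
        # Reference implementation with explicit loops, as in the question
        kernel = np.empty((M, N))
        for i in range(M):
            for j in range(N):
                kernel[i, j] = np.sqrt((x[i, 0] - y[j, 0]) ** 2
                                       + (x[i, 1] - y[j, 1]) ** 2)
        return kernel

    rng = np.random.default_rng(0)
    x = rng.random((20, 2))
    y = rng.random((30, 2))
    assert np.allclose(exampleKernelD(20, x, 30, y), exampleKernelLoop(20, x, 30, y))
    ```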
  • 2020-11-29 13:27

    Comparing JIT compilers

    It has been mentioned that Matlab uses an internal JIT compiler to get good performance on such tasks. Let's compare Matlab's JIT compiler with a Python JIT compiler (Numba).

    Code

    import numba as nb
    import numpy as np
    import math
    import time
    
    # If the arrays are somewhat larger, it also makes sense to parallelize this problem
    # cache=True may also make sense
    @nb.njit(fastmath=True) 
    def exampleKernelA(M, x, N, y):
      """Example kernel function A"""
      #explicitly declaring the size of the second dim also improves performance a bit
      assert x.shape[1]==2
      assert y.shape[1]==2
    
      #Works with all dtypes, zeroing isn't necessary
      kernel = np.empty((M,N),dtype=x.dtype)
      for i in range(M):
        for j in range(N):
          # Define the custom kernel function here
          kernel[i, j] = np.sqrt((x[i, 0] - y[j, 0]) ** 2 + (x[i, 1] - y[j, 1]) ** 2)
      return kernel
    
    
    def exampleKernelB(M, x, N, y):
        """Example kernel function A"""
        # Euclidean norm function implemented using meshgrid idea.
        # Fastest
        x0, y0 = np.meshgrid(y[:, 0], x[:, 0])
        x1, y1 = np.meshgrid(y[:, 1], x[:, 1])
        # Define custom kernel here
        kernel = np.sqrt((x0 - y0) ** 2 + (x1 - y1) ** 2)
        return kernel
    
    @nb.njit() 
    def exampleKernelC(M, x, N, y):
      """Example kernel function A"""
      #explicitly declaring the size of the second dim also improves performance a bit
      assert x.shape[1]==2
      assert y.shape[1]==2
    
      #Works with all dtypes, zeroing isn't necessary
      kernel = np.empty((M,N),dtype=x.dtype)
      for i in range(M):
        for j in range(N):
          # Define the custom kernel function here
          kernel[i, j] = np.sqrt((x[i, 0] - y[j, 0]) ** 2 + (x[i, 1] - y[j, 1]) ** 2)
      return kernel
    
    
    #Your test data
    xVec = np.array([
        [49.7030,  78.9590],
        [42.6730,  11.1390],
        [23.2790,  89.6720],
        [75.6050,  25.5890],
        [81.5820,  53.2920],
        [44.9680,   2.7770],
        [38.7890,  78.9050],
        [39.1570,  33.6790],
        [33.2640,  54.7200],
        [4.8060 ,  44.3660],
        [49.7030,  78.9590],
        [42.6730,  11.1390],
        [23.2790,  89.6720],
        [75.6050,  25.5890],
        [81.5820,  53.2920],
        [44.9680,   2.7770],
        [38.7890,  78.9050],
        [39.1570,  33.6790],
        [33.2640,  54.7200],
        [4.8060 ,  44.3660]
        ])
    
    # Compilation happens on the first call
    # (can be avoided with cache=True)
    res=exampleKernelA(xVec.shape[0], xVec, xVec.shape[0], xVec)
    res=exampleKernelC(xVec.shape[0], xVec, xVec.shape[0], xVec)
    
    t1=time.time()
    for i in range(10_000):
      res=exampleKernelA(xVec.shape[0], xVec, xVec.shape[0], xVec)
    
    print(time.time()-t1)
    
    t1=time.time()
    for i in range(10_000):
      res=exampleKernelC(xVec.shape[0], xVec, xVec.shape[0], xVec)
    
    print(time.time()-t1)
    
    t1=time.time()
    for i in range(10_000):
      res=exampleKernelB(xVec.shape[0], xVec, xVec.shape[0], xVec)
    
    print(time.time()-t1)
    

    Performance

    exampleKernelA: 0.03s
    exampleKernelC: 0.03s
    exampleKernelB: 1.02s
    Matlab_2016b (your code, but 10000 rep., after a few runs): 0.165s
    
  • 2020-11-29 13:28

    Matlab ships with the commercial MKL library. If you use a free Python distribution, check whether your NumPy is linked against MKL or another high-performance BLAS library, or against the default reference BLAS, which can be much slower.
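    One way to check which BLAS/LAPACK build your NumPy uses (a quick sketch using NumPy's built-in config report):

    ```python
    import numpy as np

    # Prints the BLAS/LAPACK libraries NumPy was linked against;
    # look for "mkl", "openblas", or a plain reference BLAS in the output.
    np.show_config()
    ```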

  • 2020-11-29 13:32

    Upon further investigation I have found that using indices as indicated in the answer is still slower.

    Solution: Use meshgrid

    from numpy import meshgrid, sqrt

    def exampleKernelA(M, x, N, y):
        """Example kernel function A"""
        # Euclidean norm function implemented using the meshgrid idea.
        # Fastest
        x0, y0 = meshgrid(y[:, 0], x[:, 0])
        x1, y1 = meshgrid(y[:, 1], x[:, 1])
        # Define custom kernel here
        kernel = sqrt((x0 - y0) ** 2 + (x1 - y1) ** 2)
        return kernel
    

    Result: Very fast, 10 times faster than the indices approach. I am getting times close to C.

    However: using meshgrid in Matlab beats both C and NumPy, being 10 times faster than either.

    Still wondering why!
