Is there any good way to optimize the speed of this python code?

问题

I have a following piece of code, which basically evaluates some numerical expression, and use it to integrate over certain range of values. The current piece of code runs within about 8.6 s, but I am just using mock values, and my actual array is much larger. Especially, my actual size of freq_c= (3800, 101) and size of number_bin = (3800, 100), which makes the following code really inefficient, as the total execution time will be close to 9 minutes for the actual array. One part of the code that is quite slow is evaluation of k_one_third and k_two_third, for which I have also used numexpr.evaluate(".."), which speeds up the code quite a bit by about 10-20%. But, I have avoided numexpr below, so that anyone can run it without having to install the package. Is there any more ways to improve the speed of this code? An improvement of a few factor would also be good enough. Please note that the for loop is almost unavoidable, due to memory issues, as the arrays are really large, I am manipulating each axis at a time through the loop. I also wonder if numba jit optimisation is possible here.

import numpy as np
import scipy 
from scipy.integrate import simps as simps
import time

def k_one_third(x):
    return (2.*np.exp(-x**2)/x**(1/3) + 4./x**(1/6)*np.exp(-x)/(1+x**(1/3)))**2

def k_two_third(x):
    return (np.exp(-x**2)/x**(2/3) + 2.*x**(5/2)*np.exp(-x)/(6.+x**3))**2

def spectrum(freq_c, number_bin, frequency, gamma, theta):
    theta_gamma_factor = np.einsum('i,j->ij', theta**2, gamma**2)
    theta_gamma_factor += 1.
    t_g_bessel_factor = 1.-1./theta_gamma_factor
    number = np.concatenate((number_bin, np.zeros((number_bin.shape[0], 1), dtype=number_bin.dtype)), axis=1)
    number_theta_gamma = np.einsum('jk, ik->ijk', theta_gamma_factor**2*1./gamma**3, number)
    final = np.zeros((np.size(freq_c[:,0]), np.size(theta), np.size(frequency)))
    for i in xrange(np.size(frequency)):
        b_n_omega_theta_gamma = frequency[i]**2*number_theta_gamma
        eta = theta_gamma_factor**(1.5)*frequency[i]/2.
        eta = np.einsum('jk, ik->ijk', eta, 1./freq_c)
        bessel_eta = np.einsum('jl, ijl->ijl',t_g_bessel_factor, k_one_third(eta))
        bessel_eta += k_two_third(eta)
        eta = None
        integrand = np.multiply(bessel_eta, b_n_omega_theta_gamma, out= bessel_eta)
        final[:,:, i] = simps(integrand, gamma)
        integrand = None
    return final

frequency = np.linspace(1, 100, 100)
theta = np.linspace(1, 3, 100)
gamma = np.linspace(2, 200, 101)
freq_c = np.random.randint(1, 200, size=(50, 101))
number_bin = np.random.randint(1, 100, size=(50, 100))
time1 = time.time()
spectra = spectrum(freq_c, number_bin, frequency, gamma, theta)
print(time.time()-time1)

回答1:

As said in the comments large parts of the code should be rewritten to get best performance.

I have only modified the simpson integration and modified @HYRY answer a bit. This speeds up the calculation from 26.15s to 1.76s (15x), by the test-data you provided. By replacing the np.einsums with simple loops this should end up in less than a second. (About 0.4s from the improved integration, 24s from k_one_two_third(x))

For getting performance using Numba read. The latest Numba version (0.39), the Intel SVML-package and things like fastmath=True makes quite a big impact on your example.

Code

#a bit faster than HYRY's version
@nb.njit(parallel=True,fastmath=True,error_model='numpy')
def k_one_two_third(x):
  one=np.empty(x.shape,dtype=x.dtype)
  two=np.empty(x.shape,dtype=x.dtype)
  for i in nb.prange(x.shape[0]):
    for j in range(x.shape[1]):
      for k in range(x.shape[2]):
        x0 = x[i,j,k] ** (1/3)
        x1 = np.exp(-x[i,j,k] ** 2)
        x2 = np.exp(-x[i,j,k])
        one[i,j,k] = (2*x1/x0 + 4*x2/(x[i,j,k]**(1/6)*(x0 + 1)))**2
        two[i,j,k] = (2*x[i,j,k]**(5/2)*x2/(x[i,j,k]**3 + 6) + x1/x[i,j,k]**(2/3))**2
  return one, two

#improved integration
@nb.njit(fastmath=True)
def simpson_nb(y_in,dx):
  s = y[0]+y[-1]

  n=y.shape[0]//2
  for i in range(n-1):
    s += 4.*y[i*2+1]
    s += 2.*y[i*2+2]

  s += 4*y[(n-1)*2+1]
  return(dx/ 3.)*s

@nb.jit(fastmath=True)
def spectrum(freq_c, number_bin, frequency, gamma, theta):
    theta_gamma_factor = np.einsum('i,j->ij', theta**2, gamma**2)
    theta_gamma_factor += 1.
    t_g_bessel_factor = 1.-1./theta_gamma_factor
    number = np.concatenate((number_bin, np.zeros((number_bin.shape[0], 1), dtype=number_bin.dtype)), axis=1)
    number_theta_gamma = np.einsum('jk, ik->ijk', theta_gamma_factor**2*1./gamma**3, number)
    final = np.empty((np.size(frequency),np.size(freq_c[:,0]), np.size(theta)))

    #assume that dx is const. on integration
    #speedimprovement of the scipy.simps is about 4x
    #numba version to scipy.simps(y,x) is about 60x
    dx=gamma[1]-gamma[0]

    for i in range(np.size(frequency)):
        b_n_omega_theta_gamma = frequency[i]**2*number_theta_gamma
        eta = theta_gamma_factor**(1.5)*frequency[i]/2.
        eta = np.einsum('jk, ik->ijk', eta, 1./freq_c)

        one,two=k_one_two_third(eta)

        bessel_eta = np.einsum('jl, ijl->ijl',t_g_bessel_factor, one)
        bessel_eta += two

        integrand = np.multiply(bessel_eta, b_n_omega_theta_gamma, out= bessel_eta)

        #reorder array
        for j in range(integrand.shape[0]):
          for k in range(integrand.shape[1]):
            final[i,j, k] = simpson_nb(integrand[j,k,:],dx)
    return final

回答2:

I profiled the code and found that k_one_third() and k_two_third() are slow. There are some duplicated calculations in the two functions.

By merging the two functions into one function, and decorate it with @numba.jit(parallel=True), I got 4x speedup.

@jit(parallel=True)
def k_one_two_third(x):
    x0 = x ** (1/3)
    x1 = np.exp(-x ** 2)
    x2 = np.exp(-x)
    one = (2*x1/x0 + 4*x2/(x**(1/6)*(x0 + 1)))**2
    two = (2*x**(5/2)*x2/(x**3 + 6) + x1/x**(2/3))**2
    return one, two

来源：https://stackoverflow.com/questions/50440592/is-there-any-good-way-to-optimize-the-speed-of-this-python-code

标签

python

arrays

performance

numpy

numba