问题
I'm currently playing in python with Runge-Kutta methods for differential equations systems numerical integration, and the scope is (as told in the title) the simulation of planetary orbits.
I'm investigating (comparing) the different ways to accelerate the calculations, and currently I've tried using a C module which quite efficient and I wanted to try with numpy
In this calculation, I need to compute mutual attraction for each pair of planets. Currently, I'm doing this :
import numpy as np
def grav(px, py, M, ax, ay):
G = 6.67408*10**-2 # m³/s²T
for b in range(1, len(px)):
# computing the distance between body #b and all previous
dx = px[b] - px[:b]
dy = py[b] - py[:b]
d2 = dx*dx+dy*dy
# computing acceleration undergone by b from all previous
ax[b] = -sum(M[:b]*G * dx * d2**(-1.5))
ay[b] = -sum(M[:b]*G * dy * d2**(-1.5))
# adding for each previous, acceleration undergone by from b
ax[:b] += M[b]*G * dx * d2**(-1.5)
ay[:b] += M[b]*G * dy * d2**(-1.5)
# input data
system_px = np.array([1., 2., 3., 4., 5., 6., 7., 9., 4., 0.])
system_py = np.array([3., 5., 1., 2., 4., 5., 6., 3., 5., 8.])
system_M = np.array([3., 5., 1., 2., 4., 5., 6., 3., 5., 8.])
# outout array
system_ax = np.zeros(len(system_px))
system_ay = np.zeros(len(system_px))
grav(system_px, system_py, system_M, system_ax, system_ay)
for i in range(len(system_px)):
print('body {} mass = {}(ton), position = {}(m), '
'acceleration = ({:8.4f}, {:8.4f})(m/s²)'.format(i, system_M[i],
(system_px[i], system_py[i]), system_ax[i], system_ay[i]))
I wondered if there would be some very general more «numpythonic» way to do this, which could apply to every subset of n lines.
回答1:
A Numba approach
There is not much to do to get quite a high speed up of your code.
- Join uneccessary loops (all vectorized commands are loops)
- Do some math:
d2**(-1.5)
is a very costly operation which can be replaced withd2 = 1./(np.sqrt(d2)*d2)
- Install Intel SVML to get a faster implementation for functions like
sin,sqrt,pow...
Code
import numba as nb
import numpy as np
def grav(px, py, M):
G = 6.67408*10**-2 # m³/s²T
nPoints=px.shape[0]
ax=np.zeros(nPoints,dtype=np.float64)
ay=np.zeros(nPoints,dtype=np.float64)
for b in range(1, px.shape[0]):
# computing the distance between body #b and all previous
dx = px[b] - px[:b]
dy = py[b] - py[:b]
d2 = dx*dx+dy*dy
# computing acceleration undergone by b from all previous
ax[b] = -np.sum(M[:b]*G * dx * d2**(-1.5))
ay[b] = -np.sum(M[:b]*G * dy * d2**(-1.5))
# adding for each previous, acceleration undergone by from b
ax[:b] += M[b]*G * dx * d2**(-1.5)
ay[:b] += M[b]*G * dy * d2**(-1.5)
return ax,ay
@nb.njit(fastmath=True,error_model="numpy")
def grav_2(px, py, M):
G = 6.67408*10**-2 # m³/s²T
nPoints=px.shape[0]
ax=np.zeros(nPoints,dtype=np.float64)
ay=np.zeros(nPoints,dtype=np.float64)
for b in range(1, nPoints):
sum_x=0.
sum_y=0.
for i in range(0,b):
# computing the distance between body #b and all previous
dx = px[b] - px[i]
dy = py[b] - py[i]
d2 = (dx*dx+dy*dy)
#Much less costly than d2 = d2**(-1.5)
d2 = 1./(np.sqrt(d2)*d2)
# computing acceleration undergone by b from all previous
sum_x += M[i]*G * dx * d2
sum_y += M[i]*G * dy * d2
# adding for each previous, acceleration undergone by from b
ax[i] += M[b]*G * dx * d2
ay[i] += M[b]*G * dy * d2
ax[b]=(-1)*sum_x
ay[b]=(-1)*sum_y
return ax,ay
Timings
N=10
%timeit res=grav(px, py, M)
212 µs ± 3.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit res=grav_2(px, py, M)
1.29 µs ± 7.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
N=100
%timeit res=grav(px, py, M)
2.86 ms ± 41.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit res=grav_2(px, py, M)
18.9 µs ± 37.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
N=1000
%timeit res=grav(px, py, M)
86.5 ms ± 448 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit res=grav_2(px, py, M)
1.79 ms ± 13.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
N=10_000
%timeit res=grav(px, py, M)
6.28 s ± 17.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit res=grav_2(px, py, M)
180 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
N=50_000
#Take a more advandced algorithm
#Small particles doesn't significantly interact with other particles
#which are far away (KD-tree based approaches?)
回答2:
I guess that you cannot get a better NumPythonic expression that grav_vectorized()
, which, as already noticed, increases the computational complexity and memory by slightly less than a factor of 2.
However, if you are after speed and you still want to stay in the Python lap, for this specific application, it seems like that the evil is in the details and in the input size. Specifically, the timing, at least on my machine, seems to be dominated, for each iteration, by the fractional power, and saving its result in a temporary variable speed things up.
At the end of the day, if you use a Numba JIT version of your original code with the redundancy optimization (grav_optim_jit()
) you will be optimal almost always.
For very small inputs, the Cython version (grav_loop_cython()
) is the fastest, but as soon as the size grows a little (N > ~100
), the better optimized Numba code (grav_orig_jit()
) takes over the podium. For even larger inputs, the NumPy-optimized but still n (n + 1) / 2
(vectorized in the space dimension) (grav_iter()
) becomes the runner-up, despite being the slowest in the small-N
regime.
Note that the fully vectorized version (grav_vectorized()
), performs quite well for small N
but falls short as soon as the input size increases.
Note also that grav_iter_jit()
is essentially not affected (perhaps slightly slowed down) by Numba JIT, as the core optimizations are on the NumPy side.
It is possible that the Cython version could get faster by compiling with -O3 -march=native
options, but I have not tried that.
Following is the details of the above.
This is the code I tested:
import numpy as np
import numba as nb
G = 6.67408e-2
MAX_SIZE = 1000
MAX_MASS = 1000
a slightly more polished version of your original code
def grav_orig(x_arr, m_arr, G=G):
n_dim, n_points = x_arr.shape
a_arr = np.zeros_like(x_arr, dtype=np.float64)
for i in range(1, n_points):
dx_arr = x_arr[0, i] - x_arr[0, :i]
dy_arr = x_arr[1, i] - x_arr[1, :i]
d2_arr = dx_arr ** 2 + dy_arr ** 2
a_arr[0, i] = -np.sum(m_arr[:i] * dx_arr * G * d2_arr**(-1.5))
a_arr[1, i] = -np.sum(m_arr[:i] * dy_arr * G * d2_arr**(-1.5))
a_arr[0, :i] += m_arr[i] * dx_arr * G * d2_arr**(-1.5)
a_arr[1, :i] += m_arr[i] * dy_arr * G * d2_arr**(-1.5)
return a_arr
the optimized version
def grav_optim(x_arr, m_arr, G=G):
n_dim, n_points = x_arr.shape
a_arr = np.zeros_like(x_arr, dtype=np.float64)
for i in range(1, n_points):
dx_arr = x_arr[0, i] - x_arr[0, :i]
dy_arr = x_arr[1, i] - x_arr[1, :i]
d2_arr = dx_arr ** 2 + dy_arr ** 2
temp = G * d2_arr**(-1.5)
temp_x = dx_arr * temp
temp_y = dy_arr * temp
a_arr[0, i] = -np.sum(m_arr[:i] * temp_x)
a_arr[1, i] = -np.sum(m_arr[:i] * temp_y)
a_arr[0, :i] += m_arr[i] * temp_x
a_arr[1, :i] += m_arr[i] * temp_y
return a_arr
the corresponding Numba JITted version:
grav_optim_jit = nb.jit(grav_optim, nopython=True, nogil=True)
a similar approach to the original but vectorizing along the spatial dimensions:
def grav_iter(x_arr, m_arr, G=G):
n_dim, n_points = x_arr.shape
a_arr = np.zeros_like(x_arr, dtype=np.float64)
for i in range(1, n_points):
d_arr = x_arr[:, i:i + 1] - x_arr[:, :i]
d2_arr = np.sum(d_arr ** 2, axis=0)
temp = G * d_arr * d2_arr[None, :]**(-1.5)
a_arr[:, i] = -np.sum(m_arr[None, :i] * temp, axis=-1)
a_arr[:, :i] += m_arr[None, i] * temp
return a_arr
the corresponding Numba JITted version:
grav_iter_jit = nb.jit(grav_iter)
the fully vectorized version:
def grav_vectorized(x_arr, m_arr, G=G):
d_arr = x_arr[:, :, None] - x_arr[:, None, :]
d2_arr = np.sum(d_arr ** 2, axis=0)
d2_arr[d2_arr == 0] = 1
return np.sum((m_arr[None, :, None] * G * d_arr * d2_arr[None, ...]**(-1.5)), axis=1)
the Cythonized version
%%cython -a
#cython: boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True
import numpy as np
cimport cython
cimport numpy as np
DTYPE = np.double
ctypedef np.double_t DTYPE_t
cdef DTYPE_t G = 6.67408e-2
def grav_optim_cython(x_arr, m_arr, G=G):
n_dim, n_points = x_arr.shape
a_arr = np.zeros_like(x_arr, dtype=np.float64)
for i in range(1, n_points):
dx_arr = x_arr[0, i] - x_arr[0, :i]
dy_arr = x_arr[1, i] - x_arr[1, :i]
d2_arr = dx_arr ** 2 + dy_arr ** 2
temp = G * d2_arr**(-1.5)
temp_x = dx_arr * temp
temp_y = dy_arr * temp
a_arr[0, i] = -np.sum(m_arr[:i] * temp_x)
a_arr[1, i] = -np.sum(m_arr[:i] * temp_y)
a_arr[0, :i] += m_arr[i] * temp_x
a_arr[1, :i] += m_arr[i] * temp_y
return a_arr
the Cythonized loop-explicit version
%load_ext Cython
%%cython -a
#cython: boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True
import numpy as np
cimport cython
cimport numpy as np
DTYPE = np.double
ctypedef np.double_t DTYPE_t
cdef DTYPE_t G = 6.67408e-2
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False) # turn off negative index wrapping for entire function
def grav_loop_cython(
np.ndarray[DTYPE_t, ndim=2] x_arr,
np.ndarray[DTYPE_t, ndim=1] m_arr,
DTYPE_t G=G):
cdef int ndim = x_arr.shape[0]
cdef int n_points = x_arr.shape[1]
cdef np.ndarray[DTYPE_t, ndim=2] a_arr = np.zeros((ndim, n_points), dtype=DTYPE)
cdef np.ndarray[DTYPE_t, ndim=2] dx_arr = np.zeros((ndim, n_points - 1), dtype=DTYPE)
cdef np.ndarray[DTYPE_t, ndim=1] d2_arr = np.zeros((n_points - 1), dtype=DTYPE)
cdef DTYPE_t temp
for j in range(1, n_points):
# compute the pair-wise differences
for jj in range(j):
for i in range(ndim):
dx_arr[i, jj] = x_arr[i, j] - x_arr[i, jj]
# compute the pair-wise squared Euclidean distances
for jj in range(j):
d2_arr[jj] = 0
for i in range(ndim):
d2_arr[jj] += dx_arr[i, jj] ** 2.0
for i in range(ndim):
for jj in range(j):
temp = G * dx_arr[i, jj] * d2_arr[jj] ** -1.5
a_arr[i, j] -= (m_arr[jj] * temp)
a_arr[i, jj] += (m_arr[j] * temp)
return a_arr
Timings
I timed all this with the following code:
funcs = (
grav_orig,
grav_optim,
grav_optim_jit,
grav_optim_cython,
grav_iter,
grav_iter_jit,
grav_loop_cython,
grav_vectorized,
)
Ns = np.geomspace(1e1, 6e3, 16).astype(int)
timings = np.zeros((len(funcs), len(Ns)))
np.random.seed(0)
for i, N in enumerate(Ns):
print('N: ', N)
x_arr = np.random.random((2, N)) * MAX_SIZE
m_arr = np.random.random((N,)) * MAX_MASS
for j, func in enumerate(funcs):
test_result = np.all(np.isclose(grav_orig(x_arr, m_arr), func(x_arr, m_arr)))
func_name = func.__name__ + ('_jit' if '__numba__' in dir(func) else '')
print(f'{func_name:20s} {test_result} ', end='')
t = %timeit -o func(x_arr, m_arr)
timings[j, i] = t.best
print()
In numbers:
# N: 10
# grav_orig True 501 µs ± 37.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_optim True 358 µs ± 20 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_optim_jit True 17.4 µs ± 491 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# grav_optim_cython True 383 µs ± 21.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_iter True 371 µs ± 40.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_iter_jit True 481 µs ± 42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_loop_cython True 12.6 µs ± 250 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# grav_vectorized True 41.3 µs ± 2.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# N: 15
# grav_orig True 769 µs ± 81.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_optim True 540 µs ± 24.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_optim_jit True 29.2 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# grav_optim_cython True 547 µs ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_iter True 602 µs ± 42.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_iter_jit True 750 µs ± 23.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_loop_cython True 22.9 µs ± 738 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# grav_vectorized True 58.1 µs ± 2.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# N: 23
# grav_orig True 1.11 ms ± 55.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_optim True 788 µs ± 9.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_optim_jit True 53 µs ± 290 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# grav_optim_cython True 825 µs ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_iter True 875 µs ± 78.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_iter_jit True 1.05 ms ± 70.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_loop_cython True 49.2 µs ± 1.76 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# grav_vectorized True 89.6 µs ± 4.89 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# N: 35
# grav_orig True 1.87 ms ± 94 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_optim True 1.35 ms ± 95.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_optim_jit True 111 µs ± 4.28 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# grav_optim_cython True 1.36 ms ± 96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_iter True 1.31 ms ± 83.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_iter_jit True 1.54 ms ± 49.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_loop_cython True 109 µs ± 1.89 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# grav_vectorized True 159 µs ± 3.24 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# N: 55
# grav_orig True 3.13 ms ± 171 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_optim True 2.05 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_optim_jit True 237 µs ± 7.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_optim_cython True 2.07 ms ± 54.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_iter True 2.37 ms ± 36.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_iter_jit True 2.87 ms ± 86.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_loop_cython True 263 µs ± 5.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_vectorized True 326 µs ± 7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# N: 84
# grav_orig True 4.97 ms ± 72.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_optim True 3.5 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_optim_jit True 484 µs ± 4.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_optim_cython True 3.26 ms ± 76.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_iter True 3.87 ms ± 223 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_iter_jit True 4.81 ms ± 267 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_loop_cython True 645 µs ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_vectorized True 805 µs ± 22.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# N: 129
# grav_orig True 9.22 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_optim True 6.14 ms ± 315 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_optim_jit True 1.07 ms ± 19.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_optim_cython True 5.93 ms ± 411 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_iter True 6.85 ms ± 651 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_iter_jit True 7.68 ms ± 523 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_loop_cython True 1.55 ms ± 28.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# grav_vectorized True 1.8 ms ± 39.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# N: 197
# grav_orig True 17.4 ms ± 374 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_optim True 9.57 ms ± 248 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_optim_jit True 2.32 ms ± 30.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_optim_cython True 9.95 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_iter True 9.87 ms ± 660 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_iter_jit True 12.3 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_loop_cython True 3.68 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_vectorized True 4.03 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# N: 303
# grav_orig True 31.9 ms ± 532 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_optim True 15.9 ms ± 56.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_optim_jit True 5.36 ms ± 21.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_optim_cython True 16.5 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_iter True 17.1 ms ± 834 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_iter_jit True 18 ms ± 247 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_loop_cython True 8.38 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_vectorized True 10.6 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# N: 464
# grav_orig True 70.7 ms ± 2.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_optim True 31.7 ms ± 1.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_optim_jit True 12.2 ms ± 53.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_optim_cython True 29.3 ms ± 646 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_iter True 28.3 ms ± 316 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_iter_jit True 32.5 ms ± 737 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_loop_cython True 19.7 ms ± 52.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# grav_vectorized True 27.8 ms ± 214 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# N: 711
# grav_orig True 126 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_optim True 52.7 ms ± 452 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_optim_jit True 27.1 ms ± 164 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_optim_cython True 54.1 ms ± 623 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_iter True 54.8 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_iter_jit True 60.6 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_loop_cython True 46.8 ms ± 755 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_vectorized True 67.2 ms ± 3.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# N: 1089
# grav_orig True 306 ms ± 31.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_optim True 108 ms ± 2.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_optim_jit True 61.5 ms ± 606 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_optim_cython True 103 ms ± 1.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_iter True 110 ms ± 4.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_iter_jit True 114 ms ± 5.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_loop_cython True 107 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_vectorized True 152 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# N: 1669
# grav_orig True 567 ms ± 6.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_optim True 201 ms ± 4.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_optim_jit True 141 ms ± 271 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# grav_optim_cython True 207 ms ± 5.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_iter True 210 ms ± 3.91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_iter_jit True 223 ms ± 8.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_loop_cython True 252 ms ± 2.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_vectorized True 365 ms ± 4.99 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# N: 2557
# grav_orig True 1.28 s ± 35.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_optim True 418 ms ± 8.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_optim_jit True 339 ms ± 2.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_optim_cython True 432 ms ± 10.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_iter True 452 ms ± 12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_iter_jit True 470 ms ± 13.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_loop_cython True 605 ms ± 7.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_vectorized True 817 ms ± 17.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# N: 3916
# grav_orig True 2.83 s ± 26.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_optim True 900 ms ± 25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_optim_jit True 778 ms ± 4.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_optim_cython True 894 ms ± 19.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_iter True 951 ms ± 25.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_iter_jit True 991 ms ± 28.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_loop_cython True 1.41 s ± 30.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_vectorized True 1.88 s ± 22.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# N: 6000
# grav_orig True 6.77 s ± 171 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_optim True 1.95 s ± 36.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_optim_jit True 1.84 s ± 4.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_optim_cython True 2.01 s ± 47.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_iter True 2.28 s ± 79.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_iter_jit True 2.27 s ± 31.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_loop_cython True 3.26 s ± 43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# grav_vectorized True 4.32 s ± 9.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Here is the code for generating the plot:
import matplotlib as mpl
import matplotlib.pyplot as plt
subplot_shape = (1, 2)
fig, axs = plt.subplots(
*subplot_shape, squeeze=False,
figsize=(8 * subplot_shape[1], 6 * subplot_shape[0]))
ax = axs[0, 0]
ax.set_title('Small-N Regime')
ax.set_xlabel('N / #')
ax.set_ylabel('timing / ms')
small_Ns = tuple(N for N in Ns if N < 1000)
for j, func in enumerate(funcs):
func_name = func.__name__ + ('_jit' if '__numba__' in dir(func) else '')
ax.plot(small_Ns, timings[j, :len(small_Ns)] * 1e3, label=func_name)
ax.legend()
ax = axs[0, 1]
ax.set_title('Full N Range')
ax.set_xlabel('N / #')
ax.set_ylabel('timing / ms')
for j, func in enumerate(funcs):
func_name = func.__name__ + ('_jit' if '__numba__' in dir(func) else '')
ax.plot(Ns, timings[j] * 1e3, label=func_name)
ax.legend()
plt.show()
EDIT (Julia)
Just for fun, I implemented the same code in Julia (although I am no expert here), but timings were quite delusive for all the bragging they do about speed.
using Random
G = 6.67408e-2
MAX_SIZE = 1000
MAX_MASS = 1000
function grav(
x_arr::Array{Float64,2},
m_arr::Array{Float64,2},
g::Float64=G)::Array{Float64,2}
n_dim, n_points = size(x_arr)
a_arr = zeros(size(x_arr))
for i in 2:n_points
d_arr = x_arr[:, i:i] .- x_arr[:, 1:i - 1]
d2_arr = sum(d_arr .^ 2, dims=1)
temp = G .* d_arr .* (d2_arr .^ -1.5)
a_arr[:, i] = -sum(m_arr[:, 1:i - 1] .* temp, dims=2)
a_arr[:, 1:i - 1] += m_arr[:, i - 1] .* temp
end
return a_arr
end
N = 10
x_arr = rand(Float64, 2, N) * MAX_SIZE
m_arr = rand(Float64, 1, N) * MAX_MASS
@time grav(x_arr, m_arr)
# 0.000111 seconds (329 allocations: 23.375 KiB)
N = 6000
x_arr = rand(Float64, 2, N) * MAX_SIZE
m_arr = rand(Float64, 1, N) * MAX_MASS
@time grav(x_arr, m_arr)
# 4.112578 seconds (269.17 k allocations: 2.426 GiB, 1.93% gc time)
# BenchmarkTools.Trial:
# memory estimate: 2.43 GiB
# allocs estimate: 269167
# --------------
# minimum time: 4.096 s (3.31% GC)
# median time: 4.169 s (2.50% GC)
# mean time: 4.169 s (2.50% GC)
# maximum time: 4.243 s (1.72% GC)
# --------------
# samples: 2
# evals/sample: 1
回答3:
With the information I got thanks to @norok2, I got able to get a much faster solution without the loop, and to partially (i.e. only for n=2) reply the question, but not both at the same time. The solution which replies to the question is about 10 times slower:
import numpy as np
def grav_fast(p, M):
G = 6.67408*10**-2 # m³/s²T
d = p[:, :, None] - p[:, None, :]
d2 = (d*d).sum(axis=0)
d2[d2==0] = 1
return (M[None, :, None]*G*d*(d2**(-1.5))[None, :, :]).sum(axis=1)
#or return (M[None, None, :]*G*d*(d2**(-1.5))[None, :, :]).sum(axis=2)
# (both are equivalent because d is symetric)
def grav_reply(p, M):
G = 6.67408*10**-2 # m³/s²T
d = np.tril(p[:, :, None] - p[:, None, :], -1)
d2 = np.tril((d*d).sum(axis=0), -1)
d2[d2==0] = 1
return (M[None, :, None]*G*d*(d2**(-1.5))[None, :, :]).sum(axis=1) - \
(M[None, None, :]*G*d*(d2**(-1.5))[None, :, :]).sum(axis=2)
# input data
system_p = np.array([[ 1., 2., 3., 4., 5., 6., 7., 9., 4., 0.],
[ 3., 5., 1., 2., 4., 5., 6., 3., 5., 8.]])
system_M = np.array([3., 5., 1., 2., 4., 5., 6., 3., 5., 8.])
# output array
l = len(system_p[0])
system_a = np.zeros(shape=(2, l))
for test in 'grav_fast', 'grav_reply':
print('\ntesting '+test)
system_a = eval(test+'(system_p, system_M)')
for i in range(l):
print('body {} mass = {}(ton), position = {}(m), '
'acceleration = [{:8.4f} {:8.4f}](m/s²)'.format(i,
system_M[i], system_p[:, i], system_a[0, i], system_a[1, i]))
grav_fast
doesn't really answer the question because it makes twice the calculations, and make them also for a body attracted by itself (which causes a division by zero), but for a small system, it's still much faster than with the python's loop (break even is around 600 bodies). On the other side, grav_reply
might be efficient if np.tril
was designed to avoid making the calculations not needed, but it doesn't seem to be the case: A specific test with ipython
showed that changing the limit diagonal in np.tril
(or np.triu
) didn't notably change the execution time.
In [1]: import numpy as np
In [2]: import random
In [3]: a = np.array([[random.randint(10, 99)
....: for _ in range(5)]
....: for _ in range(5)])
In [4]: %timeit np.dot(a, a)
1000000 loops, best of 3: 1.35 µs per loop
In [5]: %timeit np.tril(np.dot(a, a), 0)
100000 loops, best of 3: 17.3 µs per loop
In [6]: %timeit np.tril(np.dot(a, a), -2)
100000 loops, best of 3: 16.5 µs per loop
In [7]: a = np.array([[random.randint(10, 99)
....: for _ in range(100)]
....: for _ in range(100)])
In [8]: %timeit np.tril(a*a, 0)
10000 loops, best of 3: 56.3 µs per loop
In [9]: %timeit np.tril(a*a, -20)
10000 loops, best of 3: 61 µs per loop
In [10]: %timeit np.tril(a*a, 20)
10000 loops, best of 3: 54.7 µs per loop
In [11]: %timeit np.tril(a*a, 60)
10000 loops, best of 3: 54.5 µs per loop
Edit : Here is a performance/size graph for each algorithm
Edit : Here is the last benchmarking code I wrote:
import numpy as np
import time
import random
from matplotlib import pyplot as plt
from grav_c import grav_c, grav2_c
from numba import jit, njit
import datetime
G = 6.67408*10**-8 # m³/s²T
def grav2(p, M):
l = len(p[0])
a = np.empty(shape=(2, l))
a[:, 0] = 0
for b in range(1, l):
# computing the distance between body #b and all previous
d = p[:, b:b+1] - p[:, :b]
d2 = (d*d).sum(axis=0)
d2[d2==0] = 1
# computing Newton formula : acceleration undergone by b from all previous
a[:, b] = -(M[:b] * G * d2**(-1.5) * d).sum(axis=1)
# computing Newton formula : adding for each previous, acceleration undergone by from b
a[:, :b] += M[b] * G * d2**(-1.5) * d
return a
grav2_jit = jit(grav2)
def grav(p, M):
l = len(p[0])
a = np.empty(shape=(2, l))
a[:, 0] = 0
for b in range(1, l):
# computing the distance between body #b and all previous
d = p[:, b:b+1] - p[:, :b]
d2 = (d*d).sum(axis=0)
d2[d2==0] = 1
# computing Newton formula : acceleration undergone by b from all previous
a[:, b] = -(M[:b] * G / np.sqrt(d2) / d2 * d).sum(axis=1)
## a[:, b] = -(M[:b] * G * d2**(-1.5) * d).sum(axis=1)
# computing Newton formula : adding for each previous, acceleration undergone by from b
a[:, :b] += M[b] * G / np.sqrt(d2) / d2 * d
## a[:, :b] += M[b] * G * d2**(-1.5) * d
return a
grav_jit = jit(grav)
def grav2_optim1(p, M):
l = len(p[0])
a = np.empty(shape=(2, l))
a[:, 0] = 0
for b in range(1, l):
# computing the distance between body #b and all previous
d = p[:, b:b+1] - p[:, :b]
d2 = (d*d).sum(axis=0)
d2[d2==0] = 1
VVV = G * d2**(-1.5)
# computing Newton formula : acceleration undergone by b from all previous
a[:, b] = -(M[:b] * VVV * d).sum(axis=1)
# computing Newton formula : adding for each previous, acceleration undergone by from b
a[:, :b] += M[b] * VVV * d
return a
grav2_optim1_jit = jit(grav2_optim1)
def grav_optim1(p, M):
l = len(p[0])
a = np.empty(shape=(2, l))
a[:, 0] = 0
for b in range(1, l):
# computing the distance between body #b and all previous
d = p[:, b:b+1] - p[:, :b]
d2 = (d*d).sum(axis=0)
d2[d2==0] = 1
VVV = G / np.sqrt(d2) / d2
# computing Newton formula : acceleration undergone by b from all previous
a[:, b] = -(M[:b] * VVV * d).sum(axis=1)
# computing Newton formula : adding for each previous, acceleration undergone by from b
a[:, :b] += M[b] * VVV * d
return a
grav_optim1_jit = jit(grav_optim1)
def grav2_optim2(p, M):
l = len(p[0])
a = np.empty(shape=(2, l))
a[:, 0] = 0
for b in range(1, l):
# computing the distance between body #b and all previous
d = p[:, b:b+1] - p[:, :b]
d2 = (d*d).sum(axis=0)
d2[d2==0] = 1
XXX = G * d * d2**(-1.5)
# computing Newton formula : acceleration undergone by b from all previous
a[:, b] = -(M[None, :b] * XXX).sum(axis=1)
# computing Newton formula : adding for each previous, acceleration undergone by from b
a[:, :b] += M[b] * XXX
return a
grav2_optim2_jit = jit(grav2_optim2)
def grav_optim2(p, M):
l = len(p[0])
a = np.empty(shape=(2, l))
a[:, 0] = 0
for b in range(1, l):
# computing the distance between body #b and all previous
d = p[:, b:b+1] - p[:, :b]
d2 = (d*d).sum(axis=0)
d2[d2==0] = 1
XXX = G * d / np.sqrt(d2) / d2
# computing Newton formula : acceleration undergone by b from all previous
a[:, b] = -(M[None, :b] * XXX).sum(axis=1)
# computing Newton formula : adding for each previous, acceleration undergone by from b
a[:, :b] += M[b] * XXX
return a
grav_optim2_jit = jit(grav_optim2)
def grav2_vect(p, M):
d = p[:, :, None] - p[:, None, :]
d2 = (d*d).sum(axis=0)
d2[d2==0] = 1
return (M[None, :, None]*G*d*(d2**(-1.5))[None, :, :]).sum(axis=1)
grav2_vect_jit = jit(grav2_vect)
def grav_vect(p, M):
d = p[:, :, None] - p[:, None, :]
d2 = (d*d).sum(axis=0)
d2[d2==0] = 1
return (M[None, :, None]*G*d/(np.sqrt(d2)*d2)[None, :, :]).sum(axis=1)
grav_vect_jit = jit(grav_vect)
# the grav*_vect_bis functions are equivalent to the grav*_vect functions because d is symetric
def grav2_vect_bis(p, M):
d = p[:, :, None] - p[:, None, :]
d2 = (d*d).sum(axis=0)
d2[d2==0] = 1
return (-M[None, None, :]*G*d*(d2**(-1.5))[None, :, :]).sum(axis=2)
grav2_vect_bis_jit = jit(grav2_vect_bis)
def grav_vect_bis(p, M):
d = p[:, :, None] - p[:, None, :]
d2 = (d*d).sum(axis=0)
d2[d2==0] = 1
return (-M[None, None, :]*G*d/(np.sqrt(d2)*d2)[None, :, :]).sum(axis=2)
grav_vect_bis_jit = jit(grav_vect_bis)
def grav2_tril(p, M):
d = np.tril(p[:, :, None] - p[:, None, :], -1)
d2 = np.tril((d*d).sum(axis=0), -1)
d2[d2==0] = 1
return (M[None, :, None]*G*d*(d2**(-1.5))[None, :, :]).sum(axis=1) - \
(M[None, None, :]*G*d*(d2**(-1.5))[None, :, :]).sum(axis=2)
grav2_tril_jit = jit(grav2_tril)
def grav_tril(p, M):
d = np.tril(p[:, :, None] - p[:, None, :], -1)
d2 = np.tril((d*d).sum(axis=0), -1)
d2[d2==0] = 1
return (M[None, :, None]*G*d/(np.sqrt(d2)*d2)[None, :, :]).sum(axis=1) - \
(M[None, None, :]*G*d/(np.sqrt(d2)*d2)[None, :, :]).sum(axis=2)
grav_tril_jit = jit(grav_tril)
testslist = [
('grav_vect', 'c'), ('grav2_vect', 'c--'), ('grav_vect_jit', 'c:'), ('grav2_vect_jit', 'c-.'),
('grav_vect_bis', 'm'), ('grav2_vect_bis', 'm--'), ('grav_vect_bis_jit', 'm:'), ('grav2_vect_bis_jit', 'm-.'),
('grav_tril', 'y'), ('grav2_tril', 'y--'), ('grav_tril_jit', 'y:'), ('grav2_tril_jit', 'y-.'),
('grav', 'r'), ('grav2', 'r--'), ('grav_jit', 'r:'), ('grav2_jit', 'r-.'),
('grav_optim1', 'g'), ('grav2_optim1', 'g--'), ('grav_optim1_jit', 'g:'), ('grav2_optim1_jit', 'g-.'),
('grav_optim2', 'b'), ('grav2_optim2', 'b--'), ('grav_optim2_jit', 'b:'), ('grav2_optim2_jit', 'b-.'),
('grav_c', 'k'),('grav2_c', 'k--')]
class ScaleType() : pass
class LinScale(ScaleType) : pass
class LogScale(ScaleType) : pass
attempts = 8
scaletype = LogScale
scalelen = 200
scalestart = 2
scalestop = 400
# input data (Multiple datasets to repeat the tests on different data)
randlist = lambda x : [float(random.randint(10000, 99999)) for _ in range(x)]
try:
# data_file = "Here you can give an npz file name to load some presaved data.npz"
with np.load(data_file) as data:
testslist = data['testslist']
N = data['N']
timings = data['timings']
perform = data['perform']
miny = data['miny']
except NameError:
L = scalestop-scalestart
if scalelen > L:
N = np.arange(scalestart, scalestop+1, 1)
elif scaletype == LinScale:
Q = L//(scalelen-1)
R = L%(scalelen-1)
N = np.array([i for r in (range(scalestart, scalestart+Q*(scalelen-1-R), Q),
range(scalestart+Q*(scalelen-1-R), scalestop+1, Q+1)) for i in r])
elif scaletype == LogScale:
X = scalestart
G = scalestop/scalestart
I = scalelen-1
while True:
NX = I*np.log(I/np.log(G)/scalestart)/np.log(G)
if NX-X < 0.0001: break
X = NX
L0 = int(scalestart*np.power(G, X/I))
G = scalestop/(scalestart+L0)
I = scalelen-1-L0
a1 = np.array(range(I))
N = np.concatenate((range(scalestart, scalestart+L0, 1),
scalestart+L0-1+np.cumsum((0.+(scalestart+L0)*(np.exp(np.log(G)*(a1+1)/I) - np.exp(np.log(G)*a1/I))).astype(int)),
[scalestop]))
print(N)
l = len(N)
timings = np.full(l, 9999999., dtype=[(test[0], np.float64) for test in testslist])
perform = np.full(l, 9999999., dtype=[(test[0], np.float64) for test in testslist])
miny = 9999999.
accum = 0 # This is to prevent system to perform unwanted optimisations
for j in range(attempts):
for i in range(l):
L = N[i]
system_p = [np.array([randlist(L), randlist(L)]) for _ in range(100)]
system_M = [np.array( randlist(L)) for _ in range(100)]
for test in testslist:
timeref = -time.time()
system_a = eval(test[0]+'(system_p[0], system_M[0])')
accum += system_a[0, 0]
count = 1
while time.time()+timeref<0.001:
for count in range(count+1, 10*count+1):
system_a = eval(test[0]+'(system_p[count%100], system_M[count%100])')
timeref += time.time()
## print(count)
timings[test[0]][i] = min(timings[test[0]][i], timeref/count)
val = timings[test[0]][i]/(N[i]*(N[i]-1)/2)
perform[test[0]][i] = val
miny = min(val, miny)
if i%10==9: print(j, end='', flush=True)
print(flush=True)
filename = "example grav, stackoverflow "+str(datetime.datetime.now())+".npz"
print("saving data to", filename)
np.savez(filename, testslist=testslist, N=N, timings=timings, perform=perform, miny=miny)
ymin = 10**(np.floor(np.log10(miny)))
if (5*ymin<=miny): ymin *= 5
elif (2*ymin<=miny): ymin *= 2
print('ymin = {}, miny = {}\n'.format(ymin, miny))
figa, ax = plt.subplots(figsize=(24, 12))
for test in testslist:
ax.plot(N, timings[test[0]], test[1], label=test[0])
ax.set_title('numpy compared timings')
plt.xlabel('N (system size)')
plt.ylabel('timings (msec)')
plt.grid(True)
plt.legend(loc='upper left', bbox_to_anchor=(0., 1), shadow=True, ncol=7)
plt.subplots_adjust(left=0.05, bottom=0.06, right=0.98, top=0.98)
figb, bx = plt.subplots(figsize=(24, 12))
for test in testslist:
bx.plot(N, timings[test[0]], test[1], label=test[0])
bx.set_title('numpy compared timings')
plt.yscale('log')
plt.xscale('log')
plt.xlabel('N (system size)')
plt.ylabel('timings (msec)')
plt.grid(True)
plt.legend(loc='upper left', bbox_to_anchor=(0., 1), shadow=True, ncol=7)
plt.subplots_adjust(left=0.05, bottom=0.06, right=0.98, top=0.98)
figc, cx = plt.subplots(figsize=(24, 12))
for test in testslist:
cx.plot(N, perform[test[0]], test[1], label=test[0])
plt.ylim(0, 20*ymin)
cx.set_title('numpy compared performance')
plt.xlabel('N (system size)')
plt.ylabel('performance (msec)/N²')
plt.grid(True)
plt.legend(loc='upper right', bbox_to_anchor=(1., 1), shadow=True, ncol=7)
plt.subplots_adjust(left=0.05, bottom=0.06, right=0.98, top=0.98)
figd, dx = plt.subplots(figsize=(24, 12))
for test in testslist:
dx.plot(N, perform[test[0]], test[1], label=test[0])
dx.set_title('numpy compared performance')
plt.yscale('log')
plt.xscale('log')
plt.xlabel('N (system size)')
plt.ylabel('performance (msec)/N²')
plt.grid(True)
plt.legend(loc='upper right', bbox_to_anchor=(1., 1), shadow=True, ncol=7)
plt.subplots_adjust(left=0.05, bottom=0.06, right=0.98, top=0.98)
plt.show()
With it's C module
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <numpy/arrayobject.h>
#define G 6.67408E-8L
void * failure(PyObject *type, const char *message) {
PyErr_SetString(type, message);
return NULL;
}
void * success(PyObject *var){
Py_INCREF(var);
return var;
}
static PyObject *
Py_grav_c(PyObject *self, PyObject *args)
{
PyArrayObject *p, *M;
PyObject *a;
int i, j, k;
double *pq0, *pq1, *Mq0, *Mq1, *aq0, *aq1, *p0, *p1, *a0, *a1;
if (!PyArg_ParseTuple(args, "O!O!", &PyArray_Type, &p, &PyArray_Type, &M))
return failure(PyExc_RuntimeError, "Failed to parse parameters.");
if (PyArray_DESCR(p)->type_num != NPY_DOUBLE)
return failure(PyExc_TypeError, "Type np.float64 expected for p array.");
if (PyArray_DESCR(M)->type_num != NPY_DOUBLE)
return failure(PyExc_TypeError, "Type np.float64 expected for M array.");
if (PyArray_NDIM(p)!=2)
return failure(PyExc_TypeError, "p must be a 2 dimensionnal array.");
if (PyArray_NDIM(M)!=1)
return failure(PyExc_TypeError, "M must be a 1 dimensionnal array.");
int K = PyArray_DIM(p, 0); // Number of dimensions you want
int L = PyArray_DIM(p, 1); // Number of bodies in the system
int S0 = PyArray_STRIDE(p, 0); // Normally, the arrays should be contiguous
int S1 = PyArray_STRIDE(p, 1); // But since they provide this Stride info
int SM = PyArray_STRIDE(M, 0); // I supposed they might not be (alignment)
if (PyArray_DIM(M, 0) != L)
return failure(PyExc_TypeError,
"P and M must have the same number of bodies.");
a = PyArray_NewLikeArray(p, NPY_ANYORDER, NULL, 0);
if (a == NULL)
return failure(PyExc_RuntimeError, "Failed to create output array.");
PyArray_FILLWBYTE(a, 0);
// For all bodies except first which has no previous body
for (i = 1,
pq0 = (double *)(PyArray_DATA(p)+S1),
Mq0 = (double *)(PyArray_DATA(M)+SM),
aq0 = (double *)(PyArray_DATA(a)+S1);
i < L;
i++,
*(void **)&pq0 += S1,
*(void **)&Mq0 += SM,
*(void **)&aq0 += S1
) {
// For all previous bodies
for (j = 0,
pq1 = (double *)PyArray_DATA(p),
Mq1 = (double *)PyArray_DATA(M),
aq1 = (double *)PyArray_DATA(a);
j < i;
j++,
*(void **)&pq1 += S1,
*(void **)&Mq1 += SM,
*(void **)&aq1 += S1
) {
// For all dimensions calculate deltas
long double d[K], d2 = 0, VVV, M0xVVV, M1xVVV;
for (k = 0,
p0 = pq0,
p1 = pq1;
k<K;
k++,
*(void **)&p0 += S0,
*(void **)&p1 += S0) {
d[k] = *p1 - *p0;
}
// calculate Hypotenuse squared
for (k = 0, d2 = 0; k<K; k++) {
d2 += d[k]*d[k];
}
// calculate interm. results once for each bodies pair (optimization)
VVV = G;
#define LIM 1
// if (d2<LIM) d2=LIM; // Variation on collision case
if (d2>0) VVV /= d2*sqrt(d2);
M0xVVV = *Mq0 * VVV; // anonymous intermediate result
M1xVVV = *Mq1 * VVV; // anonymous intermediate result
// For all dimensions calculate component of acceleration
for (k = 0,
a0 = aq0,
a1 = aq1;
k<K;
k++,
*(void **)&a0 += S0,
*(void **)&a1 += S0) {
*a0 += M1xVVV*d[k];
*a1 -= M0xVVV*d[k];
}
}
}
/* clean up and return the result */
return success(a);
}
static PyObject *
Py_grav2_c(PyObject *self, PyObject *args)
{
PyArrayObject *p, *M;
PyObject *a;
int i, j, k;
double *pq0, *pq1, *Mq0, *Mq1, *aq0, *aq1, *p0, *p1, *a0, *a1;
if (!PyArg_ParseTuple(args, "O!O!", &PyArray_Type, &p, &PyArray_Type, &M))
return failure(PyExc_RuntimeError, "Failed to parse parameters.");
if (PyArray_DESCR(p)->type_num != NPY_DOUBLE)
return failure(PyExc_TypeError, "Type np.float64 expected for p array.");
if (PyArray_DESCR(M)->type_num != NPY_DOUBLE)
return failure(PyExc_TypeError, "Type np.float64 expected for M array.");
if (PyArray_NDIM(p)!=2)
return failure(PyExc_TypeError, "p must be a 2 dimensionnal array.");
if (PyArray_NDIM(M)!=1)
return failure(PyExc_TypeError, "M must be a 1 dimensionnal array.");
int K = PyArray_DIM(p, 0); // Number of dimensions you want
int L = PyArray_DIM(p, 1); // Number of bodies in the system
int S0 = PyArray_STRIDE(p, 0); // Normally, the arrays should be contiguous
int S1 = PyArray_STRIDE(p, 1); // But since they provide this Stride info
int SM = PyArray_STRIDE(M, 0); // I supposed they might not be (alignment)
if (PyArray_DIM(M, 0) != L)
return failure(PyExc_TypeError,
"P and M must have the same number of bodies.");
a = PyArray_NewLikeArray(p, NPY_ANYORDER, NULL, 0);
if (a == NULL)
return failure(PyExc_RuntimeError, "Failed to create output array.");
PyArray_FILLWBYTE(a, 0);
// For all bodies except first which has no previous body
for (i = 1,
pq0 = (double *)(PyArray_DATA(p)+S1),
Mq0 = (double *)(PyArray_DATA(M)+SM),
aq0 = (double *)(PyArray_DATA(a)+S1);
i < L;
i++,
*(void **)&pq0 += S1,
*(void **)&Mq0 += SM,
*(void **)&aq0 += S1
) {
// For all previous bodies
for (j = 0,
pq1 = (double *)PyArray_DATA(p),
Mq1 = (double *)PyArray_DATA(M),
aq1 = (double *)PyArray_DATA(a);
j < i;
j++,
*(void **)&pq1 += S1,
*(void **)&Mq1 += SM,
*(void **)&aq1 += S1
) {
// For all dimensions calculate deltas
long double d[K], d2 = 0, VVV, M0xVVV, M1xVVV;
for (k = 0,
p0 = pq0,
p1 = pq1;
k<K;
k++,
*(void **)&p0 += S0,
*(void **)&p1 += S0) {
d[k] = *p1 - *p0;
}
// calculate Hypotenuse squared
for (k = 0, d2 = 0; k<K; k++) {
d2 += d[k]*d[k];
}
// calculate interm. results once for each bodies pair (optimization)
VVV = G;
#define LIM 1
// if (d2<LIM) d2=LIM; // Variation on collision case
if (d2>0) VVV *= pow(d2, -1.5);
M0xVVV = *Mq0 * VVV; // anonymous intermediate result
M1xVVV = *Mq1 * VVV; // anonymous intermediate result
// For all dimensions calculate component of acceleration
for (k = 0,
a0 = aq0,
a1 = aq1;
k<K;
k++,
*(void **)&a0 += S0,
*(void **)&a1 += S0) {
*a0 += M1xVVV*d[k];
*a1 -= M0xVVV*d[k];
}
}
}
/* clean up and return the result */
return success(a);
}
// exported functions list
static PyMethodDef grav_c_Methods[] = {
{"grav_c", Py_grav_c, METH_VARARGS, "grav_c(p, M)\n"
"\n"
"grav_c takes the positions and masses of m bodies in Newtonian attraction in a n dimensionnal universe,\n"
"and returns the accelerations each body undergoes.\n"
"input data take the for of a row of fload64 for each dimension of the position (in p) and one row for the masses.\n"
"It returns and array of the same shape as p for the accelerations."},
{"grav2_c", Py_grav2_c, METH_VARARGS, "grav_c(p, M)\n"
"\n"
"grav_c takes the positions and masses of m bodies in Newtonian attraction in a n dimensionnal universe,\n"
"and returns the accelerations each body undergoes.\n"
"input data take the for of a row of fload64 for each dimension of the position (in p) and one row for the masses.\n"
"It returns and array of the same shape as p for the accelerations."},
{NULL, NULL, 0, NULL} // pour terminer la liste.
};
static char grav_c_doc[] = "Compute attractions between n bodies.";
static struct PyModuleDef grav_c_module = {
PyModuleDef_HEAD_INIT,
"grav_c", /* name of module */
grav_c_doc, /* module documentation, may be NULL */
-1, /* size of per-interpreter state of the module,
or -1 if the module keeps state in global variables. */
grav_c_Methods
};
PyMODINIT_FUNC
PyInit_grav_c(void)
{
// I don't understand why yet, but the program segfaults without this.
import_array();
return PyModule_Create(&grav_c_module);
}
来源:https://stackoverflow.com/questions/56089471/is-there-a-way-with-numpy-to-compute-something-for-all-combinations-of-n-lines