More efficient way to loop?

这一生的挚爱 提交于 2019-12-05 18:13:05

A caveat before we start. range(0, numy-1), which is equal to range(numy-1), produces the numbers from 0 to numy-2, not including numy-1. That's because you have numy-1 values from 0 to numy-2. While MATLAB has 1-based indexing, Python has 0-based, so be a bit careful with your indexing in the transition. Considering you have tri_area = np.zeros((numx, numy), dtype=float), tri_area[ii,jj] never accesses the last row or column with the way you have set up your loops. Therefore, I suspect the correct intention was to write range(numy).

Since the fuction t_area() is vectorisable, you can do away with the loops completely. Vectorisation means numpy applies some operations on a whole array at the same time by taking care of the loops under the hood, where they will be faster.

First, we stack all the aps for each (i, j) element in a (m, n, 3) array, where (m, n) is the size of x. If we take the cross product of two (m, n, 3) arrays, the operation will be applied on the last axis by default. This means that np.cross(a, b) will do for every element (i, j) take the cross product of the 3 numbers in a[i,j] and b[i,j]. Similarly, np.linalg.norm(a, axis=2) will do for every element (i, j) calculate the norm of the 3 numbers in a[i,j]. This will also effectively reduce our array to size (m, n). A bit of caution here though, as we need to explicitly state we want this operation done on the 2nd axis.

Note that in the following example my indexing relationship may not correspond to yours. The bare minimum to make this work is for surface to have one extra row and column from x and y.

import numpy as np

def _t_area(a, b, c):
    ab = b - a
    ac = c - a
    return 0.5 * np.linalg.norm(np.cross(ab, ac), axis=2)

def t_area(x, y, surface, dx):
    a = np.zeros((x.shape[0], y.shape[0], 3), dtype=float)
    b = np.zeros_like(a)
    c = np.zeros_like(a)
    d = np.zeros_like(a)

    a[...,0] = x
    a[...,1] = y
    a[...,2] = surface[:-1,:-1]

    b[...,0] = x + dx
    b[...,1] = y
    b[...,2] = surface[1:,:-1]

    c[...,0] = x
    c[...,1] = y + dx
    c[...,2] = surface[:-1,1:]

    d[...,0] = bp[...,0]
    d[...,1] = cp[...,1]
    d[...,2] = surface[1:,1:]

    # are you sure you didn't mean 0.25???
    return 0.5 * (_t_area(a, b, c) + _t_area(d, b, c) + _t_area(b, a, d) + _t_area(c, a, d))

nx, ny = 250, 250

dx = np.random.random()
x = np.random.random((nx, ny))
y = np.random.random((nx, ny))
surface = np.random.random((nx+1, ny+1))

tri_area = t_area(x, y, surface, dx)

x in this example supports the indices 0-249, while surface 0-250. surface[:-1], a shorthand for surface[0:-1], will return all rows starting from 0 and up to the last one, but not including it. -1 serves the same function and end in MATLAB. So, surface[:-1] will return the rows for indices 0-249. Similarly, surface[1:] will return the rows for indices 1-250, which achieves the same as your surface[ii+1].


Note: I had written this section before it was known that t_area() could be fully vectorised. So while what is here is obsolete for the purposes of this answer, I'll leave it as legacy to show what optimisations could have been made had the function not be vectorisable.

Instead of calling the function for each element, which is expensive, you should pass it x, y,, surface and dx and iterate internally. That means only one function call and less overhead.

Furthermore, you shouldn't create an array for ap, bp, cp and dp every loop, which again, adds overhead. Allocate them once outside the loop and just update their values.

One final change should be the order of loops. Numpy arrays are row major by default (while MATLAB is column major), so ii performs better as the outer loop. You wouldn't notice the difference for arrays of your size, but hey, why not?

Overall, the modified function should look like this.

def t_area(x, y, surface, dx):
    # I assume numx == x.shape[0]. If not, pass it as an extra argument.
    tri_area = np.zeros(x.shape, dtype=float)

    ap = np.zeros((3,), dtype=float)
    bp = np.zeros_like(ap)
    cp = np.zeros_like(ap)
    dp = np.zeros_like(ap)

    for ii in range(x.shape[0]-1): # do you really want range(numx-1) or just range(numx)?
        for jj in range(x.shape[1]-1):
            xp = x[ii,jj]
            yp = y[ii,jj]
            zp = surface[ii,jj]
            ap[:] = (xp, yp, zp)

            # get `bp`, `cp` and `dp` in a similar manner and compute `tri_area[ii,jj]`
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!