More efficient way to loop?

I have a small piece of code from a much larger script. I figured out that when the function t_area is called, it is responsible for most of the run time. I tested the function by itself, and it is not slow, it takes a lot of time because of the number of times that it has to be ran I believe. Here is the code where the function is called:

tri_area = np.zeros((numx,numy),dtype=float)
for jj in range(0,numy-1):
    for ii in range(0,numx-1):
      xp = x[ii,jj]
      yp = y[ii,jj]
      zp = surface[ii,jj]
      ap = np.array((xp,yp,zp))

      xp = xp+dx
      zp = surface[ii+1,jj]
      bp = np.array((xp,yp,zp))

      yp = yp+dx
      zp = surface[ii+1,jj+1]
      dp = np.array((xp,yp,zp))

      xp = xp-dx
      zp = surface[ii,jj+1]
      cp = np.array((xp,yp,zp))

      tri_area[ii,jj] = t_area(ap,bp,cp,dp)

The size of the arrays in use here are 216 x 217, and so are the values of x and y. I am pretty new to python coding, I have used MATLAB in the past. So my question is, is there a way to get around the two for-loops, or a more efficient way to run through this block of code in general? Looking for any help speeding this up! Thanks!

EDIT:

Thanks for the help everyone, this has cleared alot of confusion up. I was asked about the function t_area that is used in the loop, here is the code below:

def t_area(a,b,c,d):
ab=b-a
ac=c-a
tri_area_a = 0.5*linalg.norm(np.cross(ab,ac))

db=b-d
dc=c-d
tri_area_d = 0.5*linalg.norm(np.cross(db,dc))

ba=a-b
bd=d-b
tri_area_b = 0.5*linalg.norm(np.cross(ba,bd))

ca=a-c
cd=d-c
tri_area_c = 0.5*linalg.norm(np.cross(ca,cd))

av_area = (tri_area_a + tri_area_b + tri_area_c + tri_area_d)*0.5
return(av_area)

Sorry for the confusing notation, at the time it made sense, looking back now I will probably change it. Thanks!

A caveat before we start. range(0, numy-1), which is equal to range(numy-1), produces the numbers from 0 to numy-2, not including numy-1. That's because you have numy-1 values from 0 to numy-2. While MATLAB has 1-based indexing, Python has 0-based, so be a bit careful with your indexing in the transition. Considering you have tri_area = np.zeros((numx, numy), dtype=float), tri_area[ii,jj] never accesses the last row or column with the way you have set up your loops. Therefore, I suspect the correct intention was to write range(numy).

Since the fuction t_area() is vectorisable, you can do away with the loops completely. Vectorisation means numpy applies some operations on a whole array at the same time by taking care of the loops under the hood, where they will be faster.

First, we stack all the aps for each (i, j) element in a (m, n, 3) array, where (m, n) is the size of x. If we take the cross product of two (m, n, 3) arrays, the operation will be applied on the last axis by default. This means that np.cross(a, b) will do for every element (i, j) take the cross product of the 3 numbers in a[i,j] and b[i,j]. Similarly, np.linalg.norm(a, axis=2) will do for every element (i, j) calculate the norm of the 3 numbers in a[i,j]. This will also effectively reduce our array to size (m, n). A bit of caution here though, as we need to explicitly state we want this operation done on the 2nd axis.

Note that in the following example my indexing relationship may not correspond to yours. The bare minimum to make this work is for surface to have one extra row and column from x and y.

import numpy as np

def _t_area(a, b, c):
    ab = b - a
    ac = c - a
    return 0.5 * np.linalg.norm(np.cross(ab, ac), axis=2)

def t_area(x, y, surface, dx):
    a = np.zeros((x.shape[0], y.shape[0], 3), dtype=float)
    b = np.zeros_like(a)
    c = np.zeros_like(a)
    d = np.zeros_like(a)

    a[...,0] = x
    a[...,1] = y
    a[...,2] = surface[:-1,:-1]

    b[...,0] = x + dx
    b[...,1] = y
    b[...,2] = surface[1:,:-1]

    c[...,0] = x
    c[...,1] = y + dx
    c[...,2] = surface[:-1,1:]

    d[...,0] = bp[...,0]
    d[...,1] = cp[...,1]
    d[...,2] = surface[1:,1:]

    # are you sure you didn't mean 0.25???
    return 0.5 * (_t_area(a, b, c) + _t_area(d, b, c) + _t_area(b, a, d) + _t_area(c, a, d))

nx, ny = 250, 250

dx = np.random.random()
x = np.random.random((nx, ny))
y = np.random.random((nx, ny))
surface = np.random.random((nx+1, ny+1))

tri_area = t_area(x, y, surface, dx)

x in this example supports the indices 0-249, while surface 0-250. surface[:-1], a shorthand for surface[0:-1], will return all rows starting from 0 and up to the last one, but not including it. -1 serves the same function and end in MATLAB. So, surface[:-1] will return the rows for indices 0-249. Similarly, surface[1:] will return the rows for indices 1-250, which achieves the same as your surface[ii+1].

Note: I had written this section before it was known that t_area() could be fully vectorised. So while what is here is obsolete for the purposes of this answer, I'll leave it as legacy to show what optimisations could have been made had the function not be vectorisable.

Instead of calling the function for each element, which is expensive, you should pass it x, y,, surface and dx and iterate internally. That means only one function call and less overhead.

Furthermore, you shouldn't create an array for ap, bp, cp and dp every loop, which again, adds overhead. Allocate them once outside the loop and just update their values.

One final change should be the order of loops. Numpy arrays are row major by default (while MATLAB is column major), so ii performs better as the outer loop. You wouldn't notice the difference for arrays of your size, but hey, why not?

Overall, the modified function should look like this.

def t_area(x, y, surface, dx):
    # I assume numx == x.shape[0]. If not, pass it as an extra argument.
    tri_area = np.zeros(x.shape, dtype=float)

    ap = np.zeros((3,), dtype=float)
    bp = np.zeros_like(ap)
    cp = np.zeros_like(ap)
    dp = np.zeros_like(ap)

    for ii in range(x.shape[0]-1): # do you really want range(numx-1) or just range(numx)?
        for jj in range(x.shape[1]-1):
            xp = x[ii,jj]
            yp = y[ii,jj]
            zp = surface[ii,jj]
            ap[:] = (xp, yp, zp)

            # get `bp`, `cp` and `dp` in a similar manner and compute `tri_area[ii,jj]`

来源：https://stackoverflow.com/questions/34701570/more-efficient-way-to-loop

标签

python

performance

for-loop

vectorization

micro-optimization