I have analysis code that does some heavy numerical operations using NumPy. Just out of curiosity, I tried compiling it with Cython with only minor changes and then I rewrote it.
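For context, the build step looked roughly like the sketch below (the file and module name analysis are placeholders, not the real names from my project):

from setuptools import setup, Extension
from Cython.Build import cythonize
import numpy as np

# Build the Cython extension; the NumPy include dir is needed because the
# module cimports numpy.
extensions = [
    Extension("analysis", ["analysis.pyx"], include_dirs=[np.get_include()]),
]

setup(ext_modules=cythonize(extensions))

compiled in place with python setup.py build_ext --inplace.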
With slight modification, version 3 becomes twice as fast:
# Module-level boilerplate assumed from the question's .pyx; DTYPE_t is taken
# to be np.float64_t here.
import numpy as np
cimport numpy as np
cimport cython

ctypedef np.float64_t DTYPE_t

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def process2(np.ndarray[DTYPE_t, ndim=2] array):
    cdef unsigned int rows = array.shape[0]
    cdef unsigned int cols = array.shape[1]
    cdef unsigned int row, col, row2
    # np.zeros rather than np.empty: the loop below accumulates into out with +=
    cdef np.ndarray[DTYPE_t, ndim=2] out = np.zeros((rows, cols))
    for row in range(rows):
        for row2 in range(rows):
            for col in range(cols):
                out[row, col] += array[row2, col] - array[row, col]
    return out
The bottleneck in your calculation is memory access. Your input array is C ordered, which means that moving along the last axis makes the smallest jump in memory. Therefore your inner loop should be along axis 1, not axis 0. Making this change cuts the run time in half.
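As a quick illustration (using the same array shape as the timing below), the strides show the size of those jumps:

import numpy as np

a = np.random.rand(10000, 10)   # C-ordered float64 array
print(a.flags['C_CONTIGUOUS'])  # True
print(a.strides)                # (80, 8): a step along axis 1 moves 8 bytes,
                                # a step along axis 0 moves 80 bytes

So an innermost loop over axis 1 touches adjacent memory, while an innermost loop over axis 0 jumps 80 bytes on every iteration.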
If you need to use this function on small input arrays, you can reduce the allocation overhead by using np.empty instead of np.ones; np.empty skips initializing the values, so this is only safe when every element of out is written before it is read. To reduce the overhead further, use PyArray_EMPTY from the NumPy C API.
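A minimal sketch of the PyArray_EMPTY route in Cython, assuming float64 data and a hypothetical helper name empty_out:

cimport numpy as np

np.import_array()  # must be called before using the NumPy C API

cdef np.ndarray empty_out(Py_ssize_t rows, Py_ssize_t cols):
    # Allocate an uninitialised rows x cols float64 array without the
    # overhead of the Python-level np.empty call.
    cdef np.npy_intp dims[2]
    dims[0] = rows
    dims[1] = cols
    # PyArray_EMPTY(nd, dims, typenum, fortran_order)
    return np.PyArray_EMPTY(2, dims, np.NPY_FLOAT64, 0)

You would then allocate out with empty_out(rows, cols) in place of the Python-level call (again, only when every element is written before being read).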
If you use this function on very large input arrays (dimensions on the order of 2**31), the integers used for indexing (and in the range calls) will overflow. To be safe, use:
cdef Py_ssize_t rows = array.shape[0]
cdef Py_ssize_t cols = array.shape[1]
cdef Py_ssize_t row, col, row2
instead of
cdef unsigned int rows = array.shape[0]
cdef unsigned int cols = array.shape[1]
cdef unsigned int row, col, row2
Timing:
In [2]: a = np.random.rand(10000, 10)
In [3]: timeit process(a)
1 loops, best of 3: 3.53 s per loop
In [4]: timeit process2(a)
1 loops, best of 3: 1.84 s per loop
where process is your version 3.