Numpy vs Cython speed

清酒与你 2020-11-28 22:37

I have some analysis code that does heavy numerical operations using numpy. Out of curiosity, I tried compiling it with Cython with only minor changes, and then I rewrote it

5 Answers
  •  难免孤独
    2020-11-28 23:11

    With slight modification, version 3 becomes twice as fast:

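    # Assumes the question's module header is in place: import numpy as np,
    # cimport numpy as np, cimport cython, and a float64 DTYPE_t.
    @cython.boundscheck(False)   # skip bounds checks on array indexing
    @cython.wraparound(False)    # no negative-index handling
    @cython.nonecheck(False)
    def process2(np.ndarray[DTYPE_t, ndim=2] array):

        cdef unsigned int rows = array.shape[0]
        cdef unsigned int cols = array.shape[1]
        cdef unsigned int row, col, row2
        # Zero-initialized because the loop accumulates with +=;
        # np.empty would leave the starting values undefined.
        cdef np.ndarray[DTYPE_t, ndim=2] out = np.zeros((rows, cols))

        # The innermost loop runs along axis 1, the contiguous axis
        # of a C-ordered array.
        for row in range(rows):
            for row2 in range(rows):
                for col in range(cols):
                    out[row, col] += array[row2, col] - array[row, col]

        return out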
    

    The bottleneck in your calculation is memory access. Your input array is C-ordered (row-major), which means that moving along the last axis makes the smallest jump in memory. Your innermost loop should therefore run along axis 1, not axis 0. Making this change cuts the run time roughly in half.
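
    To see the memory layout for yourself, you can inspect the strides of an array with the same shape as the one in the timing. This is only a small illustration, assuming a default float64 array; the names row_step and col_step are mine.

```python
import numpy as np

# Same shape as the timing input; np.zeros returns a C-ordered float64 array.
a = np.zeros((10000, 10))

# strides gives the number of bytes to step to reach the next element
# along each axis: (axis 0, axis 1).
row_step, col_step = a.strides

print(a.flags['C_CONTIGUOUS'])
print(row_step, col_step)
```

    Stepping along axis 1 moves by a single 8-byte element, while stepping along axis 0 jumps over a whole row, which is why the column index belongs in the innermost loop.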

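    If you need to use this function on small input arrays, you can reduce the overhead by using np.empty instead of np.ones, provided every element of the output is assigned before it is read. np.empty does not initialize the memory, so with a += accumulation such as the loop above the buffer still has to be zeroed first. To reduce the overhead further, use PyArray_EMPTY from the NumPy C API.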
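
    If you want to check how much the allocation call matters on your machine, something like the following rough measurement would do. The 10 x 10 shape and the call count are arbitrary choices of mine, and the numbers will vary between systems.

```python
import timeit
import numpy as np

shape = (10, 10)   # hypothetical "small" output size
n = 100_000        # number of calls to average over

# np.empty only allocates; np.ones allocates and then fills the buffer.
t_empty = timeit.timeit(lambda: np.empty(shape), number=n)
t_ones = timeit.timeit(lambda: np.ones(shape), number=n)

print(f"np.empty: {t_empty / n * 1e9:.0f} ns per call")
print(f"np.ones:  {t_ones / n * 1e9:.0f} ns per call")
```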

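    If you use this function on very large input arrays (a dimension around 2**31 or larger), the integers used for indexing (and in the range function) can overflow: a 32-bit signed int tops out at 2**31 - 1 and a 32-bit unsigned int at 2**32 - 1. To be safe use: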

    cdef Py_ssize_t rows = array.shape[0]
    cdef Py_ssize_t cols = array.shape[1]
    cdef Py_ssize_t row, col, row2
    

    instead of

    cdef unsigned int rows = array.shape[0]
    cdef unsigned int cols = array.shape[1]
    cdef unsigned int row, col, row2
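
    The wraparound can be reproduced from plain Python with ctypes, whose integer types do no overflow checking. This is just an illustration of the C behaviour, not Cython code; big_dim is a made-up dimension, and the Py_ssize_t result assumes a 64-bit build.

```python
import ctypes

# Hypothetical dimension, one past the largest 32-bit unsigned value.
big_dim = 2**32

# ctypes integer types wrap silently, like a C assignment to a narrower type.
as_uint = ctypes.c_uint(big_dim).value
as_ssize = ctypes.c_ssize_t(big_dim).value

print(ctypes.sizeof(ctypes.c_uint), as_uint)
print(ctypes.sizeof(ctypes.c_ssize_t), as_ssize)
```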
    

    Timing:

    In [2]: a = np.random.rand(10000, 10)
    In [3]: timeit process(a)
    1 loops, best of 3: 3.53 s per loop
    In [4]: timeit process2(a)
    1 loops, best of 3: 1.84 s per loop
    

    where process is your version 3.
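
    When you reorder loops or change how the output is allocated, it is worth checking that the result has not changed. One way is to compare against a closed form: summing array[row2, col] - array[row, col] over row2 gives the column sum minus rows * array[row, col]. The sketch below assumes the accumulator starts at zero; process_loops and process_reference are hypothetical names, and the compiled function would take the place of process_loops in a real check.

```python
import numpy as np

def process_loops(array):
    """Plain-Python version of the triple loop, with a zeroed accumulator."""
    rows, cols = array.shape
    out = np.zeros((rows, cols))
    for row in range(rows):
        for row2 in range(rows):
            for col in range(cols):
                out[row, col] += array[row2, col] - array[row, col]
    return out

def process_reference(array):
    """Closed form of the same sum: column totals minus rows * array."""
    rows = array.shape[0]
    return array.sum(axis=0) - rows * array

# Small random input so the pure-Python loops finish quickly.
a = np.random.rand(50, 4)
print(np.allclose(process_loops(a), process_reference(a)))
```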
