What is the fastest way to map group names of numpy array to indices?

不思量自难忘° 2020-12-17 20:48

I'm working with a 3D point cloud from a Lidar sensor. The points are given as a numpy array that looks like this:

points = np.array([[61651921, 416326074, 39805], [6160525


        
3 Answers
  •  清酒与你
    2020-12-17 20:59

    You could use Cython:

    %%cython -c-O3 -c-march=native -a
    #cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
    
    import cython as cy
    
    cimport numpy as cnp
    
    
    cpdef groupby_index_dict_cy(cnp.int32_t[:, :] arr):
        """Map each unique (x, y, z) row to the list of row indices where it occurs."""
        cdef cy.size_t size = len(arr)
        result = {}
        for i in range(size):
            # The coordinate triple itself is used as the grouping key.
            key = arr[i, 0], arr[i, 1], arr[i, 2]
            if key in result:
                result[key].append(i)
            else:
                result[key] = [i]
        return result
    
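    For completeness, a minimal usage sketch, assuming the cell above has been compiled with the `%%cython` magic in a Jupyter notebook and that the coordinates fit into int32 (the sample values below are made up, merely in the spirit of the OP's data):

    import numpy as np
    
    # Hypothetical sample points; real input would be the OP's Lidar array cast to int32.
    points = np.array([[61651921, 416326074, 39805],
                       [61651921, 416326074, 39805],
                       [61674456, 416316663, 39759]], dtype=np.int32)
    
    groups = groupby_index_dict_cy(points)
    # -> {(61651921, 416326074, 39805): [0, 1], (61674456, 416316663, 39759): [2]}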

    but it will not be faster than the Pandas-based approach, although it is the fastest one after that (together with, perhaps, the numpy_indexed-based solution), and it does not come with Pandas' memory penalty. A collection of what has been proposed so far is here.

    On the OP's machine, the Cython version should come close to ~12 sec of execution time.
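
    For reference, the Pandas-based grouping mentioned above could look roughly like the sketch below; this is an assumption about the approach being compared against, not necessarily the exact code that was benchmarked. `groupby(...).indices` returns a dict mapping each unique coordinate triple to the positional indices of its rows:

    import pandas as pd
    
    def groupby_index_dict_pd(arr):
        # Build a DataFrame over the (N, 3) point array and let Pandas'
        # hashed groupby collect the row positions for each unique triple.
        df = pd.DataFrame(arr, columns=['x', 'y', 'z'])
        return df.groupby(['x', 'y', 'z']).indices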
