I'm working with a 3D lidar point cloud. The points are given as a NumPy array that looks like this:
points = np.array([[61651921, 416326074, 39805], [6160525
You could use Cython:
    %%cython -c-O3 -c-march=native -a
    #cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True

    import cython as cy
    cimport numpy as cnp


    cpdef groupby_index_dict_cy(cnp.int32_t[:, :] arr):
        # Map each unique (x, y, z) row to the list of row indices where it occurs.
        # The memoryview is typed int32, so pass e.g. points.astype(np.int32);
        # an int64 array would raise a buffer dtype mismatch.
        cdef cy.size_t size = len(arr)
        result = {}
        for i in range(size):
            key = arr[i, 0], arr[i, 1], arr[i, 2]
            if key in result:
                result[key].append(i)
            else:
                result[key] = [i]
        return result
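For checking the Cython version on small inputs, here is a plain-Python counterpart of the same dict-of-lists grouping. It is a minimal sketch: the function name `groupby_index_dict_py` and the toy array are hypothetical (the OP's real data is not reproduced here), and it assumes an (N, 3) integer array.

```python
import numpy as np


def groupby_index_dict_py(arr):
    """Hypothetical plain-Python equivalent of groupby_index_dict_cy:
    map each unique (x, y, z) row to the list of row indices where it occurs."""
    result = {}
    for i, row in enumerate(arr.tolist()):
        key = tuple(row)  # lists are unhashable, so use a tuple as the key
        # setdefault inserts an empty list the first time a key is seen
        result.setdefault(key, []).append(i)
    return result


# Toy input (hypothetical), not the OP's point cloud
points = np.array(
    [[1, 2, 3], [4, 5, 6], [1, 2, 3], [7, 8, 9], [4, 5, 6]], dtype=np.int32
)
groups = groupby_index_dict_py(points)
print(groups)
# {(1, 2, 3): [0, 2], (4, 5, 6): [1, 4], (7, 8, 9): [3]}
```

Both versions should return the same dictionary; the Cython one mainly saves the per-row Python overhead of `tolist()` and tuple building.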
It will not be faster than the Pandas approach, but it is the fastest option after that one (along with, perhaps, the numpy_index-based solution), and it does not come with the memory penalty of the Pandas approach.
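For comparison, a NumPy-only way to get the same grouping without a Python dict loop is to sort the rows, mark where consecutive rows differ, and split the sorted index permutation at those boundaries. This is a sketch of that sort-based technique, assuming an (N, 3) integer array; the function name is hypothetical, and it is not necessarily the numpy_index-based solution mentioned above.

```python
import numpy as np


def groupby_index_sorted_np(arr):
    """Hypothetical sort-based grouping: map each unique (x, y, z) row
    to an array of the row indices where it occurs."""
    # np.lexsort uses the LAST key as the primary key, so pass (z, y, x)
    # to sort by x, then y, then z. lexsort is stable, so indices within
    # each group stay in ascending order.
    order = np.lexsort((arr[:, 2], arr[:, 1], arr[:, 0]))
    sorted_rows = arr[order]
    # A new group starts wherever a row differs from the previous one.
    is_new = np.empty(len(arr), dtype=bool)
    is_new[:1] = True
    is_new[1:] = np.any(sorted_rows[1:] != sorted_rows[:-1], axis=1)
    starts = np.flatnonzero(is_new)
    # Split the index permutation at the group boundaries.
    index_groups = np.split(order, starts[1:])
    keys = [tuple(row) for row in sorted_rows[starts].tolist()]
    return dict(zip(keys, index_groups))


# Toy input (hypothetical), not the OP's point cloud
points = np.array(
    [[1, 2, 3], [4, 5, 6], [1, 2, 3], [7, 8, 9], [4, 5, 6]], dtype=np.int32
)
groups = groupby_index_sorted_np(points)
print({k: v.tolist() for k, v in groups.items()})
# {(1, 2, 3): [0, 2], (4, 5, 6): [1, 4], (7, 8, 9): [3]}
```

Here the groups come out ordered by (x, y, z) and each value is an index array rather than a list; the sort is O(N log N) and needs a few temporary arrays of size N, so its memory use should be weighed against the dict-based versions.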
A collection of what has been proposed so far is here.
On the OP's machine, this should get close to ~12 s of execution time.