I have many large lists of integers (more than 35,000,000 elements each) that contain duplicates. I need a count for each distinct integer in a list. The following code works, but seems slow. C
I get a 3x improvement doing something like this:
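    import numpy as np

    def group():
        # Sample data: 35,000,000 random integers in [0, 3298)
        values = np.array(np.random.randint(0, 3298, size=35000000), dtype='u4')
        values.sort()
        # dif > 0 marks the first element of each run of equal values
        dif = np.ones(values.shape, values.dtype)
        dif[1:] = np.diff(values)
        idx = np.where(dif > 0)[0]
        vals = values[idx]
        # Append len(values) so the last run gets a count as well
        count = np.diff(np.append(idx, len(values)))
        return vals, count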
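To see what the sort-and-diff approach computes, here is a minimal sketch on a tiny input. The helper name `count_sorted_runs` is made up for illustration and is not part of the code above. It marks run starts with a boolean comparison instead of `np.diff`, and it appends the array length to the start indices so that the final run is counted too.

```python
import numpy as np

def count_sorted_runs(values):
    """Return (distinct values, counts) via sort + run boundaries (illustrative helper)."""
    values = np.sort(np.asarray(values))
    if values.size == 0:
        return values, np.array([], dtype=np.intp)
    # True wherever a new run of equal values begins
    is_start = np.ones(values.size, dtype=bool)
    is_start[1:] = values[1:] != values[:-1]
    starts = np.flatnonzero(is_start)
    # Run lengths are the gaps between starts; values.size closes the last run
    counts = np.diff(np.append(starts, values.size))
    return values[starts], counts

vals, counts = count_sorted_runs([3, 1, 3, 2, 1, 3])
# vals   -> [1 2 3]
# counts -> [2 1 3]
```

The result should match `np.unique(values, return_counts=True)`, which is a convenient way to check the output on real data.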