Numpy grouping using itertools.groupby performance

庸人自扰 2020-12-01 03:17

I have many large (>35,000,000) lists of integers that will contain duplicates. I need to get a count for each integer in a list. The following code works, but seems slow. C

10 Answers
  •  温柔的废话
    2020-12-01 03:33

    I get a 3x improvement doing something like this:

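    import numpy as np

    def group():
        # Random test data: 35,000,000 integers in [0, 3298)
        values = np.random.randint(0, 3298, size=35000000).astype('u4')
        values.sort()
        # dif[i] > 0 marks the first occurrence of each distinct value
        dif = np.ones(values.shape, values.dtype)
        dif[1:] = np.diff(values)
        idx = np.flatnonzero(dif > 0)  # run start indices as a 1-D array
        vals = values[idx]
        # Append the array length so the last run is counted too
        count = np.diff(np.append(idx, values.size))
        return vals, count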
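
As a cross-check on the sort-and-diff idea above, here is a minimal sketch that compares it with two built-in NumPy routes, `np.unique(..., return_counts=True)` and `np.bincount`. It is not part of the original answer. The helper names `count_sorted_diff` and `count_bincount` and the smaller sample size are assumptions made for illustration. The `np.bincount` variant also assumes the values are small non-negative integers, which matches the 0 to 3297 range used above, and both helpers assume a non-empty input.

```python
import numpy as np


def count_sorted_diff(values):
    """Count occurrences by sorting and locating run boundaries.

    Hypothetical helper; assumes `values` is a non-empty 1-D integer array.
    """
    values = np.sort(values)
    # True at every position where a new run of equal values begins.
    is_start = np.concatenate(([True], values[1:] != values[:-1]))
    starts = np.flatnonzero(is_start)
    # Run length = distance to the next run start (or to the end of the array).
    counts = np.diff(np.append(starts, values.size))
    return values[starts], counts


def count_bincount(values):
    """Count occurrences with np.bincount.

    Hypothetical helper; assumes small non-negative integers, because
    bincount allocates one slot per integer up to values.max().
    """
    counts = np.bincount(values.astype(np.intp))
    present = np.flatnonzero(counts)  # values that occur at least once
    return present, counts[present]


# Smaller than the 35,000,000 elements in the question, to keep the check quick.
rng = np.random.default_rng(0)
sample = rng.integers(0, 3298, size=100_000, dtype=np.uint32)

vals_a, counts_a = count_sorted_diff(sample)
vals_b, counts_b = count_bincount(sample)
vals_c, counts_c = np.unique(sample, return_counts=True)

# All three routes should agree on both the distinct values and their counts.
assert np.array_equal(vals_a, vals_b) and np.array_equal(vals_a, vals_c)
assert np.array_equal(counts_a, counts_b) and np.array_equal(counts_a, counts_c)
```

Which route is fastest will depend on the data and the NumPy version, so it is worth timing them on the real lists. `np.bincount` avoids the sort entirely, but it is only practical when the largest value is modest.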
    
