Return array of counts for each feature of input

一个人的身影 2020-12-11 10:59

I have an array of integer labels and I would like to determine how many of each label is present and store those values in an array of the same size as the input. This can

1 answer
  • 2020-12-11 11:17

    Approach #1

    Here's one using np.unique -

    import numpy as np

    # tags[i] is the index of labels[i] among the sorted unique values
    _, tags, count = np.unique(labels, return_counts=True, return_inverse=True)
    sizes = count[tags]
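To make the indexing step concrete, here is a small worked example. The input array is made up for illustration and includes a negative label to show that np.unique does not require non-negative values.

```python
import numpy as np

# Hypothetical input, chosen only for illustration
labels = np.array([-2, 7, -2, -2, 9, 7])

uniq, tags, count = np.unique(labels, return_inverse=True, return_counts=True)
# uniq  is [-2  7  9]        (sorted unique labels)
# tags  is [0 1 0 0 2 1]     (position of each label in uniq)
# count is [3 2 1]           (occurrences of each unique label)

# Indexing the counts by the inverse mapping broadcasts each count
# back to every position holding that label.
sizes = count[tags]
print(sizes)  # [3 2 3 3 1 2]
```

The result has the same length as the input, and each entry is the number of times the label at that position occurs.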
    

    Approach #2

    With non-negative integers in labels, a simpler and more efficient way is np.bincount -

    sizes = np.bincount(labels)[labels]
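A minimal sketch of how the np.bincount lookup works, followed by a hypothetical helper (`label_sizes` is not part of NumPy or of the original answer) that falls back to np.unique when a label is negative, since np.bincount rejects negative input. Note also that np.bincount allocates an array of length `labels.max() + 1`, so it suits labels with a modest maximum value.

```python
import numpy as np

# Hypothetical input, chosen only for illustration
labels = np.array([0, 4, 4, 1, 0, 4])

counts = np.bincount(labels)  # [2 1 0 0 3]: counts[v] is how often v occurs
sizes = counts[labels]        # [2 3 3 1 2 3]: look up the count for each label


def label_sizes(labels):
    """Hypothetical helper: per-element label counts for any integer labels."""
    labels = np.asarray(labels)
    if labels.size and labels.min() >= 0:
        # Fast path: direct table lookup, valid for non-negative integers only
        return np.bincount(labels)[labels]
    # General path: works for negative labels (and empty input) as well
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    return counts[inverse]


print(label_sizes([0, 4, 4, 1, 0, 4]))  # [2 3 3 1 2 3]
print(label_sizes([-2, 7, -2]))         # [2 1 2]
```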
    

    Runtime test

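    Setup: labels are drawn from 60,000 possible non-negative integers (0 to 59,999). Two such sets, of lengths 100,000 and 1,000,000, are timed, comparing a loop over the unique labels against np.bincount.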

    Set #1 :

    In [192]: np.random.seed(0)
         ...: labels = np.random.randint(0,60000,(100000))
    
    In [193]: %%timeit
         ...: sizes = np.zeros(labels.shape)
         ...: for num in np.unique(labels):
         ...:     mask = labels == num
         ...:     sizes[mask] = np.count_nonzero(mask)
    1 loop, best of 3: 2.32 s per loop
    
    In [194]: %timeit np.bincount(labels)[labels]
    1000 loops, best of 3: 376 µs per loop
    
    In [195]: 2320/0.376 # Speedup figure
    Out[195]: 6170.212765957447
    

    Set #2 :

    In [196]: np.random.seed(0)
         ...: labels = np.random.randint(0,60000,(1000000))
    
    In [197]: %%timeit
         ...: sizes = np.zeros(labels.shape)
         ...: for num in np.unique(labels):
         ...:     mask = labels == num
         ...:     sizes[mask] = np.count_nonzero(mask)
    1 loop, best of 3: 43.6 s per loop
    
    In [198]: %timeit np.bincount(labels)[labels]
    100 loops, best of 3: 5.15 ms per loop
    
    In [199]: 43600/5.15 # Speedup figure
    Out[199]: 8466.019417475727
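The comparison above can be reproduced outside IPython with the standard timeit module. The sketch below wraps the two timed snippets in functions (the names `sizes_loop` and `sizes_bincount` are made up here), checks that they agree, and uses a much smaller input than the sets above so that it finishes quickly. Absolute timings and the speedup will differ by machine and NumPy version.

```python
import timeit

import numpy as np


def sizes_loop(labels):
    # Loop-based version from the timing sessions above
    sizes = np.zeros(labels.shape)
    for num in np.unique(labels):
        mask = labels == num
        sizes[mask] = np.count_nonzero(mask)
    return sizes


def sizes_bincount(labels):
    # Vectorized version (Approach #2); needs non-negative integer labels
    return np.bincount(labels)[labels]


rng = np.random.RandomState(0)
labels = rng.randint(0, 600, 10000)  # deliberately small, assumed sizes

# Both versions should give the same values (the loop version returns floats)
assert np.array_equal(sizes_loop(labels), sizes_bincount(labels))

# Best of 3 single runs for each version, in seconds
t_loop = min(timeit.repeat(lambda: sizes_loop(labels), number=1, repeat=3))
t_bin = min(timeit.repeat(lambda: sizes_bincount(labels), number=1, repeat=3))
print("loop: %.6f s, bincount: %.6f s" % (t_loop, t_bin))
```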
    