How to find most frequent values in numpy ndarray?

栀梦 2020-12-15 05:38

I have a NumPy ndarray with shape (30, 480, 640): the 1st and 2nd axes represent locations (latitude and longitude), and the 0th axis contains the actual data points. I want to us…

5 answers
  • 2020-12-15 06:14

    Use SciPy's mode function:

    import numpy as np
    from scipy.stats import mode
    
    data = np.array([[[ 0,  1,  2,  3,  4],
                      [ 5,  6,  7,  8,  9],
                      [10, 11, 12, 13, 14],
                      [15, 16, 17, 18, 19]],
    
                     [[ 0,  1,  2,  3,  4],
                      [ 5,  6,  7,  8,  9],
                      [10, 11, 12, 13, 14],
                      [15, 16, 17, 18, 19]],
    
                     [[40, 40, 42, 43, 44],
                      [45, 46, 47, 48, 49],
                      [50, 51, 52, 53, 54],
                      [55, 56, 57, 58, 59]]])
    
    print(data)
    
    # find the mode along the zero-th axis; the result holds the modes
    # and their counts.
    print(mode(data, axis=0))
    
  • 2020-12-15 06:15

    Explaining @ecatmur's part

    u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(arr.shape),
                                    None, np.max(indices) + 1), axis=axis)]
    

    a little bit more, and restructuring it so it is easier to follow when re-reading it (I used this solution, and a few weeks later I was wondering what was going on in this function):

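    import numpy as np

    axis = 0  # the axis holding the data points (axis 0 for a (30, 480, 640) array)
    uniques, indices = np.unique(arr, return_inverse=True)  # arr is the input array

    # extra arguments for np.bincount: no weights, and a fixed minlength so that
    # every slice produces a count vector of the same length
    args_for_bincount_fn = None, np.max(indices) + 1
    binned_indices = np.apply_along_axis(np.bincount,
                                         axis,
                                         indices.reshape(arr.shape),
                                         *args_for_bincount_fn)

    # position of the largest count along `axis` -> the most common unique value
    most_common = uniques[np.argmax(binned_indices, axis=axis)]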
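To see what each step produces, here is a sketch that traces the intermediate shapes on a small random integer array standing in for the question's (30, 480, 640) data, and checks the result against a plain per-location loop. The shape (30, 4, 5), the value range and the seed are arbitrary assumptions; `minlength` is passed by keyword here, which `np.apply_along_axis` forwards to `np.bincount`.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(0, 4, size=(30, 4, 5))  # small stand-in for the (30, 480, 640) data
axis = 0

uniques, indices = np.unique(arr, return_inverse=True)
indices = indices.reshape(arr.shape)  # bin number of every element, same shape as arr
n_bins = len(uniques)

# bincount runs on each 1-D slice along `axis`; minlength keeps every count vector the same length
binned_indices = np.apply_along_axis(np.bincount, axis, indices, minlength=n_bins)
print(binned_indices.shape)  # shape is (n_bins, 4, 5): the counts take the place of axis 0

most_common = uniques[np.argmax(binned_indices, axis=axis)]
print(most_common.shape)  # (4, 5)

# brute-force check: one np.unique call per location
expected = np.empty(arr.shape[1:], dtype=arr.dtype)
for i in range(arr.shape[1]):
    for j in range(arr.shape[2]):
        vals, counts = np.unique(arr[:, i, j], return_counts=True)
        expected[i, j] = vals[np.argmax(counts)]
print(np.array_equal(most_common, expected))  # True
```

Both sides break ties the same way (smallest value wins), because `uniques` and `vals` are sorted and `np.argmax` returns the first maximum.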
    
  • 2020-12-15 06:19

    To find the most frequent value of a flat array, use unique, bincount and argmax:

    import numpy as np

    arr = np.array([5, 4, -2, 1, -2, 0, 4, 4, -6, -1])
    u, indices = np.unique(arr, return_inverse=True)
    u[np.argmax(np.bincount(indices))]  # 4
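One detail worth knowing: `np.unique` returns its values sorted and `np.argmax` returns the first maximal position, so when several values tie for most frequent, this recipe returns the smallest of them. A minimal sketch wrapping the three lines above in a helper (the name `most_frequent` is hypothetical):

```python
import numpy as np

def most_frequent(arr):
    """Return the most frequent value of a 1-D array (the smallest one on ties)."""
    u, indices = np.unique(arr, return_inverse=True)
    return u[np.argmax(np.bincount(indices))]

print(most_frequent(np.array([5, 4, -2, 1, -2, 0, 4, 4, -6, -1])))  # 4
print(most_frequent(np.array([3, 3, 1, 1, 2])))  # 1 (1 and 3 tie; 1 is smaller)
```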
    

    For a multidimensional array, np.unique works unchanged (it operates on the flattened array), but bincount has to be applied along the chosen axis with apply_along_axis:

    arr = np.array([[5, 4, -2, 1, -2, 0, 4, 4, -6, -1],
                    [0, 1,  2, 2,  3, 4, 5, 6,  7,  8]])
    axis = 1
    u, indices = np.unique(arr, return_inverse=True)
    u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(arr.shape),
                                    None, np.max(indices) + 1), axis=axis)]
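`apply_along_axis` behaves like a Python-level loop over the 1-D slices. The sketch below writes that loop out for the two-row example with `axis = 1`, which may make it clearer why `bincount` needs the fixed minimum length: every row has to produce a count vector with one entry per value in `u`, so the results can be stacked and compared with `argmax`.

```python
import numpy as np

arr = np.array([[5, 4, -2, 1, -2, 0, 4, 4, -6, -1],
                [0, 1,  2, 2,  3, 4, 5, 6,  7,  8]])
u, indices = np.unique(arr, return_inverse=True)
indices = indices.reshape(arr.shape)

# one count vector of length len(u) per row (axis = 1)
counts = np.stack([np.bincount(row, minlength=len(u)) for row in indices])
print(counts.shape)  # (2, 12): 12 distinct values across the whole array

row_modes = u[np.argmax(counts, axis=1)]
print(row_modes)  # [4 2]
```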
    

    With your data:

    data = np.array([
       [[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],
    
       [[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],
    
       [[40, 40, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])
    axis = 0
    u, indices = np.unique(data, return_inverse=True)
    u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(data.shape),
                                    None, np.max(indices) + 1), axis=axis)]
    array([[ 0,  1,  2,  3,  4],
           [ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14],
           [15, 16, 17, 18, 19]])
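`apply_along_axis` still calls `bincount` once per location from Python, which for 480 × 640 locations may be slow. One alternative (a different technique, not part of the answer above) is to give every (location, value) pair its own bin and run a single `bincount` over the whole array. This sketch assumes the data is laid out as (time, rows, cols) and that the number of distinct values K is small enough for an H·W·K count array to fit in memory, which is the same size of array the `apply_along_axis` version builds. The function name `mode_along_axis0` is hypothetical.

```python
import numpy as np

def mode_along_axis0(data):
    """Most frequent value along axis 0 of a (T, H, W) array; the smallest value wins ties."""
    t, h, w = data.shape
    u, indices = np.unique(data, return_inverse=True)
    indices = indices.reshape(t, h * w)  # value index of every sample, one column per location
    k = len(u)
    # location j with value index v is counted in bin j * k + v
    location = np.arange(h * w)
    bins = (location * k + indices).ravel()
    counts = np.bincount(bins, minlength=h * w * k).reshape(h * w, k)
    return u[np.argmax(counts, axis=1)].reshape(h, w)
```

Applied to the `data` array above, it should return the same (4, 5) array as the `apply_along_axis` version, since the first two time slices agree at every location.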
    

    NumPy 1.2, really? If so, you can approximate np.unique(return_inverse=True) reasonably efficiently using np.searchsorted (it adds another O(n log n) step, so it shouldn't change the performance significantly):

    u = np.unique(arr)
    indices = np.searchsorted(u, arr.flat)
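A quick sketch checking that the `searchsorted` substitute yields the same indices as `return_inverse=True`, on a NumPy version that has both (the test array is arbitrary). It works because `u` is sorted and contains every element of `arr`, so each element's left insertion point is exactly its own position in `u`.

```python
import numpy as np

arr = np.array([[5, 4, -2, 1, -2],
                [0, 4, 4, -6, -1]])

u = np.unique(arr)
indices = np.searchsorted(u, arr.ravel())  # position of each element within the sorted uniques

u2, inverse = np.unique(arr, return_inverse=True)
print(np.array_equal(indices, inverse.ravel()))  # True
```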
    
  • 2020-12-15 06:24

    Flatten your array, then build a collections.Counter from it. As usual, take special care when comparing floating-point numbers.
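A minimal sketch of that suggestion. `Counter.most_common(1)` returns a list holding a single (value, count) pair, and ties go to the value encountered first, unlike the `np.unique` recipes, which return the smallest tied value. Rounding float data to a fixed number of decimals before counting is an assumption here (one way to keep values that differ only by floating-point noise from being counted separately); the helper name `most_frequent_counter` and `decimals=6` are arbitrary choices. This gives one mode for the whole array, not one per location.

```python
from collections import Counter

import numpy as np

def most_frequent_counter(arr, decimals=6):
    """Return (value, count) for the most frequent element of arr after flattening."""
    flat = np.asarray(arr).ravel()
    if np.issubdtype(flat.dtype, np.floating):
        flat = np.round(flat, decimals)  # merge floats that differ only beyond `decimals`
    value, count = Counter(flat.tolist()).most_common(1)[0]
    return value, count

print(most_frequent_counter(np.array([[1, 2], [2, 3]])))  # (2, 2)
print(most_frequent_counter(np.array([0.1 + 0.2, 0.3, 0.5])))  # (0.3, 2)
```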

  • 2020-12-15 06:33

    A slightly better solution, in my opinion, is the following:

    import numpy as np

    tmpL = np.array([3, 2, 3, 2, 5, 2, 2, 3, 3, 2, 2, 2, 3, 3, 2, 2, 3, 2, 3, 2])
    unique, counts = np.unique(tmpL, return_counts=True)
    most_common = unique[np.argmax(counts)]  # 2
    

    With return_counts=True, np.unique gives the count of each unique element. The index of the largest entry in counts is the index of the most frequent value in unique.
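The same recipe can be applied per location for the question's (time, latitude, longitude) layout by wrapping it in a 1-D function and passing that to `np.apply_along_axis`. This is a sketch with a hypothetical helper name `slice_mode` and a tiny made-up array; it calls `np.unique` once per location, so it is simple but likely slower than a single vectorized `bincount` on a 480 × 640 grid.

```python
import numpy as np

def slice_mode(values):
    """Most frequent entry of a 1-D array; the smallest value wins ties."""
    unique, counts = np.unique(values, return_counts=True)
    return unique[np.argmax(counts)]

data = np.array([[[0, 1, 2], [3, 4, 5]],
                 [[0, 1, 2], [3, 4, 5]],
                 [[9, 1, 8], [7, 6, 5]]])  # shape (time=3, lat=2, lon=3)

# slice_mode returns a scalar, so axis 0 disappears from the result
modes = np.apply_along_axis(slice_mode, 0, data)
print(modes)  # [[0 1 2]
              #  [3 4 5]]
```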
