pandas: How to get the most frequent item in pandas series?

我寻月下人不归 2020-12-20 18:26

How can I get the most frequent item in a pandas series?

Consider the series s

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3


        
4 Answers
  • 2020-12-20 18:41

    Use value_counts and select first value by index:

    val = s.value_counts().index[0]
    

    Or Counter.most_common:

    from collections import Counter
    
    val = Counter(s).most_common(1)[0][0]
    

    Or numpy solution:

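    import numpy as np

    # return_index gives the position of each unique value's first occurrence
    _, idx, counts = np.unique(s, return_index=True, return_counts=True)
    index = idx[np.argmax(counts)]
    val = s.iloc[index]  # positional lookup, so a non-default index also works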
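
As a quick check, the three approaches above can be run side by side. This is a minimal sketch that assumes the sample values are the ones listed in the last answer on this page (the series in the question is cut off); the variable names are illustrative, and the NumPy variant here reads the value straight from the sorted unique values rather than going back through the series.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Sample data assumed from the list used in the last answer on this page
s = pd.Series([1, 5, 3, 3, 3, 5, 2, 1, 8, 10, 2, 3, 3, 3])

# value_counts sorts by count in descending order, so the first label wins
via_value_counts = s.value_counts().index[0]

# Counter iterates over the series values; most_common(1) yields [(value, count)]
via_counter = Counter(s).most_common(1)[0][0]

# np.unique returns the sorted unique values and their counts
values, counts = np.unique(s, return_counts=True)
via_numpy = values[np.argmax(counts)]

print(via_value_counts, via_counter, via_numpy)
```

All three agree here because 3 is the only most frequent value. If several values tie, the approaches may not pick the same one.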
    
  • 2020-12-20 18:42

    pandas.factorize and numpy.bincount

    This is very similar to @jezrael's NumPy answer. The difference is the use of factorize instead of numpy.unique.

    • factorize returns an integer factorization and unique values
    • bincount counts how many of each unique value
    • argmax identifies which bin (factor) is the most frequent
    • Use the position of the bin returned from argmax to reference the most frequent value from the array of unique values

    i, r = s.factorize()
    r[np.bincount(i).argmax()]
    
    3
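
To make the steps listed above concrete, here is a sketch that prints each intermediate result. It assumes the sample values are the ones listed in the last answer on this page; the names codes, uniques and bin_counts are chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data assumed from the list used in the last answer on this page
s = pd.Series([1, 5, 3, 3, 3, 5, 2, 1, 8, 10, 2, 3, 3, 3])

# factorize: integer codes per element, plus unique values in order of appearance
codes, uniques = s.factorize()
print(codes.tolist())       # [0, 1, 2, 2, 2, 1, 3, 0, 4, 5, 3, 2, 2, 2]
print(list(uniques))        # [1, 5, 3, 2, 8, 10]

# bincount: how often each code occurs
bin_counts = np.bincount(codes)
print(bin_counts.tolist())  # [2, 2, 6, 2, 1, 1]

# argmax: position of the fullest bin, used to look up the value itself
most_frequent = uniques[bin_counts.argmax()]
print(most_frequent)        # 3
```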
    
  • 2020-12-20 18:54

    You can just use pd.Series.mode and extract the first value:

    res = s.mode().iloc[0]
    

    This is not necessarily inefficient. As always, test with your own data to see what suits.

    import numpy as np, pandas as pd
    from scipy.stats.mstats import mode
    from collections import Counter
    
    np.random.seed(0)
    
    s = pd.Series(np.random.randint(0, 100, 100000))
    
    def jez_np(s):
        _, idx, counts = np.unique(s, return_index=True, return_counts=True)
        index = idx[np.argmax(counts)]
        val = s[index]
        return val
    
    def pir(s):
        i, r = s.factorize()
        return r[np.bincount(i).argmax()]
    
    %timeit s.mode().iloc[0]                 # 1.82 ms
    %timeit pir(s)                           # 2.21 ms
    %timeit s.value_counts().index[0]        # 2.52 ms
    %timeit mode(s).mode[0]                  # 5.64 ms
    %timeit jez_np(s)                        # 8.26 ms
    %timeit Counter(s).most_common(1)[0][0]  # 8.27 ms
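
One detail worth knowing when extracting the first value: pd.Series.mode returns a Series because several values can tie for most frequent. The sketch below uses a small made-up series (not data from the question) to show what .iloc[0] selects in that case.

```python
import pandas as pd

# Hypothetical data with a tie: 1 and 2 both occur twice
tied = pd.Series([2, 1, 2, 1, 3])

# mode() returns every tied value, in sorted order
modes = tied.mode()
print(modes.tolist())  # [1, 2]

# .iloc[0] therefore picks the smallest of the tied values
first_mode = modes.iloc[0]
print(first_mode)      # 1
```

If all tied values matter, keep the whole result of mode() rather than only its first element.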
    
  • 2020-12-20 18:59
    from scipy import stats
    import pandas as pd

    x = [1, 5, 3, 3, 3, 5, 2, 1, 8, 10, 2, 3, 3, 3]
    data = pd.DataFrame({"values": x})

    print(stats.mode(data["values"]))

    Output (newer SciPy releases return scalars rather than one-element arrays):

    ModeResult(mode=array([3], dtype=int64), count=array([6]))
    