pandas: How to get the most frequent item in pandas series?

Posted by 你离开我真会死。 on 2019-12-08 19:06:28

Question


How can I get the most frequent item in a pandas series?

Consider the series s:

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

The returned value should be 3.


Answer 1:


You can just use pd.Series.mode and extract the first value:

res = s.mode().iloc[0]
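One thing worth knowing before relying on .iloc[0]: mode returns every value tied for the highest count, in sorted order, so the first element is the smallest of the tied values. A minimal sketch, where the series `tied` is made up purely for illustration:

```python
import pandas as pd

# Hypothetical data: 2 and 5 both appear twice, so there are two modes.
tied = pd.Series([5, 5, 2, 2, 9])

modes = tied.mode()      # all tied values, sorted ascending
print(modes.tolist())    # [2, 5]

first = modes.iloc[0]    # picks the smallest tied value
print(first)             # 2
```

If ties matter for your data, inspect the whole result of mode() rather than only its first element.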

Using mode this way is not necessarily inefficient. As always, test with your data to see what suits.

import numpy as np, pandas as pd
from scipy.stats.mstats import mode
from collections import Counter

np.random.seed(0)

s = pd.Series(np.random.randint(0, 100, 100000))

def jez_np(s):
    _, idx, counts = np.unique(s, return_index=True, return_counts=True)
    index = idx[np.argmax(counts)]
    val = s.iloc[index]  # idx holds positions, so look up positionally
    return val

def pir(s):
    i, r = s.factorize()
    return r[np.bincount(i).argmax()]

%timeit s.mode().iloc[0]                 # 1.82 ms
%timeit pir(s)                           # 2.21 ms
%timeit s.value_counts().index[0]        # 2.52 ms
%timeit mode(s).mode[0]                  # 5.64 ms
%timeit jez_np(s)                        # 8.26 ms
%timeit Counter(s).most_common(1)[0][0]  # 8.27 ms
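The %timeit lines above only work inside IPython or Jupyter. If you want to repeat the comparison from a plain script, something along these lines with the standard timeit module should do; the `candidates` dict and the repeat counts are my own choices, and the numbers you get will differ from the ones quoted above.

```python
import timeit
from collections import Counter

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 100, 100_000))

# Hypothetical registry of approaches to compare; add your own as needed.
candidates = {
    "mode": lambda: s.mode().iloc[0],
    "value_counts": lambda: s.value_counts().index[0],
    "Counter": lambda: Counter(s).most_common(1)[0][0],
}

for name, fn in candidates.items():
    # Best of 3 runs, 5 calls per run, reported as time per call.
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{name:>12}: {best * 1e3:.2f} ms per call")
```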



Answer 2:


Use value_counts, which sorts by count in descending order, and select the first value of its index:

val = s.value_counts().index[0]

Or Counter.most_common:

from collections import Counter

val = Counter(s).most_common(1)[0][0]

Or a NumPy solution:

_, idx, counts = np.unique(s, return_index=True, return_counts=True)
index = idx[np.argmax(counts)]
val = s.iloc[index]  # idx holds positions, so use .iloc rather than label lookup
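As a quick sanity check, here is a sketch that runs the three approaches on the series from the question and prints what each one returns. The variable names are mine, and the NumPy variant uses positional lookup with .iloc so it does not depend on the series having a default index.

```python
from collections import Counter

import numpy as np
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

# value_counts: most frequent value comes first in the index
by_value_counts = s.value_counts().index[0]

# Counter: most_common(1) returns a list with one (value, count) pair
by_counter = Counter(s).most_common(1)[0][0]

# NumPy: idx holds the first position of each unique value
_, idx, counts = np.unique(s, return_index=True, return_counts=True)
by_unique = s.iloc[idx[np.argmax(counts)]]

print(by_value_counts, by_counter, by_unique)  # 3 3 3
```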



Answer 3:


pandas.factorize and numpy.bincount

This is very similar to @jezrael's NumPy answer. The difference is the use of factorize instead of numpy.unique.

  • factorize returns an integer code for each element and the array of unique values
  • bincount counts how many times each code occurs
  • argmax identifies which bin (code) is the most frequent
  • Use the position returned by argmax to look up the most frequent value in the array of unique values

i, r = s.factorize()
r[np.bincount(i).argmax()]

3
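To make the steps above concrete, here is the same computation on the series from the question with the intermediate results printed. The names `codes`, `uniques`, `counts` and `best` are just mine for readability.

```python
import numpy as np
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

# Codes are assigned in order of first appearance: 1 -> 0, 5 -> 1, 3 -> 2, ...
codes, uniques = s.factorize()
print(list(uniques))      # [1, 5, 3, 2, 8, 10]

# counts[k] is how many times code k occurs
counts = np.bincount(codes)
print(counts.tolist())    # [2, 2, 6, 2, 1, 1]

# Position of the largest count, then map it back to the actual value
best = counts.argmax()
print(uniques[best])      # 3
```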



Answer 4:


from scipy import stats
import pandas as pd

x = [1, 5, 3, 3, 3, 5, 2, 1, 8, 10, 2, 3, 3, 3]
data = pd.DataFrame({"values": x})

print(stats.mode(data["values"]))

Output:

ModeResult(mode=array([3], dtype=int64), count=array([6]))
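Note that the exact shape of the result depends on the SciPy version: as far as I know, older releases return length-1 arrays as shown above, while newer ones return plain scalars by default. A sketch that should give the bare value either way, assuming np.atleast_1d as the normalising step (my choice, not part of the original answer):

```python
import numpy as np
import pandas as pd
from scipy import stats

x = [1, 5, 3, 3, 3, 5, 2, 1, 8, 10, 2, 3, 3, 3]
data = pd.DataFrame({"values": x})

# ModeResult unpacks into (mode, count)
mode_values, mode_counts = stats.mode(data["values"].to_numpy())

# atleast_1d turns a scalar into a length-1 array and leaves arrays alone
most_frequent = int(np.atleast_1d(mode_values)[0])
times_seen = int(np.atleast_1d(mode_counts)[0])

print(most_frequent, times_seen)  # 3 6
```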


Source: https://stackoverflow.com/questions/52038896/pandas-how-to-get-the-most-frequent-item-in-pandas-series
