Question
How can I get the most frequent item in a pandas series?
Consider the series s
s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)
The returned value should be 3
Answer 1:
You can just use pd.Series.mode and extract the first value:
res = s.mode().iloc[0]
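As a minimal sketch of why the `.iloc[0]` step is there: `mode` returns a Series rather than a scalar, because several values can tie for the highest count. The `tied` series below is a made-up example, not part of the question.

```python
import pandas as pd

# Series from the question: 3 occurs six times, more than any other value.
s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

modes = s.mode()      # Series holding every value that shares the top count
res = modes.iloc[0]   # first of those values
print(res)

# Hypothetical series with a tie: 1 and 2 both occur twice.
tied = pd.Series([2, 2, 1, 1, 7])
tied_modes = tied.mode().tolist()   # tied values come back sorted
print(tied_modes)
```

If ties matter for your data, inspect the whole result of `mode()` instead of taking only the first element.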
Using mode this way is not necessarily inefficient. As always, test with your own data to see what suits.
import numpy as np, pandas as pd
from scipy.stats.mstats import mode
from collections import Counter
np.random.seed(0)
s = pd.Series(np.random.randint(0, 100, 100000))
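def jez_np(s):
    # idx holds the position of each unique value's first occurrence
    _, idx, counts = np.unique(s, return_index=True, return_counts=True)
    index = idx[np.argmax(counts)]
    # label lookup; matches the position because s has the default RangeIndex
    val = s[index]
    return val
def pir(s):
    # i: integer code per element, r: unique values in order of appearance
    i, r = s.factorize()
    return r[np.bincount(i).argmax()]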
%timeit s.mode().iloc[0] # 1.82 ms
%timeit pir(s) # 2.21 ms
%timeit s.value_counts().index[0] # 2.52 ms
%timeit mode(s).mode[0] # 5.64 ms
%timeit jez_np(s) # 8.26 ms
%timeit Counter(s).most_common(1)[0][0] # 8.27 ms
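The `%timeit` lines above only work in IPython or Jupyter. A rough equivalent for a plain Python script, using the standard-library `timeit` module, might look like the sketch below. The `candidates` dictionary and the repeat count are choices made for this illustration, and the numbers it prints will differ from the ones quoted above.

```python
import timeit
from collections import Counter

import numpy as np
import pandas as pd

# Same kind of test data as in the benchmark above.
np.random.seed(0)
s = pd.Series(np.random.randint(0, 100, 100000))

# Hypothetical selection of approaches to compare; add your own as needed.
candidates = {
    "mode": lambda: s.mode().iloc[0],
    "value_counts": lambda: s.value_counts().index[0],
    "Counter": lambda: Counter(s).most_common(1)[0][0],
}

repeats = 10
results = {}
for name, func in candidates.items():
    # Average wall-clock time per call over a few repetitions.
    results[name] = timeit.timeit(func, number=repeats) / repeats
    print(f"{name:>12}: {results[name] * 1e3:.3f} ms per call")
```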
Answer 2:
Use value_counts and select the first value from its index:
val = s.value_counts().index[0]
Or Counter.most_common:
from collections import Counter
val = Counter(s).most_common(1)[0][0]
Or a NumPy solution:
_, idx, counts = np.unique(s, return_index=True, return_counts=True)
index = idx[np.argmax(counts)]
val = s[index]
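For reference, here is a small sketch that runs the three approaches on the series from the question. The variable names are illustrative. `idxmax` is shown as an alternative to `index[0]`, and the NumPy variant reads the value straight from the array of unique values instead of indexing back into `s`; both are variations on the snippets above, not the original code.

```python
from collections import Counter

import numpy as np
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

# value_counts sorts by count in descending order by default,
# so the first index label is a most frequent value.
counts = s.value_counts()
val_vc = counts.index[0]
val_idxmax = counts.idxmax()   # label of the largest count

# Counter iterates over the Series values and tallies them.
val_counter = Counter(s).most_common(1)[0][0]

# np.unique returns the sorted unique values together with their counts.
uniques, cnt = np.unique(s, return_counts=True)
val_np = uniques[np.argmax(cnt)]

print(val_vc, val_idxmax, val_counter, val_np)
```

When several values tie for the highest count, these approaches are not guaranteed to pick the same one.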
Answer 3:
pandas.factorize and numpy.bincount
This is very similar to @jezrael's NumPy answer. The difference is the use of factorize instead of numpy.unique.
- factorize returns an integer factorization and the unique values
- bincount counts how many there are of each unique value
- argmax identifies which bin or factor is the most frequent
- Use the position of the bin returned from argmax to reference the most frequent value from the array of unique values
i, r = s.factorize()
r[np.bincount(i).argmax()]
3
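To make the steps above concrete, the sketch below prints the intermediate results for the series from the question. The longer variable names are mine, and the sketch assumes the series has no missing values, since factorize assigns the code -1 to NaN by default and bincount rejects negative input.

```python
import numpy as np
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

# codes: for each element, the position of its value within uniques
# uniques: distinct values in order of first appearance
codes, uniques = s.factorize()
unique_values = uniques.tolist()
print(unique_values)        # [1, 5, 3, 2, 8, 10]

# bincount tallies how often each code occurs.
bin_counts = np.bincount(codes).tolist()
print(bin_counts)           # [2, 2, 6, 2, 1, 1]

# argmax gives the position of the largest bin; look it up in uniques.
top = int(np.argmax(bin_counts))
most_frequent = uniques[top]
print(most_frequent)        # 3
```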
Answer 4:
from scipy import stats
import pandas as pd
x=[1,5,3,3,3,5,2,1,8,10,2,3,3,3]
data=pd.DataFrame({"values":x})
print(stats.mode(data["values"]))
Output:
ModeResult(mode=array([3], dtype=int64), count=array([6]))
Source: https://stackoverflow.com/questions/52038896/pandas-how-to-get-the-most-frequent-item-in-pandas-series