Number of occurrence of pair of value in dataframe

我与影子孤独终老i 提交于 2019-12-04 04:49:05

For performance implications of the below solutions, see Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series. They are presented below with best performance first.

GroupBy.size

You can create a series of counts with (Name, Surname) tuple indices using GroupBy.size:

res = df.groupby(['Name', 'Surname']).size().sort_values(ascending=False)

By sorting these values, we can easily extract the most common:

most_common = res.head(1)
most_common_dups = res[res == res.iloc[0]].index.tolist()  # handles duplicate top counts

value_counts

Another way is to construct a series of tuples, then apply pd.Series.value_counts:

res = pd.Series(list(zip(df.Name, df.Surname))).value_counts()

The result will be a series of counts indexed by Name-Surname combinations, sorted from most common to least.

name, surname = res.index[0]  # return most common
most_common_dups = res[res == res.max()].index.tolist()

collections.Counter

If you wish to create a dictionary of (name, surname): counts entries, you can do so via collections.Counter:

from collections import Counter

zipper = zip(df.Name, df.Surname)
c = Counter(zipper)

Counter has useful methods such as most_common, which you can use to extract your result.

Seems like a good use case for the performant Counter:

from collections import Counter
popular_names = Counter(zip(df.Name, df.Surname)).most_common(10) 
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!