Question
I have three columns, as shown below, and I am trying to return the one or two most frequent values of the third column (rating) for each gender. I want the output to look like the expected output below.
DATA :
print (df)
AGE GENDER rating
0 10 M PG
1 10 M R
2 10 M R
3 4 F PG13
4 4 F PG13
CODE :
s = (df.groupby(['AGE', 'GENDER'])['rating']
       .apply(lambda x: x.value_counts().head(2))
       .rename_axis(('a','b', 'c'))
       .reset_index(level=2)['c'])
output :
print (s)
a b
4 F PG13
10 M R
M PG
Name: c, dtype: object
EXPECTED OUTPUT :
print (s['F'])
('PG13')
print (s['M'])
('R', 'PG')
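Answer 1:
I think you need to reset the whole index, group again by gender (column b), and collect the ratings (column c) into lists: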
s = (df.groupby(['AGE', 'GENDER'])['rating']
       .apply(lambda x: x.value_counts().head(2))
       .rename_axis(('a','b', 'c'))
       .reset_index()
       .groupby('b')['c']
       .apply(list)
       .to_dict()
    )
print (s)
{'M': ['R', 'PG'], 'F': ['PG13']}
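For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample DataFrame from the question and, as a simplifying assumption, groups by GENDER only rather than by AGE and GENDER first, so it is a variation on the answer above and not the same method. The helper name top_ratings and the variable top_by_gender are made up for illustration. On this sample data both approaches give the same lists.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "AGE": [10, 10, 10, 4, 4],
    "GENDER": ["M", "M", "M", "F", "F"],
    "rating": ["PG", "R", "R", "PG13", "PG13"],
})


def top_ratings(ratings, n=2):
    """Return up to n most frequent values of a Series (hypothetical helper)."""
    # value_counts() sorts by frequency in descending order by default,
    # so the first n index labels are the n most common values.
    return ratings.value_counts().head(n).index.tolist()


# Assumption: group by GENDER directly instead of AGE and GENDER.
top_by_gender = {
    gender: top_ratings(group)
    for gender, group in df.groupby("GENDER")["rating"]
}

print(top_by_gender["F"])
print(top_by_gender["M"])
```

If the same gender appears under several ages, grouping by AGE and GENDER first can return more than two ratings per gender (or repeated ones) once the lists are merged, so which grouping to use depends on what "top 2" should mean for your data.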
Source: https://stackoverflow.com/questions/48768632/printing-the-top-2-of-frequently-occurred-values-of-the-target-column