Can I combine groupby data?

Posted by 感情迁移 on 2021-02-05 07:42:44

Question


I have two columns, home and away. One row will be England vs Brazil and the next row will be Brazil vs England. How can I count the occurrences of Brazil facing England, in either order, as a single count?

Based on previous solutions, I have tried

results.groupby(["home_team", "away_team"]).size()
results.groupby(["away_team", "home_team"]).size()

However, this does not give me the outcome I am looking for.

Undesired output:

home_team  away_team
England    Brazil       1

away_team  home_team
Brazil     England      1

I would like to see:

England  Brazil  2


Answer 1:


You can sort the values in each row with numpy.sort, build a new DataFrame from the result, and then apply your original groupby:

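import numpy as np
import pandas as pd
# Sort each row so that both orderings of a pairing produce the same key
df1 = (pd.DataFrame(np.sort(df[['home','away']], axis=1), columns=['home','away'])
        .groupby(["home", "away"])
        .size())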
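To see the row-wise sort at work, here is a minimal sketch on a small sample frame. The data is assumed for illustration (it reuses the three matches from Answer 2, not the asker's real dataset), and the column names team1 and team2 are hypothetical, chosen because the sorted columns no longer mean home and away.

```python
import numpy as np
import pandas as pd

# Sample data, assumed for illustration only.
df = pd.DataFrame({
    'home': ['England', 'Brazil', 'Spain'],
    'away': ['Brazil', 'England', 'Germany'],
})

# Sort within each row (axis=1) so that (England, Brazil) and
# (Brazil, England) both become (Brazil, England).
pairs = np.sort(df[['home', 'away']].to_numpy(), axis=1)

# Group on the sorted pair; each pairing is now counted once per match,
# whichever side was at home.
counts = (pd.DataFrame(pairs, columns=['team1', 'team2'])
            .groupby(['team1', 'team2'])
            .size())
print(counts)
```

Note that np.sort returns a sorted copy, so the original df is left unchanged.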



Answer 2:


Maybe this is what you need:

df = pd.DataFrame({
    'home':['England', 'Brazil', 'Spain'],
    'away':['Brazil', 'England', 'Germany']
})

pd.Series('-'.join(sorted(tup)) for tup in zip(df['home'], df['away'])).value_counts()

Output:

Brazil-England    2
Germany-Spain     1
dtype: int64

PS: If you do not like the - between team names, you can use:

pd.Series(' '.join(sorted(tup)) for tup in zip(df['home'], df['away'])).value_counts()
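One caveat with joining names into a single string is that the separator can collide with a team name that already contains it (a hyphenated name such as Guinea-Bissau, for example). A possible variant, sketched here under the assumption that the same sample frame is used, keeps each pairing as a sorted tuple instead of a joined string:

```python
import pandas as pd

# Sample data, assumed for illustration only.
df = pd.DataFrame({
    'home': ['England', 'Brazil', 'Spain'],
    'away': ['Brazil', 'England', 'Germany'],
})

# Build one sorted tuple per match; tuples are hashable, so they can be
# counted directly and no separator character is needed.
pairs = pd.Series([tuple(sorted(t)) for t in zip(df['home'], df['away'])])

# Convert to a plain dict keyed by the (team, team) tuple for easy lookup.
counts = pairs.value_counts().to_dict()
print(counts)
```

The tuple keys can still be formatted for display afterwards, for instance with ' vs '.join(key).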



Answer 3:


Option 1

You can sort each row in place with the sort method of a NumPy array. Because that modifies the data in place, it is better to work on a copy of the values rather than on the original dataframe.

values = df[['home', 'away']].to_numpy().copy()
values.sort(axis=1)  # in-place sort within each row
dfTeams = pd.DataFrame(values, columns=['team1', 'team2'])

(I changed the column names because sorting changes their meaning.)

After having done this, you can use your groupby.

dfTeams.groupby(['team1', 'team2']).size()

Option 2

A more general title for your question would be "How can I count combinations of values in multiple columns of a dataframe, independently of their order?" For that, you could use a set.

A set object is an unordered collection of distinct hashable objects.

More precisely, create a Series of frozensets (which, unlike plain sets, are hashable and can therefore be counted), and then count the values.

pd.Series(map(lambda home, away: frozenset({home, away}), 
              df['home'], 
              df['away'])).value_counts()

Note: I use the dataframe from @Harv Ipan's answer.
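The frozenset index can be hard to read when printed. As a rough sketch, assuming the same sample frame, the counts can be relabelled with a readable string per pairing. The ' vs ' label format is just an illustrative choice.

```python
import pandas as pd

# Sample data, assumed for illustration only.
df = pd.DataFrame({
    'home': ['England', 'Brazil', 'Spain'],
    'away': ['Brazil', 'England', 'Germany'],
})

# One frozenset per match: order inside a set does not matter, so
# {England, Brazil} and {Brazil, England} compare and hash as equal.
counts = pd.Series(
    [frozenset(pair) for pair in zip(df['home'], df['away'])]
).value_counts()

# Replace each frozenset in the index with a label such as
# "Brazil vs England" (names sorted alphabetically for a stable label).
counts.index = [' vs '.join(sorted(pair)) for pair in counts.index]
print(counts)
```

Be aware that a frozenset collapses duplicates, so a row where both columns hold the same value would yield a one-element set.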



Source: https://stackoverflow.com/questions/50839433/can-i-combine-groupby-data
