Question
I have two columns, home and away. One row might be England vs Brazil and the next row Brazil vs England. How can I count the occurrences of Brazil facing England, regardless of which team is home or away, as a single count?
Based on previous solutions, I have tried:
results.groupby(["home_team", "away_team"]).size()
results.groupby(["away_team", "home_team"]).size()
However, this does not give me the outcome I am looking for.
Undesired output:
home_team away_team
England Brazil 1
away_team home_team
Brazil England 1
I would like to see: England Brazil 2
Answer 1:
You can sort the values of each row with numpy.sort, build a new DataFrame from the result, and then apply your original groupby:
df1 = (pd.DataFrame(np.sort(df[['home','away']], axis=1), columns=['home','away'])
.groupby(["home", "away"])
.size())
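For reference, here is a minimal self-contained sketch of the same idea applied to the column names from the question (home_team, away_team). The sample rows and the names pairs, team1 and team2 are assumptions made for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the question's column names.
results = pd.DataFrame({
    "home_team": ["England", "Brazil", "Spain"],
    "away_team": ["Brazil", "England", "Germany"],
})

# Sort each row alphabetically so that (England, Brazil) and
# (Brazil, England) both become (Brazil, England).
sorted_pairs = np.sort(
    results[["home_team", "away_team"]].to_numpy(dtype=object), axis=1
)

# The sorted columns no longer mean "home" and "away", so rename them.
pairs = pd.DataFrame(sorted_pairs, columns=["team1", "team2"])

# Count how often each unordered pairing occurs.
counts = pairs.groupby(["team1", "team2"]).size()
print(counts)
```

With this sample, the Brazil/England pairing is counted twice and Germany/Spain once. Note that np.sort compares the values in each row, so rows containing missing values would need to be handled first.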
Answer 2:
Maybe you need something like the following:
df = pd.DataFrame({
'home':['England', 'Brazil', 'Spain'],
'away':['Brazil', 'England', 'Germany']
})
pd.Series('-'.join(sorted(tup)) for tup in zip(df['home'], df['away'])).value_counts()
Output:
Brazil-England 2
Germany-Spain 1
dtype: int64
PS: If you do not like the - between team names, you can join them with a space instead:
pd.Series(' '.join(sorted(tup)) for tup in zip(df['home'], df['away'])).value_counts()
Answer 3:
Option 1
You can sort the values of each row with NumPy.
Note that ndarray.sort sorts in place (numpy.sort returns a sorted copy instead), so it is better to work on a copy of the data rather than on the original dataframe.
values = df[['home','away']].values.copy()
values.sort(axis=1)
dfTeams = pd.DataFrame(data=values, columns=['team1','team2'])
(I changed the column names because sorting changes their meaning.)
After doing this, you can use your groupby on the new dataframe.
dfTeams.groupby(['team1', 'team2']).size()
Option 2
A more general form of your question is how to count combinations of values across multiple columns of a dataframe, regardless of their order. For that you can use a set.
A set object is an unordered collection of distinct hashable objects.
More precisely, create a Series of frozensets (plain sets are not hashable, so they cannot be counted), and then count the values.
pd.Series(map(lambda home, away: frozenset({home, away}),
df['home'],
df['away'])).value_counts()
Note: I use the dataframe in @Harv Ipan's answer.
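The index of that result consists of frozenset objects, which are not very readable. As a rough sketch, assuming the same sample dataframe, the counts can be turned into plain labels afterwards; the names pair_counts and labelled and the " vs " separator are only illustrative choices.

```python
import pandas as pd

# Same sample data as in the second answer.
df = pd.DataFrame({
    "home": ["England", "Brazil", "Spain"],
    "away": ["Brazil", "England", "Germany"],
})

# One frozenset per row; the order of the two teams no longer matters.
pair_counts = pd.Series(
    [frozenset(pair) for pair in zip(df["home"], df["away"])]
).value_counts()

# Convert each frozenset into an alphabetically ordered, readable label.
labelled = {
    " vs ".join(sorted(pair)): int(count)
    for pair, count in pair_counts.items()
}
print(labelled)
```

One caveat of the set approach: a row where both columns hold the same value collapses to a one-element frozenset, which may or may not be what you want.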
Source: https://stackoverflow.com/questions/50839433/can-i-combine-groupby-data