Question
I have a data.frame in R that is a catalog of results from baseball games for every team over a number of seasons. Some of the columns are team, opponent_team, date, result, team_runs, opponent_runs, etc. My problem is that, because the data.frame is a combination of logs for every team, each row has another row somewhere else in the data.frame that is a mirror image of it.
For example
team opponent_team date result team_runs opponent_runs
BAL BOS 2010-04-05 W 5 4
has another row somewhere else that is
team opponent_team date result team_runs opponent_runs
BOS BAL 2010-04-05 L 4 5
I would like to write some code in dplyr or something similar that selects rows that have a unique combination of the team, opponent_team and date columns. I stress the word combination here because order doesn't matter: I am just trying to get rid of the rows that are mirror images.
Thanks
Answer 1:
Have you tried the distinct() function from dplyr? In your case, it could be something like:
library(dplyr)
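# .keep_all = TRUE retains the other columns (result, team_runs, ...) in current dplyr versions
df %>% distinct(team, opponent_team, date, .keep_all = TRUE)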
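Another alternative is to use the duplicated() function from base R inside dplyr's filter(), as below. duplicated() takes a single object, so the key columns are wrapped in a data.frame.
df %>% filter(!duplicated(data.frame(team, opponent_team, date)))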
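Note that distinct() and duplicated() compare columns by position, so BAL/BOS and BOS/BAL count as two different combinations and the mirror rows would remain. One possible way to make the comparison order-independent is to build a sorted key for each pairing first. The following is only a sketch under assumptions: the toy data and the helper column names team_a and team_b are made up for illustration, and the team columns are assumed to be character vectors rather than factors.

```r
library(dplyr)

# Toy data mirroring the question's example (values are illustrative only)
df <- data.frame(
  team          = c("BAL", "BOS", "NYY", "TB"),
  opponent_team = c("BOS", "BAL", "TB", "NYY"),
  date          = as.Date(c("2010-04-05", "2010-04-05", "2010-04-06", "2010-04-06")),
  result        = c("W", "L", "L", "W"),
  team_runs     = c(5, 4, 2, 7),
  opponent_runs = c(4, 5, 7, 2),
  stringsAsFactors = FALSE
)

deduped <- df %>%
  # Order-independent key: alphabetically first and last team of each pairing
  # (team_a / team_b are hypothetical helper columns)
  mutate(team_a = pmin(team, opponent_team),
         team_b = pmax(team, opponent_team)) %>%
  # Keep the first row seen for each (pairing, date) and retain all columns
  distinct(team_a, team_b, date, .keep_all = TRUE) %>%
  # Drop the helper columns again
  select(-team_a, -team_b)

print(deduped)
```

Which of the two mirror rows survives depends on row order, since distinct() keeps the first occurrence. If two teams can meet twice on the same date (a doubleheader), this key would also collapse those games, so a game-number column, if the data has one, would probably need to be added to the key.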
Source: https://stackoverflow.com/questions/36092076/select-rows-from-dataframe-with-unique-combination-of-values-from-multiple-colum