Select rows from dataframe with unique combination of values from multiple columns

问题

I have a data.frame in R that is a catalog of results from baseball games for every team for a number of seasons. Some of the columns are team, opponent_team, date, result, team_runs, opponent_runs, etc. My problem is that the because the data.frame is a combination of logs for every team, each row essentially has another row somewhere else in the data.frame that is a mirror image of that row.

For example

team  opponent_team  date           result team_runs opponent_runs
BAL   BOS            2010-04-05      W      5         4

has another row somewhere else that is

team  opponent_team  date           result team_runs opponent_runs
BOS   BAL            2010-04-05      L      4         5

I would like to write some code in dplyr or something similar that selects rows that have a unique combination of the team, opponent_team and date columns. I stress the word combination here because order doesn't matter, I am just trying to get rid of the rows that are mirror images.

Thanks

回答1:

Have you tried distinct function from dplyr? For your case, it can be something like

library(dplyr)
df %>% distinct(team, opponent_team, date)

Another alternative is to use duplicated function from base R inside filter function of dplyr like below.

filter(!duplicated(team, opponent_team, date)

来源：https://stackoverflow.com/questions/36092076/select-rows-from-dataframe-with-unique-combination-of-values-from-multiple-colum

标签

select

dataframe

dplyr

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!