Select rows from dataframe with unique combination of values from multiple columns

Submitted by 元气小坏坏 on 2021-02-05 06:59:05

Question


I have a data.frame in R that is a catalog of results from baseball games for every team for a number of seasons. Some of the columns are team, opponent_team, date, result, team_runs, opponent_runs, etc. My problem is that, because the data.frame is a combination of logs for every team, each row has a mirror-image row somewhere else in the data.frame.

For example

team  opponent_team  date           result team_runs opponent_runs
BAL   BOS            2010-04-05      W      5         4

has another row somewhere else that is

team  opponent_team  date           result team_runs opponent_runs
BOS   BAL            2010-04-05      L      4         5

I would like to write some code in dplyr or something similar that selects rows that have a unique combination of the team, opponent_team and date columns. I stress the word combination here because order doesn't matter; I am just trying to get rid of the rows that are mirror images.

Thanks


Answer 1:


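Have you tried the distinct() function from dplyr? Called on the raw columns, it treats (BAL, BOS) and (BOS, BAL) as different combinations, so both mirrored rows would survive. Sorting the two team names within each row with pmin() and pmax() makes the combination order-independent. Assuming team and opponent_team are character columns (wrap them in as.character() if they are factors), it can be something like

library(dplyr)
df %>%
  distinct(team_a = pmin(team, opponent_team), team_b = pmax(team, opponent_team), date, .keep_all = TRUE)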

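Another alternative is to use the duplicated() function from base R inside dplyr's filter(). duplicated() takes a single vector or data frame rather than several columns, so the order-independent key is bundled into a data.frame first, like below.

df %>% filter(!duplicated(data.frame(pmin(team, opponent_team), pmax(team, opponent_team), date)))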
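
To see the order-independent de-duplication end to end, here is a minimal base R sketch that needs no packages. It rebuilds the two rows from the question and adds one more mirrored pair; the NYY/TBR rows and the variable names (games, first_team, second_team, is_mirror, unique_games) are made up for illustration. It assumes the team columns are character, not factor.

```r
# Toy data: the two rows from the question plus one invented mirrored pair.
games <- data.frame(
  team          = c("BAL", "BOS", "NYY", "TBR"),
  opponent_team = c("BOS", "BAL", "TBR", "NYY"),
  date          = as.Date(c("2010-04-05", "2010-04-05", "2010-04-06", "2010-04-06")),
  result        = c("W", "L", "L", "W"),
  team_runs     = c(5, 4, 2, 3),
  opponent_runs = c(4, 5, 3, 2),
  stringsAsFactors = FALSE
)

# Order-independent key: the alphabetically first and last team name per row.
first_team  <- pmin(games$team, games$opponent_team)
second_team <- pmax(games$team, games$opponent_team)

# TRUE for every row whose (first_team, second_team, date) key was already seen.
is_mirror <- duplicated(data.frame(first_team, second_team, date = games$date))

# Keep only the first row encountered for each game.
unique_games <- games[!is_mirror, ]
print(unique_games)
```

Which of the two mirrored rows is kept depends only on row order: the first one encountered wins. If you always want, say, the home team's row, sort the data.frame accordingly before de-duplicating.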


Source: https://stackoverflow.com/questions/36092076/select-rows-from-dataframe-with-unique-combination-of-values-from-multiple-colum
