Select rows based on non-directed combinations of columns

梦想的初衷, submitted on 2019-12-12 05:37:53

Question


I am trying to select the maximum value in a dataframe's third column based on the combinations of the values in the first two columns.

My problem is similar to this one but I can't find a way to implement what I need.

EDIT: Sample data changed to make the column names more obvious.

Here is some sample data:

library(tidyr)
set.seed(1234)
df <- data.frame(group1 = letters[1:4], group2 = letters[1:4])
df <- df %>% expand(group1, group2)
df <- subset(df, subset = group1!=group2)
df$score <- runif(n = 12,min = 0,max = 1)
df

    # A tibble: 12 × 3
   group1 group2       score
   <fctr> <fctr>       <dbl>
1       a      b 0.113703411
2       a      c 0.622299405
3       a      d 0.609274733
4       b      a 0.623379442
5       b      c 0.860915384
6       b      d 0.640310605
7       c      a 0.009495756
8       c      b 0.232550506
9       c      d 0.666083758
10      d      a 0.514251141
11      d      b 0.693591292
12      d      c 0.544974836

In this example, rows 1 and 4 are 'duplicates': they contain the same pair of groups (a and b), only in the opposite order. I would like to select row 4 because its score is larger than the score in row 1. Ultimately, I would like a data frame to be returned with the group1 and group2 columns and the maximum score for each unordered pair. So in this example, I expect 6 rows to be returned.

How can I do this in R?


Answer 1:


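I'd prefer dealing with this problem in two steps: first compute an order-independent group ID for each row, then keep the row with the highest score within each ID: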

library(dplyr)

# Create function for computing group IDs from data frame of groups (per column)
get_group_id <- function(groups) {
  apply(groups, 1, function(row) {
    paste0(sort(row), collapse = "_")
  })
}
group_id <- get_group_id(select(df, -score))

# Perform the computation
df %>%
  mutate(groupId = group_id) %>%
  group_by(groupId) %>%
  slice(which.max(score)) %>%
  ungroup() %>%
  select(-groupId)
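
For comparison, here is a minimal base R sketch of the same idea that builds the order-independent key with pmin() and pmax() instead of a row-wise apply(). It is not part of the original answer. The data frame is typed in by hand from the printed output in the question, with scores rounded to three decimals, and the names pair_key, ord, and result are made up for illustration.

```r
# Sample data copied from the question's printed output
# (scores rounded to 3 decimals; an assumption for illustration).
df <- data.frame(
  group1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
  group2 = c("b", "c", "d", "a", "c", "d", "a", "b", "d", "a", "b", "c"),
  score  = c(0.114, 0.622, 0.609, 0.623, 0.861, 0.640,
             0.009, 0.233, 0.666, 0.514, 0.694, 0.545),
  stringsAsFactors = FALSE
)

# Order-independent key: the smaller label first, the larger second,
# so (a, b) and (b, a) both map to "a_b".
pair_key <- paste(pmin(df$group1, df$group2),
                  pmax(df$group1, df$group2),
                  sep = "_")

# Sort rows by descending score, then keep the first row seen per key,
# which is the row with the maximum score for that unordered pair.
ord    <- order(-df$score)
result <- df[ord, ][!duplicated(pair_key[ord]), ]

# Restore a readable order for printing.
result <- result[order(result$group1, result$group2), ]
print(result)
```

Because pmin() and pmax() are vectorised, this avoids calling a function once per row, which may matter on large data frames. It assumes group1 and group2 are character vectors; factor columns would need as.character() first.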


Source: https://stackoverflow.com/questions/43009055/select-rows-based-on-non-directed-combinations-of-columns
