Subsetting a data frame based on key spanning several columns in another (summary) data frame

Submitted by 假如想象 on 2019-12-02 04:14:38

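If we have already created the three datasets and want to subset the first one, "a", based on the key combinations (columns A, B and C) found in "c" (created as "c1" in the data below), one option is anti_join from dplyr, which keeps only the rows of "a" that have no match: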

library(dplyr)
anti_join(a, c1, by=c('A', 'B', 'C'))
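If dplyr is not available, the same anti-join can be sketched in base R with merge(): tag every key in the summary table, left-join the tags onto the detail rows, and keep the rows that stayed untagged. This is a swapped-in base R pattern, not part of the original answer; the toy tables a_toy and c1_toy and the helper column .in_c1 are made up for illustration.

```r
# Toy stand-ins for "a" (detail rows) and "c1" (summary rows to exclude).
# All values are made up for illustration only.
a_toy <- data.frame(
  A = c("x", "x", "y", "y"),
  B = c("p", "q", "p", "p"),
  C = c("m", "m", "n", "n"),
  D = c(1.5, -0.2, 0.7, 2.1)
)
c1_toy <- data.frame(A = "y", B = "p", C = "n", D = 2.8)

keys <- c("A", "B", "C")

# Tag every key combination that occurs in the summary table ...
hits <- unique(c1_toy[keys])
hits$.in_c1 <- TRUE

# ... left-join the tags onto a_toy (all.x = TRUE keeps unmatched rows) ...
joined <- merge(a_toy, hits, by = keys, all.x = TRUE, sort = FALSE)

# ... and keep only the rows that found no match: an anti-join.
a_anti <- joined[is.na(joined$.in_c1), names(a_toy)]
a_anti
```

Both ("y", "p", "n") rows are dropped because that key appears in c1_toy; merge() may reorder rows, so sort afterwards if order matters.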

Update

Or we could use a base R option: interaction pastes the key columns of interest together into one string per row in both datasets, and %in% checks whether each key in the first ('a') also occurs in the second ('c', i.e. 'c1'). The negated logical index then subsets "a", keeping only the rows without a match.

 # one "A.B.C" key string per row; rows whose key also occurs in c1 are dropped
 a1 <- a[!(as.character(interaction(a[1:3], sep = ".")) %in%
           as.character(interaction(c1[LETTERS[1:3]], sep = "."))), ]
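To make the key-matching step concrete, here is a small sketch on made-up toy tables (a_toy and c1_toy are hypothetical names). It shows the pasted key strings, the logical index produced by %in%, and one caveat of pasting with a separator: if key values can themselves contain ".", distinct combinations may collapse to the same string.

```r
# Toy data (made up) with the key columns A, B, C in the first three positions.
a_toy  <- data.frame(A = c("x", "x", "y"), B = c("p", "q", "p"),
                     C = c("m", "m", "n"), D = c(1, 2, 3))
c1_toy <- data.frame(A = "x", B = "q", C = "m")

# One string per row, e.g. "x.p.m"; the column order must match in both tables.
key_a  <- as.character(interaction(a_toy[1:3], sep = "."))
key_c1 <- as.character(interaction(c1_toy[LETTERS[1:3]], sep = "."))

in_c1 <- key_a %in% key_c1   # TRUE where a row's key also occurs in c1_toy
kept  <- a_toy[!in_c1, ]     # negate to drop those rows
kept

# Caveat: with sep = ".", the pairs ("x.y", "z") and ("x", "y.z") paste to
# the same string, so keys containing "." could match by accident.
paste("x.y", "z", sep = ".") == paste("x", "y.z", sep = ".")   # TRUE
```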

Or, as @David Arenburg mentioned, we may not need to create the b or c datasets at all to get the expected output. Using plyr, create a new mean column ("mean_Val") in "a" with mutate, then subset the rows whose group mean is greater than 0 (mean_Val > 0). Here Val is the value column in the question's data; the example data below calls it D.

 library(plyr)
 subset(ddply(a, ~B+C+A, mutate, mean_Val=mean(Val)), mean_Val>0)

Or a similar approach using dplyr

 library(dplyr)
 a %>%
   group_by(B, C, A) %>%
   mutate(mean_Val = mean(Val)) %>%
   filter(mean_Val > 0)
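Since the question is about matching a key that spans several columns against a summary table, the same result can also be sketched in base R as "summarise, then join back": aggregate() builds the summary with one row per (B, C, A) key, merge() attaches the mean to each detail row, and subset() keeps the positive groups. This is an alternative sketch, not one of the original answers; a_toy and its values are made up, with Val standing in for the question's value column.

```r
# Toy data (made up); "Val" plays the role of the question's value column.
a_toy <- data.frame(
  A   = c("x", "x", "x", "y", "y"),
  B   = c("p", "p", "q", "p", "p"),
  C   = c("m", "m", "m", "n", "n"),
  Val = c(-1, 3, -4, 2, 1)
)

# 1. Summary table: one row per observed (B, C, A) key with the group mean.
means <- aggregate(Val ~ B + C + A, data = a_toy, FUN = mean)
names(means)[names(means) == "Val"] <- "mean_Val"

# 2. Join the means back onto the detail rows by the multi-column key ...
a_mean <- merge(a_toy, means, by = c("B", "C", "A"))

# 3. ... and keep the rows whose group mean is positive.
res <- subset(a_mean, mean_Val > 0)
res
```

The group ("q", "m", "x") has mean -4 and is dropped; the other two groups (means 1 and 1.5) are kept in full.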

Or if we don't need the "mean" values as a column in "a", ave from base R could be used as well.

  a[!!with(a, ave(Val, B, C, A, FUN=function(x) mean(x)>0)),]
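The `!!` is there because ave() writes each group's result back into a copy of the numeric Val vector, so the TRUE/FALSE from mean(x) > 0 comes back coerced to 1/0. A minimal sketch with made-up values and a single hypothetical grouping vector grp:

```r
# Made-up values and one grouping vector (hypothetical names).
Val <- c(-3, 1, 2)
grp <- c("g1", "g1", "g2")

# g1 has mean -1, g2 has mean 2; the logical results are stored as numbers.
flag <- ave(Val, grp, FUN = function(x) mean(x) > 0)
flag     # numeric: 0 0 1
!!flag   # logical: FALSE FALSE TRUE, usable as a row index
```

Indexing with the numeric flag directly would be read as positions (row 1 and row 0), not as a filter, so the double negation (or as.logical()) matters.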

If we need to keep the mean as a column (Mean_Val, a variation proposed by @David Arenburg):

  subset(transform(a, Mean_Val = ave(Val, B, C, A, FUN = mean)),
                 Mean_Val > 0)

data

set.seed(24)
a <- data.frame(A= sample(LETTERS[1:3], 20, replace=TRUE), 
   B=sample(LETTERS[1:3], 20, replace=TRUE), C=sample(LETTERS[1:3], 
         20, replace=TRUE), D=rnorm(20))

library(dplyr)  # needed for %>%, group_by() and summarise()
b <- a %>%
       group_by(A, B, C) %>%
       summarise(D = sum(D))
set.seed(39)
c1 <- b[sample(1:nrow(b), 6, replace=FALSE),]
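If dplyr is not loaded, the summary table b can be built with base R's aggregate() instead. A sketch, assuming the same a as above; b_base and c1_base are hypothetical names. aggregate() orders its rows differently from dplyr (A varies fastest), so the six sampled keys will generally not be the same ones as in c1.

```r
set.seed(24)
a <- data.frame(A = sample(LETTERS[1:3], 20, replace = TRUE),
                B = sample(LETTERS[1:3], 20, replace = TRUE),
                C = sample(LETTERS[1:3], 20, replace = TRUE),
                D = rnorm(20))

# Same content as b: one row per observed (A, B, C) combination with sum(D).
b_base <- aggregate(D ~ A + B + C, data = a, FUN = sum)

# Draw 6 summary rows; row order differs from the dplyr version,
# so the sampled keys need not match those in c1.
set.seed(39)
c1_base <- b_base[sample(seq_len(nrow(b_base)), 6), ]
```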

Here's a possible data.table solution that doesn't require creating either b or c:

library(data.table) 
as.data.table(a)[, if(mean(Val) > 0) .SD, by = list(B, C, A)]

Or similarly, if you also want the mean itself:

as.data.table(a)[, Mean_Val := mean(Val), list(B, C, A)][Mean_Val > 0]
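The `if (mean(Val) > 0) .SD` idiom works because a group that returns NULL contributes no rows. The same idea can be sketched in base R with split() and rbind(), which skips NULL arguments; this is a swapped-in base R analogue, not data.table's implementation, and a_toy with its values is made up.

```r
# Toy data (made up) with the same column names as in the question.
a_toy <- data.frame(
  A   = c("x", "x", "x", "y"),
  B   = c("p", "p", "q", "p"),
  C   = c("m", "m", "m", "n"),
  Val = c(-1, 3, -4, 2)
)

# One data frame per observed (B, C, A) key; drop = TRUE skips empty
# combinations, whose mean would be NaN and make the if() below fail.
groups <- split(a_toy, a_toy[c("B", "C", "A")], drop = TRUE)

# Like `if (mean(Val) > 0) .SD`: each group returns itself or NULL,
# and rbind() silently ignores the NULLs.
kept <- do.call(rbind, lapply(groups, function(g) if (mean(g$Val) > 0) g))
kept
```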