Subset multiple columns in R - more elegant code?

寵の児 提交于 2019-12-04 07:40:51

I'm not sure if you need two apply calls here:

# Data
df=data.frame(x=1:4,Col1=c(11,12,3,13),Col2=c(9,12,10,13),Col3=c(9,13,42,23))
criteria=1:10

# Solution
df[apply(df [c('Col1','Col2','Col3')],1,function(x) any(x %in% criteria)),]

Unless you want to do a lot of columns, then it is probably more readable to say:

subset(df, Col1 %in% criteria | Col2 %in% criteria | Col3 %in% criteria) 

I'm using DF rather than data as the example.

DF[apply(apply(as.matrix(DF[c("Col1","Col2","Col3")]), 
               c(1,2), `%in%`, criteria), 
         1, any),]

For a breakdown of what this is doing:

Make a matrix of the specified columns, and for each element in that matrix test if it contains one of the criteria. Then for each row of that matrix, see if any of the row elements are TRUE. If so, keep the corresponding row of the original dataset.

Working through an example:

Start with dummy data:

DF <- data.frame(Col1=seq(1, by=2, length=10),
                 Col2=seq(3, by=3, length=10),
                 Col3=seq(7, by=1, length=10),
                 other=LETTERS[1:10])

which looks like

> DF
   Col1 Col2 Col3 other
1     1    3    7     A
2     3    6    8     B
3     5    9    9     C
4     7   12   10     D
5     9   15   11     E
6    11   18   12     F
7    13   21   13     G
8    15   24   14     H
9    17   27   15     I
10   19   30   16     J

Pull out just the columns of interest.

> as.matrix(DF[c("Col1","Col2","Col3")])
      Col1 Col2 Col3
 [1,]    1    3    7
 [2,]    3    6    8
 [3,]    5    9    9
 [4,]    7   12   10
 [5,]    9   15   11
 [6,]   11   18   12
 [7,]   13   21   13
 [8,]   15   24   14
 [9,]   17   27   15
[10,]   19   30   16

Check each entry versus the criteria

> apply(as.matrix(DF[c("Col1","Col2","Col3")]), c(1,2), `%in%`, criteria)
       Col1  Col2  Col3
 [1,]  TRUE  TRUE  TRUE
 [2,]  TRUE  TRUE  TRUE
 [3,]  TRUE  TRUE  TRUE
 [4,]  TRUE FALSE  TRUE
 [5,]  TRUE FALSE FALSE
 [6,] FALSE FALSE FALSE
 [7,] FALSE FALSE FALSE
 [8,] FALSE FALSE FALSE
 [9,] FALSE FALSE FALSE
[10,] FALSE FALSE FALSE

Test if any of the values in a row are TRUE

> apply(apply(as.matrix(DF[c("Col1","Col2","Col3")]), c(1,2), `%in%`, criteria), 1, any)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

Use that to index the original data frame.

> DF[apply(apply(as.matrix(DF[c("Col1","Col2","Col3")]), c(1,2), `%in%`, criteria), 1, any),]
  Col1 Col2 Col3 other
1    1    3    7     A
2    3    6    8     B
3    5    9    9     C
4    7   12   10     D
5    9   15   11     E
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!