Keeping rows if any column matches one of a set of values

泪湿孤枕 submitted on 2019-12-24 00:58:16

Question


I have a simple question about subsetting in R; I think I am close but can't quite get it. Basically, I have 25 columns of interest and about 100 values. I want to keep any row that has ANY of those values in at least one of those columns. Simple example:

Values <- c(1,2,5)

col1 <- c(2,6,8,1,3,5)
col2 <- c(1,4,5,9,0,0)
col3 <- c('dog', 'cat', 'cat', 'pig', 'chicken', 'cat')
df <- cbind.data.frame(col1, col2, col3)

df1 <- subset(df, col1%in%Values)

(Note that the third column is to indicate that there are additional columns but I don't need to match the values to those; the rows retained only depend upon columns 1 and 2). I know that in this trivial case I could just add

| col2%in%Values

to get the additional rows from column 2, but with 25 columns I don't want to add an OR statement for every single one. I tried

 file2011_test <- file2011[file2011[,9:33]%in%CO_codes] #real names of values

but it didn't work. (And yes I know this is mixing subsetting types; I find subset() easier to understand but I don't think it can help me with what I need?)


Answer 1:


Maybe you can try:

df[Reduce(`|`, lapply(as.data.frame(df), function(x) x %in% Values)),]
#        col1 col2
#[1,]    2    1
#[2,]    8    5
#[3,]    1    9
#[4,]    5    0

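Or, if df is a numeric matrix (as in the matrix-style output shown here; on a data.frame, df %in% Values yields one value per column rather than one per cell, so the dim<- step fails):

 indx <- df %in% Values      # %in% returns a plain vector, dropping the matrix dimensions
 dim(indx) <- dim(df)        # restore the rows x columns shape
 df[rowSums(indx) > 0,]      # keep rows with at least one match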
 #        col1 col2
 # [1,]    2    1
 # [2,]    8    5
 # [3,]    1    9
 # [4,]    5    0
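
The `%in%`-then-`rowSums` idea can also be applied to a data frame by first converting only the columns of interest to a matrix. This is a minimal sketch using the toy data from the question; the names `m` and `indx` and the choice of `col1` and `col2` are assumptions made for illustration.

```r
Values <- c(1, 2, 5)
df <- data.frame(
  col1 = c(2, 6, 8, 1, 3, 5),
  col2 = c(1, 4, 5, 9, 0, 0),
  col3 = c("dog", "cat", "cat", "pig", "chicken", "cat")
)

# Convert only the numeric columns to be checked, giving a 6 x 2 numeric matrix.
m <- as.matrix(df[c("col1", "col2")])

# %in% returns a plain logical vector (column-major), so rebuild the matrix shape.
indx <- matrix(m %in% Values, nrow = nrow(m))

# Keep rows where at least one checked cell matched.
res <- df[rowSums(indx) > 0, ]
res
```

Converting only the numeric columns avoids turning the whole data frame into a character matrix, which is what `as.matrix(df)` would do once `col3` is included.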

Update

Using the new dataset

 df[Reduce(`|`, lapply(df[sapply(df, is.numeric)], function(x) x %in% Values)),]
 #     col1 col2 col3
 #1    2    1  dog
 #3    8    5  cat
 #4    1    9  pig
 #6    5    0  cat
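
For the real data in the question, where only a block of columns (9:33) should be checked, the same `Reduce` approach can be limited to those columns by position instead of by type. This is a minimal sketch on the toy data; `cols`, `hits` and `keep` are names introduced here for illustration, and for the asker's `file2011` one would presumably set `cols <- 9:33` and use `CO_codes` in place of `Values`.

```r
Values <- c(1, 2, 5)
df <- data.frame(
  col1 = c(2, 6, 8, 1, 3, 5),
  col2 = c(1, 4, 5, 9, 0, 0),
  col3 = c("dog", "cat", "cat", "pig", "chicken", "cat")
)

cols <- 1:2  # positions of the columns to check (hypothetical; 9:33 in the real data)

# One logical vector per selected column: does each cell match any value?
hits <- lapply(df[cols], function(x) x %in% Values)

# Combine the per-column vectors with element-wise OR.
keep <- Reduce(`|`, hits)

df1 <- df[keep, ]
df1
```

On the toy data this keeps rows 1, 3, 4 and 6. Because `Reduce` folds over however many columns are selected, nothing changes when `cols` covers 25 columns instead of 2.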



Answer 2:


Take a look at the data.table package. It is very intuitive and can be much faster than base data frames on large data.

library(data.table)
df <- data.table(col1, col2, col3)
df[col1%in%Values | col2%in%Values]

#    col1 col2 col3
#1:    2    1  dog
#2:    8    5  cat
#3:    1    9  pig
#4:    5    0  cat

If you want to do this for all columns, you can use:

df[rowSums(sapply(df, '%in%', Values) )>0]
#   col1 col2 col3
#1:    2    1  dog
#2:    8    5  cat
#3:    1    9  pig
#4:    5    0  cat
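
To check only some of the columns in data.table (as the question asks for 25 specific columns), `.SDcols` can restrict which columns `.SD` contains. This is a sketch under the assumption that the data.table package is installed; `dt`, `cols` and `keep` are illustrative names, not part of the original answer.

```r
library(data.table)

Values <- c(1, 2, 5)
dt <- data.table(
  col1 = c(2, 6, 8, 1, 3, 5),
  col2 = c(1, 4, 5, 9, 0, 0),
  col3 = c("dog", "cat", "cat", "pig", "chicken", "cat")
)

cols <- c("col1", "col2")  # hypothetical: names of the columns to check

# .SD holds only the columns listed in .SDcols; test each one, then OR the results.
keep <- dt[, Reduce(`|`, lapply(.SD, `%in%`, Values)), .SDcols = cols]

# Subset rows with the resulting logical vector.
dt[keep]
```

`.SDcols` also accepts column positions, so a range of positions should work in place of the names.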


Source: https://stackoverflow.com/questions/25692392/keeping-rows-if-any-column-matches-one-of-a-set-of-values
