Question
I have a simple question about subsetting in R; I think I am close but can't quite get it. Basically, I have 25 columns of interest and about 100 values. I want to keep any row that has ANY of those values in at least one of those columns. Simple example:
Values <- c(1,2,5)
col1 <- c(2,6,8,1,3,5)
col2 <- c(1,4,5,9,0,0)
col3 <- c('dog', 'cat', 'cat', 'pig', 'chicken', 'cat')
df <- cbind.data.frame(col1, col2, col3)
df1 <- subset(df, col1%in%Values)
(Note that the third column is to indicate that there are additional columns but I don't need to match the values to those; the rows retained only depend upon columns 1 and 2). I know that in this trivial case I could just add
| col2%in%Values
to get the additional rows from column 2, but with 25 columns I don't want to add an OR statement for every single one. I tried
file2011_test <- file2011[file2011[,9:33]%in%CO_codes] #real names of values
but it didn't work. (And yes I know this is mixing subsetting types; I find subset() easier to understand but I don't think it can help me with what I need?)
Answer 1:
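Maybe you can try the following (the output shown appears to come from an earlier version of the example in which df held only the two numeric columns):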
df[Reduce(`|`, lapply(as.data.frame(df), function(x) x %in% Values)),]
#      col1 col2
# [1,]    2    1
# [2,]    8    5
# [3,]    1    9
# [4,]    5    0
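Or, if df is a numeric matrix (on a data frame, df %in% Values compares whole columns rather than individual cells, so the dim() assignment fails):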
indx <- df %in% Values
dim(indx) <- dim(df)
df[!!rowSums(indx),]
#      col1 col2
# [1,]    2    1
# [2,]    8    5
# [3,]    1    9
# [4,]    5    0
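The dim() approach above relies on df being a matrix. A minimal sketch of one way to adapt it to a data frame, assuming the columns to check (here col1 and col2) are all numeric so that as.matrix() returns a numeric matrix; the names cols, m and res2 are illustrative only:

```r
# Example data from the question
Values <- c(1, 2, 5)
df <- data.frame(
  col1 = c(2, 6, 8, 1, 3, 5),
  col2 = c(1, 4, 5, 9, 0, 0),
  col3 = c("dog", "cat", "cat", "pig", "chicken", "cat")
)

cols <- c("col1", "col2")        # columns to check (assumed numeric)
m    <- as.matrix(df[cols])      # numeric matrix with one column per checked column
indx <- m %in% Values            # %in% drops the dimensions and returns a plain vector
dim(indx) <- dim(m)              # restore the rows-by-columns shape
res2 <- df[rowSums(indx) > 0, ]  # keep rows with at least one match
```

Converting only the numeric columns avoids as.matrix() turning everything into character strings, which is what would happen if col3 were included.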
Update
Using the new dataset (with the extra col3 column), restrict the check to the numeric columns:
df[Reduce(`|`, lapply(df[sapply(df, is.numeric)], function(x) x %in% Values)),]
#   col1 col2 col3
# 1    2    1  dog
# 3    8    5  cat
# 4    1    9  pig
# 6    5    0  cat
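For the 25-column case in the question, the same Reduce()/lapply() idea can be limited to a chosen set of columns instead of all numeric ones. A minimal sketch, assuming the columns are selected by name or by position; the helper name keep_rows_any is hypothetical and not part of the original answer:

```r
# Example data from the question
Values <- c(1, 2, 5)
df <- data.frame(
  col1 = c(2, 6, 8, 1, 3, 5),
  col2 = c(1, 4, 5, 9, 0, 0),
  col3 = c("dog", "cat", "cat", "pig", "chicken", "cat")
)

# Hypothetical helper: keep rows where any of the columns in `cols`
# (given as names or positions) contains one of `values`
keep_rows_any <- function(data, cols, values) {
  hits <- lapply(data[cols], function(x) x %in% values)  # one logical vector per column
  data[Reduce(`|`, hits), , drop = FALSE]                # element-wise OR across columns
}

res <- keep_rows_any(df, c("col1", "col2"), Values)
```

With the real data this would presumably be called as keep_rows_any(file2011, 9:33, CO_codes), assuming columns 9 to 33 are the 25 columns of interest.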
Answer 2:
Take a look at the data.table package. It is very intuitive and can be much faster than base R on large data.
library(data.table)
df <- data.table(col1, col2, col3)
df[col1%in%Values | col2%in%Values]
#    col1 col2 col3
# 1:    2    1  dog
# 2:    8    5  cat
# 3:    1    9  pig
# 4:    5    0  cat
If you want to do this for all columns, you can use:
df[rowSums(sapply(df, '%in%', Values) )>0]
#    col1 col2 col3
# 1:    2    1  dog
# 2:    8    5  cat
# 3:    1    9  pig
# 4:    5    0  cat
Source: https://stackoverflow.com/questions/25692392/keeping-rows-if-any-column-matches-one-of-a-set-of-values