Question
Why doesn't dplyr's filter return the same data.frame as base R subsetting in the code below?
In fact, neither of them works as expected. I'd like to remove the observations/rows for which b==1 AND c==1 hold simultaneously. That is, I'd like to remove only the third row.
require(dplyr)
df <- data.frame(a=c(0,0,0,0,1,1,1),
b=c(0,0,1,1,0,0,1),
c=c(1,NA,1,NA,1,NA,NA))
filter(df, !(b==1 & c==1))
df[!(df$b==1 & df$c==1),]
Answer 1:
Use complete.cases to turn NA into FALSE in the resulting logical vector, so that the corresponding rows are kept after the negation. This relies on the fact that NA & FALSE is FALSE:
filter(df, !(b == 1 & c == 1 & complete.cases(df[c('b', 'c')])))
# a b c
# 1 0 0 1
# 2 0 0 NA
# 3 0 1 NA
# 4 1 0 1
# 5 1 0 NA
# 6 1 1 NA
Logical operations involving NA can look confusing at first glance, but they are consistent: the result is NA only when the unknown value could change the outcome.
NA & F
# [1] FALSE
NA | T
# [1] TRUE
NA & T
# [1] NA
NA | F
# [1] NA
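To connect these rules back to the question, here is a minimal base R sketch of what the negated condition evaluates to on the example data. It uses subset() as a stand-in for dplyr's filter, on the assumption (which matches the documented behaviour of both) that rows where the condition is NA are dropped, whereas `[` turns an NA index into a row filled with NA. The names cond, base_rows and dropped are only illustrative.

```r
df <- data.frame(a = c(0, 0, 0, 0, 1, 1, 1),
                 b = c(0, 0, 1, 1, 0, 0, 1),
                 c = c(1, NA, 1, NA, 1, NA, NA))

# FALSE & NA is FALSE, but TRUE & NA is NA, so rows 4 and 7 end up NA
cond <- !(df$b == 1 & df$c == 1)
cond
# [1]  TRUE  TRUE FALSE    NA  TRUE  TRUE    NA

# Base `[` keeps a placeholder row of NAs for every NA in the index
base_rows <- df[cond, ]
nrow(base_rows)
# [1] 6

# subset() treats NA as FALSE and drops those rows
dropped <- subset(df, cond)
nrow(dropped)
# [1] 4
```

So the two calls in the question disagree only on the rows where the condition is NA, and neither keeps rows 4 and 7 intact.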
Answer 2:
This is the simplest option I can think of:
filter(df, !((b==1 & c==1) %in% TRUE))
# a b c
#1 0 0 1
#2 0 0 NA
#3 0 1 NA
#4 1 0 1
#5 1 0 NA
#6 1 1 NA
# or equivalently in data.table, assuming dt <- as.data.table(df)
dt[!((b==1 & c==1) %in% TRUE)]
Another, perhaps more verbose but clearer, option is to use !(b==1 & c==1) | is.na(b+c) as the comparison.
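As a rough illustration of that second comparison, the sketch below applies it with base R subsetting on the example data. The variable names keep and result are just placeholders; the point is that OR-ing with is.na(b+c) removes every NA from the index, so `[` no longer produces NA rows.

```r
df <- data.frame(a = c(0, 0, 0, 0, 1, 1, 1),
                 b = c(0, 0, 1, 1, 0, 0, 1),
                 c = c(1, NA, 1, NA, 1, NA, NA))

# NA | TRUE is TRUE, so rows with a missing b or c are always kept
keep <- !(df$b == 1 & df$c == 1) | is.na(df$b + df$c)
keep
# [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE

result <- df[keep, ]
rownames(result)
# [1] "1" "2" "4" "5" "6" "7"
```

The same expression should work unchanged inside filter(), since the condition no longer contains any NA.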
Answer 3:
Using data.table
library(data.table)
setDT(df)[df[, !(b == 1 & c == 1 & complete.cases(.SD[, c('b', 'c'), with = FALSE]))]]
# a b c
#1: 0 0 1
#2: 0 0 NA
#3: 0 1 NA
#4: 1 0 1
#5: 1 0 NA
#6: 1 1 NA
Answer 4:
Yes, the NA values cause problems. Here are four workarounds:
Method 1: 2-step Exclusion
n <- (df$b + df$c == 2)
# keep rows where the test is FALSE or NA (logical FALSE, not the string "FALSE")
df[n %in% c(NA, FALSE), ]
#   a b  c
# 1 0 0  1
# 2 0 0 NA
# 4 0 1 NA
# 5 1 0  1
# 6 1 0 NA
# 7 1 1 NA
Method 2: Conditional Sum
df[!(complete.cases(df$b,df$c) & df$b+df$c == 2),]
#   a b  c
# 1 0 0  1
# 2 0 0 NA
# 4 0 1 NA
# 5 1 0  1
# 6 1 0 NA
# 7 1 1 NA
Method 3: Loop/Function
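filterwithNA <- function(df, n) {
  # Mark rows to drop first: deleting rows inside the loop shifts the
  # indices, so a matching row right after a removed one would be skipped
  drop <- logical(nrow(df))
  for (i in seq_len(nrow(df))) {
    if (!is.na(df$b[i]) && !is.na(df$c[i])) {
      if (df$b[i] == n && df$c[i] == n) {
        drop[i] <- TRUE
      }
    }
  }
  return(df[!drop, ])
}
filterwithNA(df, n = 1)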
#   a b  c
# 1 0 0  1
# 2 0 0 NA
# 4 0 1 NA
# 5 1 0  1
# 6 1 0 NA
# 7 1 1 NA
Method 4: Temporary numeric replacement
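# Assumes the sentinel value 999 does not occur anywhere in the real data
df[is.na(df)] <- 999
# Assign the filtered result; otherwise the final df still has all 7 rows
df <- df[!(df$b == 1 & df$c == 1), ]
df[df == 999] <- NA
df
#   a b  c
# 1 0 0  1
# 2 0 0 NA
# 4 0 1 NA
# 5 1 0  1
# 6 1 0 NA
# 7 1 1 NA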
Source: https://stackoverflow.com/questions/38948196/unable-to-subset-filter-a-data-frame-due-to-nas