Subset dataframe by multiple logical conditions of rows to remove

拈花ヽ惹草 提交于 2019-11-25 19:17:10

The ! should be around the outside of the statement:

data[!(data$v1 %in% c("b", "d", "e")), ]

  v1 v2 v3 v4
1  a  v  d  c
2  a  v  d  d
5  c  k  d  c
6  c  r  p  g

Try this

subset(data, !(v1 %in% c("b","d","e")))
N Brouwer

You can also accomplish this by breaking things up into separate logical statements by including & to separate the statements.

subset(my.df, my.df$v1 != "b" & my.df$v1 != "d" & my.df$v1 != "e")

This is not elegant and takes more code but might be more readable to newer R users. As pointed out in a comment above, subset is a "convenience" function that is best used when working interactively.

paul c
data <- data[-which(data[,1] %in% c("b","d","e")),]

This answer is more meant to explain why, not how. The '==' operator in R is vectorized in a same way as the '+' operator. It matches the elements of whatever is on the left side to the elements of whatever is on the right side, per element. For example:

> 1:3 == 1:3
[1] TRUE TRUE TRUE

Here the first test is 1==1 which is TRUE, the second 2==2 and the third 3==3. Notice that this returns a FALSE in the first and second element because the order is wrong:

> 3:1 == 1:3
[1] FALSE  TRUE FALSE

Now if one object is smaller then the other object then the smaller object is repeated as much as it takes to match the larger object. If the size of the larger object is not a multiplication of the size of the smaller object you get a warning that not all elements are repeated. For example:

>  1:2 == 1:3
[1]  TRUE  TRUE FALSE
Warning message:
In 1:2 == 1:3 :
  longer object length is not a multiple of shorter object length

Here the first match is 1==1, then 2==2, and finally 1==3 (FALSE) because the left side is smaller. If one of the sides is only one element then that is repeated:

> 1:3 == 1
[1]  TRUE FALSE FALSE

The correct operator to test if an element is in a vector is indeed '%in%' which is vectorized only to the left element (for each element in the left vector it is tested if it is part of any object in the right element).

Alternatively, you can use '&' to combine two logical statements. '&' takes two elements and checks elementwise if both are TRUE:

> 1:3 == 1 & 1:3 != 2
[1]  TRUE FALSE FALSE
my.df <- read.table(textConnection("
v1 v2 v3 v4
a  v  d  c
a  v  d  d
b  n  p  g
b  d  d  h    
c  k  d  c    
c  r  p  g
d  v  d  x
d  v  d  c
e  v  d  b
e  v  d  c"), header = TRUE)

my.df[which(my.df$v1 != "b" & my.df$v1 != "d" & my.df$v1 != "e" ), ]

  v1 v2 v3 v4
1  a  v  d  c
2  a  v  d  d
5  c  k  d  c
6  c  r  p  g
Hernan
sub.data<-data[ data[,1] != "b"  & data[,1] != "d" & data[,1] != "e" , ]

Larger but simple to understand (I guess) and can be used with multiple columns, even with !is.na( data[,1]).

And also

library(dplyr)
data %>% filter(!v1 %in% c("b", "d", "e"))

or

data %>% filter(v1 != "b" & v1 != "d" & v1 != "e")

or

data %>% filter(v1 != "b", v1 != "d", v1 != "e")

Since the & operator is implied by the comma.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!