Question
I have a data frame with a variable containing elements to drop if they match an element in another variable. See a small example below:
df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3),
                 value = seq(1, 12, 2),
                 drop = c("no", "no", "dog", "dog", "cat", "cat"))

  pair animal value drop
1    1    dog     1   no
2    1    cat     3   no
3    2    dog     5  dog
4    2    cat     7  dog
5    3    dog     9  cat
6    3    cat    11  cat
I want to filter the data frame according to whether the value of animal matches the value of drop. I want something like filter(df, animal != drop) to remove only the rows where the value of animal matches the value of drop:
  pair animal value drop
1    1    dog     1   no
2    1    cat     3   no
4    2    cat     7  dog
5    3    dog     9  cat
I also tried writing a simple loop to test whether animal matches drop for each row and remove the row if true, but I couldn't get it working. (I'm not very confident with loops and would prefer not to use one if possible as my data frame is very large but I was getting desperate!)
for(i in nrow(df)){
  if(df$animal[i] == df$drop[i]){
    df <- df[-i,]
    return(df)
  }
}
Is there a way of doing this using dplyr?
Answer 1:
The use of filter(df, animal != drop) is correct. However, as you haven't specified stringsAsFactors = FALSE in your data.frame() call, all strings are converted to factors (the default in R versions before 4.0.0), and comparing the two columns raises the error that the level sets of the factors are different. Adding stringsAsFactors = FALSE should solve this:
library(dplyr)

df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3),
                 value = seq(1, 12, 2),
                 drop = c("no", "no", "dog", "dog", "cat", "cat"),
                 stringsAsFactors = FALSE)  # keep strings as character vectors

df %>%
  filter(animal != drop)
  pair animal value drop
1    1    dog     1   no
2    1    cat     3   no
3    2    cat     7  dog
4    3    dog     9  cat
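To see why the factor version fails, here is a minimal base R sketch. The vectors animal and drop below are shortened stand-ins for the two columns, made up for illustration rather than taken from the original data frame.

```r
# Two factors built from different sets of strings get different level sets.
animal <- factor(c("dog", "cat", "dog", "cat"))
drop   <- factor(c("no", "no", "dog", "cat"))

levels(animal)  # "cat" "dog"
levels(drop)    # "cat" "dog" "no"

# Comparing the factors directly raises an error; capture its message
# instead of letting it stop the script.
msg <- tryCatch(
  {
    animal != drop
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Comparing the underlying strings works element by element.
keep <- as.character(animal) != as.character(drop)
print(keep)
```

The comparison on the character values is what filter() ends up doing once the columns are no longer factors.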
To avoid issues with this undesired string-to-factor behaviour, I highly recommend using tibble, which never converts strings to factors.
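As a rough sketch of the tibble route, assuming the tibble and dplyr packages are installed, the same data can be built with tibble() instead of data.frame(). The name kept is only an illustrative variable.

```r
library(tibble)
library(dplyr)

# tibble() leaves character columns as character, so no
# stringsAsFactors argument is needed.
df <- tibble(pair = c(1, 1, 2, 2, 3, 3),
             animal = rep(c("dog", "cat"), 3),
             value = seq(1, 12, 2),
             drop = c("no", "no", "dog", "dog", "cat", "cat"))

# Keep only the rows where animal differs from drop.
kept <- df %>%
  filter(animal != drop)

print(kept)
```

Because animal and drop stay character here, the comparison inside filter() runs without the factor level error.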
In case one cannot change how the data is created, I include @akrun's solution here:
library(dplyr)
df %>%
  mutate_at(vars(animal, drop), as.character) %>%
  filter(animal != drop)
#  pair animal value drop
#1    1    dog     1   no
#2    1    cat     3   no
#3    2    cat     7  dog
#4    3    dog     9  cat
Answer 2:
An option would be to convert to character class with mutate_at and then use filter on non-matching elements:
library(dplyr)
df %>%
  mutate_at(vars(animal, drop), as.character) %>%
  filter(animal != drop)
#  pair animal value drop
#1    1    dog     1   no
#2    1    cat     3   no
#3    2    cat     7  dog
#4    3    dog     9  cat
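For readers on newer package versions, the same idea can be sketched with across(), assuming dplyr 1.0.0 or later, where mutate_at is superseded. A base R equivalent is included for comparison. The data frame is rebuilt with stringsAsFactors = TRUE so the factor columns are present regardless of R version, and result and base_result are illustrative names.

```r
library(dplyr)

# Force factor columns so the conversion step is actually needed.
df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3),
                 value = seq(1, 12, 2),
                 drop = c("no", "no", "dog", "dog", "cat", "cat"),
                 stringsAsFactors = TRUE)

# dplyr: convert both columns to character, then drop matching rows.
result <- df %>%
  mutate(across(c(animal, drop), as.character)) %>%
  filter(animal != drop)

# Base R: compare the character values and subset with the logical vector.
base_result <- df[as.character(df$animal) != as.character(df$drop), ]

print(result)
print(base_result)
```

Both versions are vectorised, so no explicit loop over rows is needed even for a large data frame. The base R version keeps the original row names and leaves the columns as factors.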
Source: https://stackoverflow.com/questions/56345162/filter-one-column-by-matching-to-another-column