Filter one column by matching to another column

天大地大妈咪最大 submitted on 2021-02-10 12:40:50

Question


I have a data frame with a variable containing elements that should be dropped if they match the element in another variable. See a small example below:

df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3), 
                 value = seq(1, 12, 2), 
                 drop = c("no", "no", "dog", "dog", "cat", "cat"))

  pair animal value drop
1    1    dog     1   no
2    1    cat     3   no
3    2    dog     5  dog
4    2    cat     7  dog
5    3    dog     9  cat
6    3    cat    11  cat

I want to filter the data frame according to whether the value of animal matches the value of drop. I want something like filter(df, animal != drop) to remove only the rows where the value of animal matches the value of drop:

  pair animal value drop
1    1    dog     1   no
2    1    cat     3   no
4    2    cat     7  dog
5    3    dog     9  cat

I also tried writing a simple loop to test whether animal matches drop for each row and remove the row if true, but I couldn't get it working. (I'm not very confident with loops and would prefer not to use one if possible, as my data frame is very large, but I was getting desperate!)

for(i in nrow(df)){
  if(df$animal[i] == df$drop[i]){
    df <- df[-i,]
    return(df)
  }
}

Is there a way of doing this using dplyr?
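For reference, the loop attempt above fails for three reasons: `for(i in nrow(df))` iterates over the single value `nrow(df)` rather than over every row (`seq_len(nrow(df))` is needed), `return()` is only valid inside a function, and deleting rows inside the loop shifts the indices of the rows that follow. A minimal base R sketch that avoids all three issues is to record a keep/drop decision for every row first and subset once. The names `keep`, `result_loop` and `result_vec` are illustrative, and `stringsAsFactors = FALSE` is assumed so that both columns are character.

```r
df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3),
                 value = seq(1, 12, 2),
                 drop = c("no", "no", "dog", "dog", "cat", "cat"),
                 stringsAsFactors = FALSE)

# Loop version: fill a logical vector with one decision per row,
# then subset once, so no row indices shift while iterating.
keep <- logical(nrow(df))
for (i in seq_len(nrow(df))) {
  keep[i] <- df$animal[i] != df$drop[i]
}
result_loop <- df[keep, ]

# Vectorised version: `!=` compares the two columns element-wise,
# which avoids the loop entirely and scales better to large data.
result_vec <- df[df$animal != df$drop, ]

identical(result_loop, result_vec)
```

Both versions keep rows 1, 2, 4 and 5; base subsetting also preserves those original row names, as in the expected output shown in the question.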


Answer 1:


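The use of filter(df, animal != drop) is correct. However, as you haven't specified stringsAsFactors = FALSE in your data.frame() call, all strings are converted to factors, and comparing two factor columns whose level sets differ raises an error stating that the level sets of factors are different. Adding stringsAsFactors = FALSE should solve this. (Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so the conversion only happens in older versions of R or when stringsAsFactors = TRUE is passed explicitly.)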

library(dplyr)

df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3),
                 value = seq(1, 12, 2),
                 drop = c("no", "no", "dog", "dog", "cat", "cat"),
                 stringsAsFactors = FALSE)

df %>%
  filter(animal != drop)

  pair animal value drop
1    1    dog     1   no
2    1    cat     3   no
3    2    cat     7  dog
4    3    dog     9  cat
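The factor problem described above can be reproduced in base R without dplyr. A minimal sketch, assuming `stringsAsFactors = TRUE` is set explicitly to mimic the pre-4.0.0 default: comparing the two factor columns directly fails because their level sets differ, while comparing their character versions works. The names `df_fct`, `msg` and `ok` are illustrative.

```r
df_fct <- data.frame(animal = rep(c("dog", "cat"), 3),
                     drop = c("no", "no", "dog", "dog", "cat", "cat"),
                     stringsAsFactors = TRUE)

levels(df_fct$animal)  # "cat" "dog"
levels(df_fct$drop)    # "cat" "dog" "no"

# Comparing the factor columns directly signals an error because
# their level sets differ; capture the message instead of stopping.
msg <- tryCatch({
  df_fct$animal != df_fct$drop
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Converting to character first gives an ordinary element-wise comparison.
ok <- as.character(df_fct$animal) != as.character(df_fct$drop)
ok
```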

To avoid this undesired string-to-factor conversion, I highly recommend using tibbles: tibble::tibble() never converts strings to factors.
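A minimal sketch of the tibble approach, assuming the tibble and dplyr packages are installed: because tibble() keeps character vectors as character, the original filter() call works without any conversion step. The names `df_tbl` and `result_tbl` are illustrative.

```r
library(dplyr)
library(tibble)

# tibble() never converts strings to factors
df_tbl <- tibble(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3),
                 value = seq(1, 12, 2),
                 drop = c("no", "no", "dog", "dog", "cat", "cat"))

result_tbl <- df_tbl %>%
  filter(animal != drop)
result_tbl
```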

If you cannot change how the data is created, here is @akrun's solution:

library(dplyr)

df %>% 
  mutate_at(vars(animal, drop), as.character) %>%       
  filter(animal != drop)
#  pair animal value drop
#1    1    dog     1   no
#2    1    cat     3   no
#3    2    cat     7  dog
#4    3    dog     9  cat



Answer 2:


An option would be to convert both columns to character class with mutate_at() and then use filter() to keep the non-matching elements:

library(dplyr)
df %>% 
  mutate_at(vars(animal, drop), as.character) %>%       
  filter(animal != drop)
#  pair animal value drop
#1    1    dog     1   no
#2    1    cat     3   no
#3    2    cat     7  dog
#4    3    dog     9  cat
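In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A minimal sketch of the same conversion written with across(), assuming dplyr >= 1.0.0; `stringsAsFactors = TRUE` is passed explicitly here to recreate the factor columns from the question on current versions of R, and `result` is an illustrative name.

```r
library(dplyr)

df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3),
                 value = seq(1, 12, 2),
                 drop = c("no", "no", "dog", "dog", "cat", "cat"),
                 stringsAsFactors = TRUE)

result <- df %>%
  # convert both factor columns to character before comparing them
  mutate(across(c(animal, drop), as.character)) %>%
  filter(animal != drop)
result
```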


Source: https://stackoverflow.com/questions/56345162/filter-one-column-by-matching-to-another-column
