dplyr filter with condition on multiple columns

粉色の甜心 2020-12-02 23:56

Here's some dummy data:

father   <- c(1, 1, 1, 1, 1)
mother   <- c(1, 1, 1, NA, NA)
children <- c(NA, NA, 2, 5, 2)
cousins  <- c(NA, 5, 1, 1, 4)


d         


        
4 answers
  • 2020-12-03 00:10

    Here is a base R method using two Reduce functions and [ to subset.

    keepers <- Reduce(function(x, y) x == 1 & y == 1, dataset[, 1:2]) &
               Reduce(function(x, y) is.na(x) & is.na(y), dataset[, 3:4])
    keepers
    [1]  TRUE FALSE FALSE FALSE FALSE
    

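    Each Reduce takes the supplied columns in turn and performs a logical check on them. The two results are combined with &. The second argument to each Reduce call can be changed to cover other columns of the data.frame, but note that with more than two columns Reduce passes the accumulated logical result back in as x, so the is.na version in particular would then return FALSE for every row.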

    Then use the logical vector to subset

    dataset[keepers,]
      father mother children cousins
    1      1      1       NA      NA
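
    If more than two columns per group are needed, one possible variation is to test each column on its own first and only then combine the results with &. This is a minimal sketch rather than the method above: it assumes the question's vectors were combined into a data.frame called dataset, and the names one_cols and na_cols are hypothetical.

    ```r
    # Example data from the question, assumed to be combined into `dataset`
    dataset <- data.frame(
      father   = c(1, 1, 1, 1, 1),
      mother   = c(1, 1, 1, NA, NA),
      children = c(NA, NA, 2, 5, 2),
      cousins  = c(NA, 5, 1, 1, 4)
    )

    # Hypothetical column groups; extend either vector with more column names
    one_cols <- c("father", "mother")
    na_cols  <- c("children", "cousins")

    # Apply the check to every column separately, then AND the logical vectors
    all_one <- Reduce(`&`, lapply(dataset[one_cols], function(col) col == 1))
    all_na  <- Reduce(`&`, lapply(dataset[na_cols], is.na))

    keepers <- all_one & all_na
    dataset[keepers, ]
    ```

    Because the check runs inside lapply and Reduce only combines logical vectors, adding a third or fourth column to either group should not change how the test behaves.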
    
  • 2020-12-03 00:16

    None of the answers seems to be an adaptable solution. I think the intention is not to list all the variables and values to filter the data.

    One easy way to achieve this is through merging. If you have all the conditions in df_filter then you can do this:

    df_results = df_filter %>% left_join(df_all)
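
    As a rough illustration with the question's data: left_join keeps every row of df_filter even when nothing in df_all matches it, so inner_join may be closer to a filter. The sketch below assumes df_all is the question's dataset and that df_filter is a hypothetical one-row table of the wanted values. It also relies on dplyr's default na_matches = "na", under which NA matches NA in a join.

    ```r
    library(dplyr)

    # Example data from the question, assumed to be combined into `dataset`
    dataset <- data.frame(
      father   = c(1, 1, 1, 1, 1),
      mother   = c(1, 1, 1, NA, NA),
      children = c(NA, NA, 2, 5, 2),
      cousins  = c(NA, 5, 1, 1, 4)
    )

    # Hypothetical filter table: one row per combination of values to keep.
    # NA_real_ keeps the column types the same as in `dataset`.
    df_filter <- data.frame(
      father = 1, mother = 1, children = NA_real_, cousins = NA_real_
    )

    # inner_join() returns only the rows of `dataset` that match `df_filter`
    df_results <- inner_join(
      dataset, df_filter,
      by = c("father", "mother", "children", "cousins")
    )
    df_results
    ```

    Adding more rows to df_filter would keep rows matching any of those combinations.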
    
  • 2020-12-03 00:27

    A dplyr solution:

    test <- dataset %>% 
      filter(father==1 & mother==1 & rowSums(is.na(.[,3:4]))==2)
    

    Where '2' is the number of columns that should be NA.

    This gives:

    > test
      father mother children cousins
    1      1      1       NA      NA
    

    You can apply this logic in base R as well:

    dataset[dataset$father==1 & dataset$mother==1 & rowSums(is.na(dataset[,3:4]))==2,]
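
    A possible variation, sketched under the assumption that the question's vectors were combined into dataset: refer to the columns by name and derive the count from the column list, so the '2' does not have to be kept in sync by hand. The names na_cols, n_missing and keep are hypothetical.

    ```r
    # Example data from the question, assumed to be combined into `dataset`
    dataset <- data.frame(
      father   = c(1, 1, 1, 1, 1),
      mother   = c(1, 1, 1, NA, NA),
      children = c(NA, NA, 2, 5, 2),
      cousins  = c(NA, 5, 1, 1, 4)
    )

    # Hypothetical list of the columns that must all be NA
    na_cols <- c("children", "cousins")

    # Number of missing values per row among those columns
    n_missing <- rowSums(is.na(dataset[na_cols]))

    keep <- dataset$father == 1 & dataset$mother == 1 &
      n_missing == length(na_cols)

    # which() drops any NA in `keep`; indexing with a logical NA would
    # otherwise return a row filled with NA
    result <- dataset[which(keep), ]
    result
    ```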
    
  • 2020-12-03 00:30

    A possible dplyr (0.5.0.9004 <= version < 1.0) solution is:

    # > packageVersion('dplyr')
    # [1] ‘0.5.0.9004’
    
    dataset %>%
        filter(!is.na(father), !is.na(mother)) %>%
        filter_at(vars(-father, -mother), all_vars(is.na(.)))
    

    Explanation:

    • vars(-father, -mother): select all columns except father and mother.
    • all_vars(is.na(.)): keep rows where is.na is TRUE for all the selected columns.

    Note: use any_vars instead of all_vars to keep rows where is.na is TRUE for any of the selected columns.


    Update (2020-11-28)

    Since dplyr 1.0, the _at functions and vars have been superseded by across, so the following (or something similar) is now recommended:

    dataset %>%
        filter(across(c(father, mother), ~ !is.na(.x))) %>%
        filter(across(c(-father, -mother), is.na))
    

    For more examples of across, and for how to rewrite older code with the new approach, see Column-wise operations, or run vignette("colwise") in R after installing the latest version of dplyr.
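
    One further note for newer installations: dplyr 1.0.4 added if_all() and if_any() for use inside filter(), and later releases deprecate across() there. A minimal sketch of the same filter with if_all(), assuming dplyr >= 1.0.4 and that the question's vectors were combined into dataset:

    ```r
    library(dplyr)

    # Example data from the question, assumed to be combined into `dataset`
    dataset <- data.frame(
      father   = c(1, 1, 1, 1, 1),
      mother   = c(1, 1, 1, NA, NA),
      children = c(NA, NA, 2, 5, 2),
      cousins  = c(NA, 5, 1, 1, 4)
    )

    result <- dataset %>%
      filter(
        # father and mother must both be present
        if_all(c(father, mother), ~ !is.na(.x)),
        # every remaining column must be NA
        if_all(-c(father, mother), is.na)
      )
    result
    ```

    Swapping the second if_all() for if_any() would keep rows where at least one of the remaining columns is NA, mirroring the all_vars / any_vars distinction above.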
