grepl in R to find matches to any of a list of character strings

离开以前 2020-12-05 05:23

Is it possible to use a grepl argument when referring to a list of values, maybe using the %in% operator? I want to take the data below and if the animal name has \

3 Answers
  • 2020-12-05 05:53

    Try to avoid ifelse where you can. This, for example, works nicely:

    c("Discard", "Keep")[grepl("(dog|cat)", data$animal) + 1]
    

    With a seed of 123, you will get:

    ##  [1] "Keep"    "Keep"    "Discard" "Keep"    "Keep"    "Keep"    "Discard" "Keep"   
    ##  [9] "Discard" "Discard" "Keep"    "Discard" "Keep"    "Discard" "Keep"    "Keep"   
    ## [17] "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"    "Keep"   
    ## [25] "Keep"    "Keep"    "Discard" "Discard" "Keep"    "Keep"    "Keep"    "Keep"   
    ## [33] "Keep"    "Keep"    "Keep"    "Discard" "Keep"    "Keep"    "Keep"    "Keep"   
    ## [41] "Keep"    "Discard" "Discard" "Keep"    "Keep"    "Keep"    "Keep"    "Discard"
    ## [49] "Keep"    "Keep"   
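
    The indexing trick above may be easier to follow on a tiny input. The sketch below uses a hypothetical four-element vector (the question's own data is not reproduced here), so its output will not match the listing above.

```r
# Hypothetical sample vector, used only for illustration.
animal <- c("dog", "bird", "cat", "horse")

# grepl() returns a logical vector: TRUE where "dog" or "cat" occurs.
hit <- grepl("(dog|cat)", animal)

# In arithmetic, FALSE is 0 and TRUE is 1, so hit + 1 is 1 or 2.
# That integer vector then indexes into c("Discard", "Keep").
labels <- c("Discard", "Keep")[hit + 1]
print(labels)
```

    Because the lookup vector is ordered ("Discard", "Keep"), a non-match (index 1) maps to "Discard" and a match (index 2) maps to "Keep".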
    
  • 2020-12-05 05:57

    You can use the "or" operator (|) inside the regular expression passed to grepl.

    ifelse(grepl("dog|cat", data$animal), "keep", "discard")
    # [1] "keep"    "keep"    "discard" "keep"    "keep"    "keep"    "keep"    "discard"
    # [9] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "discard" "keep"   
    #[17] "discard" "keep"    "keep"    "discard" "keep"    "keep"    "discard" "keep"   
    #[25] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"   
    #[33] "keep"    "discard" "keep"    "discard" "keep"    "discard" "keep"    "keep"   
    #[41] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"   
    #[49] "keep"    "discard"
    

    The regular expression dog|cat tells the regex engine to look for either "dog" or "cat", so grepl returns TRUE for any element that contains either one.
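
    One thing to keep in mind is that the alternation matches substrings and is case-sensitive by default. A minimal sketch, using a hypothetical vector rather than the question's data:

```r
# Hypothetical sample vector, used only for illustration.
animal <- c("dog", "bulldog", "Cat", "bird")

# Plain alternation: matches anywhere in the string, case-sensitive.
anywhere <- grepl("dog|cat", animal)

# Same pattern, ignoring case, so "Cat" also matches.
no_case <- grepl("dog|cat", animal, ignore.case = TRUE)

# Anchored version: the whole string must be exactly "dog" or "cat".
whole <- grepl("^(dog|cat)$", animal)

print(anywhere)
print(no_case)
print(whole)
```

    Whether "bulldog" should count as a match depends on the use case; the anchored form is one option if only exact names are wanted.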

  • 2020-12-05 06:16

    Not sure what you tried, but this seems to work:

    data$keep <- ifelse(grepl(paste(matches, collapse = "|"), data$animal), "Keep","Discard")
    

    Similar to the answer you linked to.

    The trick is using paste:

    paste(matches, collapse = "|")
    #[1] "cat|dog"
    

    This creates a regular expression that matches either dog or cat, and it also works with a long list of patterns without typing each one out.
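
    One caveat with this approach: the pasted strings are treated as regular expressions, so a term containing a metacharacter such as "." may match more than intended. The sketch below uses hypothetical terms and data to show the difference, and one possible alternative that matches each term literally with fixed = TRUE.

```r
# Hypothetical search terms and data, used only for illustration.
terms <- c("a.b", "dog")
x <- c("a.b", "axb", "dog")

# Pasted pattern "a.b|dog": the dot matches any character, so "axb" matches.
pattern <- paste(terms, collapse = "|")
as_regex <- grepl(pattern, x)

# Alternative: test each term literally, then combine the results with OR.
as_literal <- Reduce(`|`, lapply(terms, grepl, x = x, fixed = TRUE))

print(as_regex)
print(as_literal)
```

    For plain words like "cat" and "dog", both forms give the same result, so the paste approach is fine there.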

    Edit:

    If you are doing this in order to later subset the data.frame by its "Keep" and "Discard" entries, you can do that more directly:

    data[grepl(paste(matches, collapse = "|"), data$animal),]
    

    This way, the TRUE/FALSE results of grepl are used directly to select the rows.
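
    As a self-contained sketch of that subsetting step, assuming a small hypothetical data frame (the weight column and the values are made up for illustration):

```r
# Hypothetical data frame; only the animal column mirrors the question.
data <- data.frame(
  animal = c("dog", "bird", "cat", "horse"),
  weight = c(10, 1, 4, 500),
  stringsAsFactors = FALSE
)
matches <- c("cat", "dog")

# Logical vector marking the rows whose animal name contains any term.
keep <- grepl(paste(matches, collapse = "|"), data$animal)

kept <- data[keep, ]        # rows that matched
discarded <- data[!keep, ]  # rows that did not match

print(kept)
print(discarded)
```

    Storing the logical vector in keep makes it easy to get both the matching and the non-matching rows without a separate "Keep"/"Discard" column.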
