R-regex: match strings not beginning with a pattern

谎友^ 2020-12-29 06:43

I'd like to use regex to see if a string does not begin with a certain pattern. While I can use [^ to blacklist certain characters, I can't figure out how to

3 answers
  •  予麋鹿 (original poster)
    2020-12-29 07:12

    I got stuck on the following special case, so I thought I would share...

    What if there are multiple instances of the regular expression, but you still only want the first segment?

    You can turn off the default greediness of a quantifier by adding a lazy (non-greedy) modifier, which is available in Perl-compatible regular expressions (perl = TRUE).

    Suppose the string I wanted to process was

    myExampleString = paste0(c(letters[1:13], "_", letters[14:26], "__",
                               LETTERS[1:13], "_", LETTERS[14:26], "__",
                               "laksjdl", "_", "lakdjlfalsjdf"),
                             collapse = "")
    myExampleString
    

    "abcdefghijklm_nopqrstuvwxyz__ABCDEFGHIJKLM_NOPQRSTUVWXYZ__laksjdl_lakdjlfalsjdf"

    and that I wanted only the first segment before the first "__". I cannot simply search on "_", because single-underscore is an allowable non-delimiter in this example string.

    The following doesn't work. Because ".+" is greedy by default, it returns the first and second segments together (but not the third, because the lookahead "(?=__)" requires a "__" to follow the match).

    gsub("^(.+(?=__)).*$", "\\1", myExampleString, perl = TRUE)
    

    "abcdefghijklm_nopqrstuvwxyz__ABCDEFGHIJKLM_NOPQRSTUVWXYZ"

    But this does work

    gsub("^(.+?(?=__)).*$", "\\1", myExampleString, perl = TRUE)
    

    "abcdefghijklm_nopqrstuvwxyz"

    The difference is the lazy (non-greedy) modifier "?" after the quantified wildcard ".+" in the (Perl) regular expression.
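
The greedy/lazy difference may be easier to see on a shorter string. The sketch below is only an illustration: the string x and the variable names are made up, and the two lookahead-free alternatives at the end are not from the answer above. They are assumed to be equivalent for inputs like this one, where "__" is a plain literal delimiter.

```r
# Hypothetical short string with two "__" delimiters (not from the post above)
x <- "a_b__C_D__e_f"

# Greedy: .+ takes as much as it can, so the lookahead lands on the LAST "__"
greedy <- regmatches(x, regexpr(".+(?=__)", x, perl = TRUE))

# Lazy: .+? takes as little as it can, so the lookahead lands on the FIRST "__"
lazy <- regmatches(x, regexpr(".+?(?=__)", x, perl = TRUE))

# Alternatives without a lookahead (assumed equivalents for this input):
# 1. delete everything from the first "__" onward
trimmed <- sub("__.*$", "", x)
# 2. split on the literal delimiter and keep the first piece
first_piece <- strsplit(x, "__", fixed = TRUE)[[1]][1]

print(c(greedy = greedy, lazy = lazy,
        trimmed = trimmed, first_piece = first_piece))
```

Here greedy is "a_b__C_D", while lazy, trimmed and first_piece are all "a_b". If the delimiter is always a fixed string, the strsplit() form with fixed = TRUE avoids regex quantifiers altogether.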
