Perl: Matching string not containing PATTERN

遇见更好的自我 2020-12-15 11:09

While using Perl regexes to chop a string into usable pieces, I needed to match everything except a certain pattern. I solved it after finding this hint on PerlMonks.

2 Answers
  •  别那么骄傲
    2020-12-15 11:50

    Building it up piece by piece (and throughout assuming no newlines in the string or PATTERN):

    This matches any string:

    /^.*$/
    

    But we don't want . to match a character that starts PATTERN, so replace

    .
    

    with

    (?!PATTERN).
    

    This uses a negative look-ahead that tests a given pattern without actually consuming any of the string and only succeeds if the pattern does not match at the given point in the string. So it's like saying:

    if PATTERN doesn't match at this point,
        match the next character
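
    A minimal sketch of that behaviour, assuming "bar" as a stand-in for PATTERN and two made-up sample strings. It shows that the look-ahead only inspects the text: the character it looked at is still available for . to match.

    ```perl
    use strict;
    use warnings;

    # "bar" is a hypothetical PATTERN; the sample strings are illustrative only.
    # The regex is unanchored, so Perl tries each position from the left.
    my ($first)  = 'barn' =~ /((?!bar).)/;   # "bar" starts at position 0, so the match moves on to 'a'
    my ($second) = 'bazn' =~ /((?!bar).)/;   # "bar" does not start here, so . takes the 'b'

    print "first:  $first\n";
    print "second: $second\n";
    ```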
    

    This needs to be done for every character in the string, so * is used to match zero or more times, from the beginning to the end of the string.

    To make the * apply to the combination of the negative look-ahead and ., not just the ., it needs to be surrounded by parentheses, and since there's no reason to capture, they should be non-capturing parentheses (?: ):

    (?:(?!PATTERN).)*
    

    And putting back the anchors to make sure we test at every position in the string:

    /^(?:(?!PATTERN).)*$/
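
    As a quick check of the anchored form, here is a small sketch that assumes "bar" as PATTERN and a few made-up strings without newlines; the variable names are illustrative.

    ```perl
    use strict;
    use warnings;

    # Hypothetical PATTERN for illustration.
    my $pattern = qr/bar/;

    # Succeeds only if no position in the string starts a match of $pattern.
    my $no_pattern = qr/^(?:(?!$pattern).)*$/;

    for my $str ('foo baz', 'foo bar baz', '') {
        my $verdict = $str =~ $no_pattern ? 'matches' : 'does not match';
        printf "%-15s %s\n", "'$str'", $verdict;
    }
    ```

    Only 'foo bar baz' is rejected; the empty string matches because * allows zero repetitions.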
    

    Note that this solution is particularly useful as part of a larger match; e.g. to match any string with foo and later baz but no bar in between:

    /foo(?:(?!bar).)*baz/
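
    A short sketch of the larger-match case, using made-up sample strings and an added capture group so the matched text can be printed. The third string shows that the regex is free to start at a later foo when the first one is followed by bar.

    ```perl
    use strict;
    use warnings;

    # Same regex as above, wrapped in a capture group to expose the matched text.
    my $re = qr/(foo(?:(?!bar).)*baz)/;

    my %hit;
    for my $str ('foo qux baz', 'foo bar baz', 'foo bar foo baz') {
        ($hit{$str}) = $str =~ $re;    # undef when there is no match
        my $shown = defined($hit{$str}) ? "'$hit{$str}'" : 'no match';
        print "'$str' => $shown\n";
    }
    ```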
    

    If there aren't such considerations, you can simply do:

    /^(?!.*PATTERN)/
    

    to check that PATTERN does not match anywhere in the string.
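
    For comparison, a sketch of the simpler form next to Perl's negated binding operator !~, which is not mentioned in the answer but should give the same verdict under the no-newline assumption. "bar" is again a hypothetical PATTERN.

    ```perl
    use strict;
    use warnings;

    # Each entry: [string, result of look-ahead form, result of negated match].
    my @results;
    for my $str ('foo baz', 'foo bar baz') {
        my $lookahead = $str =~ /^(?!.*bar)/ ? 1 : 0;
        my $negated   = $str !~ /bar/        ? 1 : 0;
        push @results, [$str, $lookahead, $negated];
        print "'$str': look-ahead=$lookahead, negated match=$negated\n";
    }
    ```

    The look-ahead form is still worth knowing because, unlike !~, it can be embedded in a larger regex.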

    About newlines: the regex above has two problems with newlines.

    First, . doesn't match a newline, so "foo\nbar" =~ /^(?:(?!baz).)*$/ fails to match, even though the string does not contain baz. Adding the /s flag makes . match any character, so "foo\nbar" =~ /^(?:(?!baz).)*$/s correctly matches.

    Second, $ doesn't match only at the end of the string; it can also match before a newline at the end of the string. So "foo\n" =~ /^(?:(?!\s).)*$/s does match, even though the string contains whitespace and the intent is to match only strings with no whitespace. \z matches only at the very end, so "foo\n" =~ /^(?:(?!\s).)*\z/s correctly fails to match this string, which does contain a \s.

    So the correct general-purpose regex is /^(?:(?!PATTERN).)*\z/s
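
    The four cases from the newline discussion can be run directly. This is only a sketch; the variable names are illustrative.

    ```perl
    use strict;
    use warnings;

    # Problem 1: without /s, . refuses to cross the embedded newline.
    my $dot_without_s = "foo\nbar" =~ /^(?:(?!baz).)*$/   ? 1 : 0;
    my $dot_with_s    = "foo\nbar" =~ /^(?:(?!baz).)*$/s  ? 1 : 0;

    # Problem 2: $ is satisfied just before a trailing newline; \z is not.
    my $dollar_anchor = "foo\n"    =~ /^(?:(?!\s).)*$/s   ? 1 : 0;
    my $z_anchor      = "foo\n"    =~ /^(?:(?!\s).)*\z/s  ? 1 : 0;

    print "no /s, \$ anchor:        $dot_without_s\n";   # 0: wrongly rejected
    print "with /s, \$ anchor:      $dot_with_s\n";      # 1: correctly accepted
    print "trailing \\n, \$ anchor:  $dollar_anchor\n";  # 1: wrongly accepted
    print "trailing \\n, \\z anchor: $z_anchor\n";       # 0: correctly rejected
    ```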
