python regex to detect a word exists

庸人自扰 2020-12-18 11:04

I want to detect whether a word is in a sentence using a Python regex. I also want to be able to negate the check.

import re
re.match(r'(?=.*\bfoo\b)', 'bar red f


        
3 Answers
  • 2020-12-18 11:50

    To detect if a word exists in a string you need a positive lookahead:

    (?=.*\bfoo\b)
    

    The .* is necessary to enable searching farther than just at the string start (re.match anchors the search at the string start).

    To check that a string does not contain a word, use a negative lookahead:

    (?!.*\bbar\b)
     ^^^
    

    So, combining them:

    re.match(r'(?=.*\bfoo\b)(?!.*\bbar\b)', input)
    

    will find a match in a string that contains a whole word foo and does not contain a whole word bar.
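
    As a minimal sketch of the combined pattern above, the snippet below wraps it in a helper. The name has_foo_without_bar and the sample strings are illustrative assumptions, not part of the answer. It also assumes single-line input, since . does not match a newline by default.

```python
import re

# Pattern from the answer: requires the whole word "foo", forbids the whole word "bar".
PATTERN = re.compile(r'(?=.*\bfoo\b)(?!.*\bbar\b)')

def has_foo_without_bar(text):
    """Return True if text contains the word foo and not the word bar."""
    # re.match only tries position 0, so both lookaheads scan the whole line.
    return PATTERN.match(text) is not None

print(has_foo_without_bar('red foo'))      # True
print(has_foo_without_bar('bar red foo'))  # False, the word bar is present
print(has_foo_without_bar('food bar'))     # False, "food" is not the word foo
```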

  • 2020-12-18 11:56

    Update
    I just found that Python's re.match() has an implied ^ anchor.
    In other words, it will only match at the beginning of the string,
    and strangely, unlike Java, it does not require the match to span the entire string.

    Be warned, though, that combining a sequential positive and negative lookahead,
    as in Stribnez's answer, can give unintended results if the pair is not anchored to
    something, either literal text or a beginning-of-string (BOS) anchor ^.

    For general usage, don't rely on whether some language's match() function
    implies a BOS anchor ^ (and possibly an end-of-string anchor $).
    Put one (or both) in the pattern at all times. That way the pattern can be used
    with search() as well, and it is portable to other languages.

    To see how negative and positive lookaheads in series can cause problems,
    take this tricky standalone expression (?=.*\bfoo\b)(?!.*\bbar\b)

    It can be examined like this:

    Since it is in-series, both assertions have to be matched at the same
    position in the string.

    Given the same position in the string for both, the negative assertion
    is satisfied at any position from which its contents cannot be matched downstream.

    Assuming that no anchoring exists, this leaves an opening upstream of the
    search position (where the bar literal sits in the example) for
    the undesired content to exist while still satisfying the positive/negative
    assertion pair.

    Example:
    (?=.*\bfoo\b)(?!.*\bbar\b)
    matches
    bar red foo

    **  Grp 0 -  ( pos 1 , len 0 )  EMPTY 
    
    b<here>ar red foo
    

    This shows that at position 1, both assertions are satisfied.
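
    The behavior described above can be reproduced with Python's re module. This is a small sketch; the variable names are illustrative. It compares the unanchored pattern with an anchored variant under re.search and re.match.

```python
import re

text = 'bar red foo'
unanchored = r'(?=.*\bfoo\b)(?!.*\bbar\b)'
anchored = r'^(?=.*\bfoo\b)(?!.*\bbar\b)'

# search() bumps along: position 0 fails the negative lookahead,
# but from position 1 the word "bar" is no longer visible downstream.
m = re.search(unanchored, text)
print(m.start(), repr(m.group()))   # 1 ''

# With ^ the assertions are only tried at position 0, so bar is seen.
print(re.search(anchored, text))    # None

# re.match() also tries only position 0, so it hides the problem.
print(re.match(unanchored, text))   # None
```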

    Conclusion(s):
    1. Always use anchors, even if they are implied.
    2. Avoid using any language's match() function, use search() instead.

    End update


    It doesn't matter if you use a positive or negative lookahead,
    if you don't use the correct syntax, it won't work.

    Look at this (?!=.*\bfoo\b)

    Here the = is a literal character, not part of the lookahead syntax. The assertion
    says that the text at the current position can't be an equals sign = followed by
    a greedy run of characters up to a whole word foo.

    So, it will not match at the start of '= ab foo', but it will match at '=(here) ab foo'.

    The next problem is that if you don't give the assertion anything to anchor on,
    the engine will use a bump-along to move the position to a place between characters
    that will satisfy it.

    The correction for the negative lookahead you are looking for is this
    ^(?!.*\bfoo\b)
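
    A short sketch of the difference between the mistyped and the corrected negative lookahead, using re.search so that the bump-along is visible. The variable names and test strings are illustrative assumptions.

```python
import re

typo = r'(?!=.*\bfoo\b)'    # negative lookahead for a literal "=" ... foo
fixed = r'^(?!.*\bfoo\b)'   # anchored: no whole word foo anywhere on the line

# Position 0 is rejected (an "=" followed by ... foo starts there),
# so the engine bumps along and succeeds with an empty match at position 1.
m = re.search(typo, '= ab foo')
print(m.start())                                 # 1

# The anchored form is only tried at position 0 and sees the word foo.
print(re.search(fixed, '= ab foo'))              # None

# "food" is not the whole word foo, so the negative lookahead passes.
print(re.search(fixed, 'ab food') is not None)   # True
```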


    For reference:

    (?=..)  Positive lookahead
    (?<=..) Positive lookbehind
    (?!..)  Negative lookahead
    (?<!..) Negative lookbehind   
    

    And, they can be mixed and nested anywhere.

  • 2020-12-18 11:56

    You need the .* because re.match() tries to match the pattern to the beginning of the string. If you want to search the whole string, use re.search().

    Just as you can do if re.search(...):, you can also do if not re.search(...):
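
    A minimal sketch of this simpler approach follows. The helper name contains_word is a hypothetical choice, and it assumes the word consists of ordinary word characters so that \b behaves as expected on both sides.

```python
import re

def contains_word(word, text):
    """Return True if word occurs in text as a whole word."""
    # re.escape guards against regex metacharacters in the word itself.
    return re.search(r'\b' + re.escape(word) + r'\b', text) is not None

sentence = 'bar red foo'

if contains_word('foo', sentence):
    print('foo is present')

# Negation is plain Python, no negative lookahead needed.
if not contains_word('baz', sentence):
    print('baz is absent')
```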
