How to write a word boundary inside a character class in Python without losing its meaning? I wish to add the underscore (_) to the definition of a word boundary (\b)

日久生厌 2020-12-03 15:51

I am aware that the definition of a word boundary is (? and I wish to add the underscore (optionally) to that definition too.

1 Answer
  • 2020-12-03 16:50

    You may use lookarounds:

    (?:\b|(?<=_))word(?=\b|_)
    ^^^^^^^^^^^^^     ^^^^^^^
    

    See the regex demo. Here, (?:\b|(?<=_)) is a non-capturing group that matches either a word boundary or a position preceded by _, and (?=\b|_) is a positive lookahead that matches either a word boundary or a position followed by _.

    Unfortunately, Python re won't allow (?<=\b|_) as the lookbehind, because a lookbehind pattern must be of fixed width: \b matches zero characters while _ matches one, so the two alternatives differ in width and you get a "look-behind requires fixed-width pattern" error.
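The fixed-width restriction can be checked directly. The following is a minimal sketch using only the standard re module; the variable names are chosen here for illustration. It tries to compile the variable-width lookbehind and records the error, then compiles the working form with the alternation moved outside the lookbehind.

```python
import re

# Attempt the variable-width lookbehind: \b has width 0, _ has width 1.
try:
    re.compile(r"(?<=\b|_)word")
    error_message = None
except re.error as exc:
    # Python's re rejects the pattern at compile time.
    error_message = str(exc)

print(error_message)

# Moving the alternation outside the lookbehind compiles without error,
# because (?<=_) on its own has a fixed width of one character.
working = re.compile(r"(?:\b|(?<=_))word")
print(working.search("some_word").span())
```

The workaround keeps the same meaning: the regex engine tries \b first and falls back to the one-character lookbehind only when no word boundary is present.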

    A Python demo:

    import re

    # Match "word" when each side is either a word boundary or an underscore
    rx = r"(?:\b|(?<=_))word(?=\b|_)"
    s = "some_word_here and a word there"
    print(re.findall(rx, s))  # ['word', 'word']
    

    An alternative solution is to use custom word boundaries like (?<![^\W_]) / (?![^\W_]) (see online demo):

    rx = r"(?<![^\W_])word(?![^\W_])"
    

    The (?<![^\W_]) negative lookbehind fails the match if the character immediately before the search word is a word char other than _ (that is, a letter or a digit), so it requires the start of string, a non-word char, or _ before the search word. Likewise, the (?![^\W_]) negative lookahead fails the match if the next character is a letter or a digit, so it requires the end of string, a non-word char, or _ after the search word.
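To see how the custom boundaries behave, here is a small sketch that runs the pattern over a few strings and compares it with the lookaround version from the first solution. The sample strings and variable names are made up for illustration; only the two patterns come from the answer.

```python
import re

# Both patterns are taken from the answer above.
lookaround_rx = re.compile(r"(?:\b|(?<=_))word(?=\b|_)")
custom_rx = re.compile(r"(?<![^\W_])word(?![^\W_])")

# Illustrative inputs mapped to whether "word" should be found.
samples = {
    "some_word_here": True,   # underscore on both sides
    "a word there": True,     # whitespace on both sides
    "word": True,             # start and end of string
    "_word!": True,           # underscore before, punctuation after
    "swordfish": False,       # letter on both sides
    "word2": False,           # digit right after
}

# Result of the custom-boundary pattern for each sample.
results = {s: bool(custom_rx.search(s)) for s in samples}

# Check that the lookaround pattern agrees on every sample.
same = all(bool(lookaround_rx.search(s)) == results[s] for s in samples)

print(results)
print(same)
```

On these inputs the two patterns agree. That is expected for a search term that starts and ends with a word character, since both forms then accept the start or end of string, a non-word char, or _ on each side.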
