Python regular expression match multiple words anywhere

星月不相逢 2020-12-17 21:53

I'm trying to use Python's regular expressions to match a string with several words. For example, the string is "These are oranges and apples and pears, but not pinapples

2 answers
  • 2020-12-17 22:25

    Try this:

    >>> import re
    >>> re.findall(r"\band\b|\bor\b|\bnot\b", "These are oranges and apples and pears, but not pinapples or ..")
    ['and', 'and', 'not', 'or']
    

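    a|b matches either a or b.

    \b matches a word boundary, that is, the position between a word character and a non-word character (or the start or end of the string).

    re.findall(pattern, string) returns a list of all non-overlapping matches of pattern in string.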
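
    To see what the word boundaries buy you, here is a small sketch that runs the same alternation with and without \b on the example string. The variable names (text, loose, strict) are only illustrative.

```python
import re

text = "These are oranges and apples and pears, but not pinapples or .."

# Without word boundaries, "or" also matches at the start of "oranges".
loose = re.findall(r"and|or|not", text)

# With \b on both sides, only whole words match.
strict = re.findall(r"\band\b|\bor\b|\bnot\b", text)

print(loose)   # ['or', 'and', 'and', 'not', 'or']
print(strict)  # ['and', 'and', 'not', 'or']
```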

  • 2020-12-17 22:48

    You've got a few problems there.

    First, matches are case-sensitive unless you use the IGNORECASE/I flag to ignore case. So, 'AND' doesn't match 'and'.
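
    A quick way to check the case-sensitivity point, assuming the same example string (the names text, case_sensitive and case_insensitive are made up for this sketch):

```python
import re

text = "These are oranges and apples and pears, but not pinapples or .."

# By default the pattern is case-sensitive, so upper-case "AND" finds nothing.
case_sensitive = re.findall(r"\bAND\b", text)

# re.IGNORECASE (alias re.I) makes the same pattern match lower-case "and".
case_insensitive = re.findall(r"\bAND\b", text, flags=re.IGNORECASE)

print(case_sensitive)    # []
print(case_insensitive)  # ['and', 'and']
```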

    Also, unless you use the VERBOSE/X flag, those spaces are part of the pattern. So, you're checking for 'AND ', not 'AND'. If you wanted that, you probably wanted spaces on both sides, not just one (otherwise, 'band leader' is going to match…), and really, you probably wanted \b, not a space (otherwise a sentence starting with 'And another thing' isn't going to match).
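
    The difference between literal spaces and \b can be sketched like this. The test strings come from the examples above; the variable names are hypothetical.

```python
import re

# Without re.VERBOSE, the trailing space is a literal character in the pattern,
# so "and " inside "band leader" matches.
trailing_space = re.search(r"AND ", "band leader", flags=re.I)

# Requiring a space on both sides misses a word at the start of the string.
spaces_both_sides = re.search(r" AND ", "And another thing", flags=re.I)

# \b handles both cases: it matches at the start of the string
# and refuses to match inside "band".
boundary_start = re.search(r"\bAND\b", "And another thing", flags=re.I)
boundary_inside = re.search(r"\bAND\b", "band leader", flags=re.I)

# With re.VERBOSE (alias re.X), unescaped whitespace in the pattern is ignored,
# so "AND " behaves like "AND".
verbose = re.search(r"AND ", "AND", flags=re.X)

print(trailing_space, spaces_both_sides, boundary_start, boundary_inside, verbose)
```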

    Finally, if you think you need .* before and after your pattern and $ and ^ around it, there's a good chance you wanted to use search, findall, or finditer, rather than match.

    So:

    >>> import re
    >>> s = "These are oranges and apples and pears, but not pinapples or .."
    >>> r = re.compile(r'\bAND\b | \bOR\b | \bNOT\b', flags=re.I | re.X)
    >>> r.findall(s)
    ['and', 'and', 'not', 'or']
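
    As a rough illustration of why search, findall, or finditer fits better than match here: match only tries at the beginning of the string, while the others scan the whole string. The grouped pattern and the names at_start, first and hits are assumptions made for this sketch, not part of the original question.

```python
import re

s = "These are oranges and apples and pears, but not pinapples or .."
pattern = re.compile(r"\b(?:and|or|not)\b", flags=re.I)

# match() only tries at position 0, and the string starts with "These".
at_start = pattern.match(s)

# search() scans forward and returns the first match anywhere in the string.
first = pattern.search(s)

# finditer() yields every match object, so positions are available as well.
hits = [(m.group(), m.start()) for m in pattern.finditer(s)]

print(at_start)                       # None
print(first.group(), first.start())   # and 18
print(hits)
```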
    
