Regex pattern to handle both case-sensitive and case-insensitive matching in a single statement

情深已故 2020-12-21 16:16

I have a small regex problem involving two different terms:

  1. "United States", which I would like to match ignoring case
  2. "US", which I would like
1 answer
  • 2020-12-21 17:10

    Before Python 3.6, the inline flag (?i) could only turn on "ignore case" for the entire expression. From the official documentation:

    (?aiLmsux)

    (One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function. Flags should be used first in the expression string.
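As a rough illustration of the whole-expression behavior described in the quote above, the sketch below puts (?i) at the start of a pattern built from the question's two terms. The variable names are invented for this example; only `re.sub` and `re.IGNORECASE` come from the standard library.

```python
import re

text = 'united states and US and us'

# (?i) at the start makes the WHOLE pattern case-insensitive,
# so the lowercase 'us' is replaced too, which is not what the question wants.
global_flag = re.sub(r'(?i)United States|US', 'USA', text)

# Passing re.IGNORECASE as an argument has the same whole-pattern effect.
flag_argument = re.sub(r'United States|US', 'USA', text, flags=re.IGNORECASE)

print(global_flag)
print(flag_argument)
```

Both calls replace all three terms, which is why a flag that covers the entire expression cannot treat "US" case-sensitively on its own.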

    Since Python 3.6, however, you can toggle flags for just a part of the expression:

    (?imsx-imsx:...)

    (Zero or more letters from the set 'i', 'm', 's', 'x', optionally followed by '-' followed by one or more letters from the same set.) The letters set or removes the corresponding flags: re.I (ignore case), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the part of the expression. (The flags are described in Module Contents.)

    New in version 3.6.

    For example, (?i:foo)bar matches foobar and FOObar but not fooBAR. So to answer your question:

    >>> import re
    >>> re.sub('(?i:United States)|US', 'USA', 'united states and US and us')
    'USA and USA and us'
    

    Note that this only works in Python 3.6+.
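For interpreters older than 3.6, one possible workaround (not part of the original answer) is to spell out the case-insensitivity by hand with character classes, leaving "US" as a plain case-sensitive literal. The helper name `case_insensitive_fragment` is hypothetical and exists only for this sketch.

```python
import re

def case_insensitive_fragment(literal):
    """Hypothetical helper: turn each letter into a class such as [Uu]."""
    parts = []
    for ch in literal:
        if ch.isalpha():
            # Match either case of this letter.
            parts.append('[{}{}]'.format(ch.upper(), ch.lower()))
        else:
            # Escape anything else (spaces, punctuation) so it matches literally.
            parts.append(re.escape(ch))
    return ''.join(parts)

# 'United States' matches in any case; 'US' stays case-sensitive.
pattern = re.compile(case_insensitive_fragment('United States') + '|US')

result = pattern.sub('USA', 'united states and US and us')
print(result)
```

This should give the same result as the scoped-flag version for ASCII terms like these, at the cost of a much less readable pattern, so the (?i:...) form is preferable whenever Python 3.6+ is available.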
