Why does re.sub replace the entire pattern, not just a capturing group within it?

Posted by 可紊 on 2020-06-08 04:48:46

Question


re.sub('a(b)','d','abc') yields dc, not adc.

Why does re.sub replace the entire match, instead of just the capturing group '(b)'?


Answer 1:


Because it's supposed to replace the whole occurrence of the pattern:

Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl.

If it were to replace only some subgroup, then complex regexes with several groups wouldn't work. There are several possible solutions:

  1. Specify pattern in full: re.sub('ab', 'ad', 'abc') - my favorite, as it's very readable and explicit.
  2. Capture the groups you want to preserve and then refer to them in the replacement (note that it should be a raw string to avoid escaping the backslash): re.sub('(a)b', r'\1d', 'abc')
  3. Similar to the previous option: provide a callback function as the repl argument; it receives the Match object and returns the required result.
  4. Use lookbehinds/lookaheads, which are not included in the match but still affect matching: re.sub('(?<=a)b', r'd', 'abxb') yields adxb. The ?<= at the beginning of the group says "it's a lookbehind".
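The four options above can be put side by side on the string from the question. This is only a minimal sketch; the variable names and the callback name keep_a are made up for illustration.

```python
import re

# Option 1: spell out the full pattern and the full replacement.
full = re.sub('ab', 'ad', 'abc')

# Option 2: capture what should survive and put it back with \1.
backref = re.sub('(a)b', r'\1d', 'abc')

# Option 3: a callback receives the Match object and returns the replacement.
def keep_a(match):
    # group(1) is the captured 'a'; the 'b' is dropped and 'd' is appended.
    return match.group(1) + 'd'

callback = re.sub('(a)b', keep_a, 'abc')

# Option 4: a lookbehind asserts the 'a' without making it part of the match.
lookbehind = re.sub('(?<=a)b', 'd', 'abc')

print(full, backref, callback, lookbehind)  # adc adc adc adc
```

All four produce adc; they differ only in how the 'a' is kept out of, or put back into, the replaced text.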



Answer 2:


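import re

# Compile once, case-insensitively; group 1 captures the one- or two-digit age.
pattern = re.compile(r"I am (\d{1,2}) .*", re.IGNORECASE)

text = "i am 32 years old"

# Only substitute when the text matches from the start.
if pattern.match(text):
    print(
        # \1 in the replacement inserts the captured age.
        pattern.sub(r"You are \1 years old.", text, count=1)
    )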

First we compile a regex pattern with the case-insensitive flag.

Then we check whether the text matches the pattern; if it does, we refer to the only group in the pattern (the age) as \1 in the replacement string.
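As a side note, the explicit match check is not required for the substitution itself, since re.sub returns the input unchanged when the pattern does not match. A small sketch of that behaviour, where the helper name restate_age and the second input string are assumptions added for illustration:

```python
import re

AGE_PATTERN = re.compile(r"I am (\d{1,2}) .*", re.IGNORECASE)

def restate_age(text):
    # Hypothetical helper: rewrites the sentence, reusing the captured age via \1.
    return AGE_PATTERN.sub(r"You are \1 years old.", text, count=1)

matched = restate_age("i am 32 years old")
unmatched = restate_age("hello world")

print(matched)    # You are 32 years old.
print(unmatched)  # hello world
```

Keeping the check still makes sense when, as in the answer, nothing should be printed for non-matching input.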




Answer 3:


Because that's exactly what the re.sub() documentation says it's supposed to do:

  • the pattern 'a(b)' says "match 'a' followed by 'b', and capture the 'b' as group 1". (The group only records what it matched; it does not narrow what gets replaced, so the pattern can never match 'b' on its own as you seem to expect.)
  • the replacement-string is 'd'
  • hence on your string 'abc', it matches all of 'ab' and replaces it with 'd', thus result is 'dc'

Making the 'a' an optional non-greedy group, '(a)??', does not change this, because the whole match is still replaced:

>>> re.sub('(a)??b','d','abc')
'dc'

To get 'adc', keep the 'a' in the replacement, for example with a backreference: re.sub('(a)b', r'\1d', 'abc').
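If the goal really is to change only the text captured by one group and leave the rest of each match intact, one possible approach is a callback that rebuilds the match around the group's span. The helper below, replace_group, is hypothetical and not part of the re module; it is a sketch that treats repl as a literal string (no backreference expansion).

```python
import re

def replace_group(pattern, repl, text, group=1):
    """Replace only the text captured by `group`, keeping the rest of each match."""
    def rebuild(match):
        whole = match.group(0)
        # Span of the group in the original string.
        start, end = match.span(group)
        if start == -1:
            # The group did not participate in this match; leave it untouched.
            return whole
        # Shift the span so it is relative to the start of the match.
        offset = match.start()
        return whole[:start - offset] + repl + whole[end - offset:]
    return re.sub(pattern, rebuild, text)

result = replace_group('a(b)', 'd', 'abc')
print(result)  # adc
```

For simple cases the backreference or lookbehind forms are shorter; a helper like this is mainly useful when the pattern is fixed and cannot easily be rewritten.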


Source: https://stackoverflow.com/questions/42104540/why-does-re-sub-replace-the-entire-pattern-not-just-a-capturing-group-within-it
