Why python regular expression (?P=name) doesn't work

℡╲_俬逩灬. 提交于 2021-02-02 09:27:06

问题


I am learning 're' part of Python, and the named pattern (?P=name) confused me,

When I using re.sub() to make some exchange for digit and character, the patter '(?P=name)' doesn't work, but the pattern '\N' and '\g<name>' still make sense. Code below:

[IN]print(re.sub(r'(?P<digit>\d{3})-(?P<char>\w{4})', r'(?P=char)-(?P=digit)', '123-abcd'))
[OUT] (?P=char)-(?P=digit)
[IN] print(re.sub(r'(?P<digit>\d{3})-(?P<char>\w{4})', r'\2-\1', '123-abcd'))
[OUT] abcd-123
[IN] print(re.sub(r'(?P<digit>\d{3})-(?P<char>\w{4})', r'\g<char>-\g<digit>', '123-abcd'))
[OUT] abcd-123

Why it failed to make substitute when I use (?P=name)?
And how to use it correctly?
I am using Python 3.5


回答1:


The (?P=name) is an inline (in-pattern) backreference. You may use it inside a regular expression pattern to match the same content as is captured by the corresponding named capturing group, see the Python Regular Expression Syntax reference:

(?P=name)
A backreference to a named group; it matches whatever text was matched by the earlier group named name.

See this demo: (?P<digit>\d{3})-(?P<char>\w{4})&(?P=char)-(?P=digit) matches 123-abcd&abcd-123 because the "digit" group matches and captures 123, "char" group captures abcd and then the named inline backreferences match abcd and 123.

To replace matches, use \1, \g<1> or \g<char> syntax with re.sub replacement pattern. Do not use (?P=name) for that purpose:

repl can be a string or a function... Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern...

In string-type repl arguments, in addition to the character escapes and backreferences described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE.




回答2:


You can check the details of using and back-referencing ?P visiting:

https://docs.python.org/3/library/re.html

and using CTRL+F in your browser to look for (?P...). It comes a nice chart with all the instructions about when you can make use of ?P=name.

For this example, you're doing right at your third re.sub() call.

In the all re.sub() calls you can only use the ?P=name syntax in the first string parameter of this method and you don't need it in the second string parameter because you have the \g syntax.

In case you're confuse about the ?P=name being useful, it is, but for making a match by backreferencing an already named string.

Example: you want to match potatoXXXpotato and replace it for YYXXXYY. You could make:

re.sub(r'(?P<myName>potato)(XXX)(?P=myName)', r'YY\2YY', 'potatoXXXpotato')

or

re.sub(r'(?P<myName>potato)(?P<triple>XXX)(?P=myName)', r'YY\g<triple>YY', 'potatoXXXpotato')


来源:https://stackoverflow.com/questions/46021148/why-python-regular-expression-p-name-doesnt-work

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!