lookahead and non-capturing regular expressions

Submitted by ♀尐吖头ヾ on 2019-12-19 09:58:06

Question


I'm trying to match the local part of an email address before the @ character with:

import re

LOCAL_RE_NOTQUOTED = r"""
((
\w         # alphanumeric and _
| [!#$%&'*+-/=?^_`{|}~]          # special chars, but no dot at beginning
)
(
\w         # alphanumeric and _
| [!#$%&'*+-/=?^_`{|}~]          # special characters
| ([.](?![.])) # negative lookahead to avoid pairs of dots. 
)*)
(?<!\.)(?:@)           # no end with dot before @
"""

Testing with:

re.match(LOCAL_RE_NOTQUOTED, "a.a..a@", re.VERBOSE).group()

gives:

'a.a..a@'

Why is the @ printed in the output, even though I'm using a non-capturing group (?:@)?

Testing with:

re.match(LOCAL_RE_NOTQUOTED, "a.a..a@", re.VERBOSE).groups()

gives:

('a.a..a', 'a', 'a', None)

Why does the regex not reject the string with a pair of dots '..'?


Answer 1:


You're confusing non-capturing groups (?:...) and lookahead assertions (?=...).

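The former do participate in the match (and are thus part of match.group(), which contains the overall match); they just don't create a numbered group that you could refer back to later ($1 in some tools, \1 or match.group(1) in Python). The latter only check that the pattern follows at the current position without consuming any characters, so (?=@) would leave the @ out of the match.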
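A minimal Python sketch of the difference, using a made-up input "abc@example": the non-capturing group consumes the @, the lookahead does not, and only a capturing group gets a group number.

```python
import re

# Non-capturing group: "@" is consumed, so it is part of group(0),
# but no numbered group is created.
m1 = re.match(r"\w+(?:@)", "abc@example")
print(m1.group())   # abc@
print(m1.groups())  # ()

# Lookahead: only asserts that "@" follows; nothing is consumed.
m2 = re.match(r"\w+(?=@)", "abc@example")
print(m2.group())   # abc

# Capturing group: "@" is consumed AND available as group 1.
m3 = re.match(r"\w+(@)", "abc@example")
print(m3.group(), m3.group(1))  # abc@ @
```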

The second problem (why is the double dot matched?) is a bit trickier, and it is caused by an error in your regex. When you wrote (shortened to make the point)

[+-/]

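you wrote "match a character between + and /", and in ASCII the dot lies right between them (ASCII 43 to 47 are + , - . /). Therefore the special-character class already matches the dot, so the [.](?![.]) alternative and its lookahead are never reached (the first class even lets a leading dot through). Place the dash at the end of the character class so it is treated as a literal dash; the version below also ends with the lookahead (?=@) instead of (?:@), so the @ is no longer part of the match: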

((
\w         # alphanumeric and _
| [!#$%&'*+/=?^_`{|}~-]          # special chars, but no dot at beginning
)
(
\w         # alphanumeric and _
| [!#$%&'*+/=?^_`{|}~-]          # special characters
| ([.](?![.])) # negative lookahead to avoid pairs of dots. 
)*)
(?<!\.)(?=@)           # no end with dot before @
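To see both fixes at once, here is a sketch that first shows what the range +-/ actually covers and then rebuilds the question's pattern and the corrected one in compact, non-verbose form. The helper build and the names SPECIAL_BROKEN and SPECIAL_FIXED are hypothetical, introduced only for this comparison.

```python
import re

# "+-/" inside a class is a range from "+" (43) to "/" (47), so it also
# matches "," "-" and ".". With the dash placed last, "-" is literal.
print([c for c in "+,-./" if re.fullmatch(r"[+-/]", c)])  # ['+', ',', '-', '.', '/']
print([c for c in "+,-./" if re.fullmatch(r"[+/-]", c)])  # ['+', '-', '/']

# Special-character classes as written in the question and in the answer.
SPECIAL_BROKEN = r"[!#$%&'*+-/=?^_`{|}~]"
SPECIAL_FIXED = r"[!#$%&'*+/=?^_`{|}~-]"


def build(special, tail):
    """Hypothetical helper: the local-part pattern in compact form,
    around a given special-character class and a final @ construct."""
    return rf"((\w|{special})(\w|{special}|([.](?![.])))*)(?<!\.){tail}"


broken = re.compile(build(SPECIAL_BROKEN, "(?:@)"))  # as in the question
fixed = re.compile(build(SPECIAL_FIXED, "(?=@)"))    # as in the answer

m = broken.match("a.a..a@")
print(m.group(), m.groups())          # a.a..a@ ('a.a..a', 'a', 'a', None)
print(fixed.match("a.a..a@"))         # None: the double dot is rejected
print(fixed.match("a.a.a@").group())  # a.a.a (the @ is not consumed)
```

The broken version reproduces both results from the question, while the fixed version rejects the double dot and leaves the @ out of the match.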

And of course, if you want to use this logic, you can streamline it a bit:

^(?!\.)                   # no dot at the beginning
(?:
[\w!#$%&'*+/=?^_`{|}~-]   # alnums or special characters except dot
| (\.(?![.@]))            # or dot unless it's before a dot or @ 
)*
(?=@)                     # end before @
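The streamlined pattern can be tried directly with re.VERBOSE, which ignores the whitespace and the comments outside the character class. The sample addresses below are made up for illustration.

```python
import re

# The streamlined pattern from above; re.VERBOSE ignores whitespace and
# "#" comments outside character classes ("#" inside [...] is literal).
LOCAL_RE = re.compile(r"""
^(?!\.)                   # no dot at the beginning
(?:
  [\w!#$%&'*+/=?^_`{|}~-] # alnums or special characters except dot
| (\.(?![.@]))            # or dot unless it's before a dot or @
)*
(?=@)                     # end before @
""", re.VERBOSE)

samples = ["john.doe@example.com", "a.a..a@example.com",
           ".john@example.com", "john.@example.com", "@example.com"]
for address in samples:
    m = LOCAL_RE.match(address)
    print(repr(address), "->", repr(m.group()) if m else None)
```

Because the group is quantified with *, the pattern also matches an empty local part (the last sample yields ''); use + instead if at least one character is required.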


Source: https://stackoverflow.com/questions/7040829/lookahead-and-non-capturing-regular-expressions
