Capturing repeating subpatterns in Python regex

While matching an email address, after I match something like yasar@webmail, I want to capture one or more of (\.\w+) (what I am doing is a little

4 Answers

你的背包

2020-11-22 05:58

This will work:

>>> import re
>>> regexp = r"[\w\.]+@(\w+)(\.\w+)?(\.\w+)?(\.\w+)?(\.\w+)?(\.\w+)?"
>>> email_address = "william.adama@galactica.caprica.fleet.mil"
>>> m = re.match(regexp, email_address)
>>> m.groups()
('galactica', '.caprica', '.fleet', '.mil', None, None)

But it's limited to a maximum of six subgroups (the host name plus five dotted parts). A better way to do this would be to capture everything after the @ and split it afterwards:

>>> m = re.match(r"[\w\.]+@(.+)", email_address)
>>> m.groups()
('galactica.caprica.fleet.mil',)
>>> m.group(1).split('.')
['galactica', 'caprica', 'fleet', 'mil']

Note that these regexps are fine as long as the email addresses are simple, but there are many kinds of valid address they will fail on. See this question for a detailed treatment of email address regexes.
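
To make that caveat concrete, here is a small sketch of two ways the second pattern can misbehave. The test addresses are made up for illustration and are not from the question:

```python
import re

pattern = re.compile(r"[\w\.]+@(.+)")

# A valid address with "+" in the local part is rejected,
# because "+" is not in the [\w\.] character class.
rejected = pattern.match("user+tag@example.com")
print(rejected)  # None

# Meanwhile (.+) accepts anything after the "@", including spaces.
accepted = pattern.match("user@not a domain").group(1)
print(accepted)  # not a domain
```

So the pattern is probably good enough for pulling apart addresses you already trust, but not for validating them.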

余生分开走

2020-11-22 06:00

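You can fix the problem of (\.\w+)+ capturing only the last match by making the repeated group non-capturing and wrapping the whole repetition in one capturing group instead: ((?:\.\w+)+). This captures the full run of repeats as a single string.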
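
A minimal sketch of that pattern, using the address from the question. The surrounding pattern and the variable names are my own assumptions, chosen only to illustrate the idea:

```python
import re

email_address = "yasar@webmail.something.edu.tr"

# The outer group captures the whole run of ".label" repeats;
# the inner (?:...) group is non-capturing, so only the outer one is reported.
pattern = re.compile(r"[\w.]+@(\w+)((?:\.\w+)+)")

m = pattern.match(email_address)
host, suffixes = m.groups()
print(host)      # webmail
print(suffixes)  # .something.edu.tr

# Split the captured run back into its individual pieces.
parts = re.findall(r"\.\w+", suffixes)
print(parts)     # ['.something', '.edu', '.tr']
```

You still get one string rather than a list, so a second step such as findall or split is needed to get at the individual pieces.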

被撕碎了的回忆

2020-11-22 06:10

This is what you are looking for:

>>> import re
>>> s = "yasar@webmail.something.edu.tr"
>>> r = re.compile(r"\.\w+")
>>> m = r.findall(s)
>>> m
['.something', '.edu', '.tr']
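
One thing to watch for: findall scans the whole string, so a dot in the part before the @ is matched too. A rough sketch of the difference, borrowing the address from another answer in this thread (the variable names are illustrative):

```python
import re

s = "william.adama@galactica.caprica.fleet.mil"

# Searching the whole address also matches the dot in the local part.
whole = re.findall(r"\.\w+", s)
print(whole)        # ['.adama', '.caprica', '.fleet', '.mil']

# Restricting the search to the text after "@" avoids that.
domain = s.partition("@")[2]
domain_only = re.findall(r"\.\w+", domain)
print(domain_only)  # ['.caprica', '.fleet', '.mil']
```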

梦毁少年i

2020-11-22 06:21
The re module doesn't support repeated captures, but the third-party regex module does:
```
>>> import regex
>>> m = regex.match(r'([.\w]+)@((\w+)(\.\w+)+)', 'yasar@webmail.something.edu.tr')
>>> m.groups()
('yasar', 'webmail.something.edu.tr', 'webmail', '.tr')
>>> m.captures(4)
['.something', '.edu', '.tr']
```
In your case I'd go with splitting the repeated subpatterns later. It leads to simple, readable code; see, for example, the code in @Li-aung Yip's answer.