Capturing repeating subpatterns in Python regex

难免孤独 2020-11-22 05:21

While matching an email address, after I match something like yasar@webmail, I want to capture one or more of (\.\w+) (what I am doing is a little

4 Answers
  • 2020-11-22 05:58

    This will work:

    >>> import re
    >>> regexp = r"[\w\.]+@(\w+)(\.\w+)?(\.\w+)?(\.\w+)?(\.\w+)?(\.\w+)?"
    >>> email_address = "william.adama@galactica.caprica.fleet.mil"
    >>> m = re.match(regexp, email_address)
    >>> m.groups()
    ('galactica', '.caprica', '.fleet', '.mil', None, None)
    

    But it's limited to a maximum of six subgroups, because every additional domain label needs its own optional group in the pattern. A better way to do this would be:

    >>> m = re.match(r"[\w\.]+@(.+)", email_address)
    >>> m.groups()
    ('galactica.caprica.fleet.mil',)
    >>> m.group(1).split('.')
    ['galactica', 'caprica', 'fleet', 'mil']
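
The capture-then-split approach above could be wrapped in a small function. This is only a sketch: the name domain_labels is hypothetical, and returning None for a non-matching string is an assumption about what the caller wants.

```python
import re

def domain_labels(address):
    # Hypothetical helper: capture everything after '@', then split on dots.
    m = re.match(r"[\w.]+@(.+)", address)
    if m is None:
        # re.match returns None when the pattern does not match at the start.
        return None
    return m.group(1).split(".")

print(domain_labels("william.adama@galactica.caprica.fleet.mil"))
# ['galactica', 'caprica', 'fleet', 'mil']
print(domain_labels("not an email"))
# None
```

Checking for None first avoids the AttributeError that m.group(1) would raise on input that does not look like an address.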
    

    Note that regexps are fine as long as the email addresses are simple, but there are many valid addresses that this pattern will fail on. See this question for a detailed treatment of email address regexes.

  • 2020-11-22 06:00

    You can fix the problem of (\.\w+)+ only capturing the last match by doing this instead: ((?:\.\w+)+)
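
A minimal sketch of that fix with the standard re module, using the address from the question. The surrounding pattern (\w+@\w+) is a simplification chosen for illustration, not a full email matcher.

```python
import re

email_address = "yasar@webmail.something.edu.tr"

# A repeated capturing group keeps only its final iteration.
last_only = re.match(r"\w+@\w+(\.\w+)+", email_address)
print(last_only.group(1))  # .tr

# One capturing group around a non-capturing repetition keeps the whole run.
whole_run = re.match(r"\w+@\w+((?:\.\w+)+)", email_address)
print(whole_run.group(1))  # .something.edu.tr

# The individual pieces can then be recovered with a second pass.
parts = re.findall(r"\.\w+", whole_run.group(1))
print(parts)  # ['.something', '.edu', '.tr']
```

The outer group yields one string, so getting the separate pieces still takes a second step such as findall or split.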

  • 2020-11-22 06:10

    This is what you are looking for:

    >>> import re
    
    >>> s="yasar@webmail.something.edu.tr"
    >>> r=re.compile(r"\.\w+")
    >>> m=r.findall(s)
    
    >>> m
    ['.something', '.edu', '.tr']
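
One caveat with running findall over the whole address: a dotted local part is matched too. The sketch below splits on '@' first. The helper name domain_suffixes is hypothetical, and returning an empty list when there is no '@' is an assumption.

```python
import re

def domain_suffixes(address):
    # Hypothetical helper: search only the text after '@', so that dots
    # in the local part are ignored.
    _, sep, domain = address.partition("@")
    if not sep:
        return []
    return re.findall(r"\.\w+", domain)

print(domain_suffixes("yasar@webmail.something.edu.tr"))
# ['.something', '.edu', '.tr']
print(domain_suffixes("william.adama@galactica.caprica.fleet.mil"))
# ['.caprica', '.fleet', '.mil']

# For comparison, findall on the full string also returns '.adama'.
print(re.findall(r"\.\w+", "william.adama@galactica.caprica.fleet.mil"))
# ['.adama', '.caprica', '.fleet', '.mil']
```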
    
  • 2020-11-22 06:21

    The standard re module doesn't support repeated captures, but the third-party regex module does:

    >>> import regex
    >>> m = regex.match(r'([.\w]+)@((\w+)(\.\w+)+)', 'yasar@webmail.something.edu.tr')
    >>> m.groups()
    ('yasar', 'webmail.something.edu.tr', 'webmail', '.tr')
    >>> m.captures(4)
    ['.something', '.edu', '.tr']
    

    In your case I'd go with splitting the repeated subpatterns afterwards. It leads to simple, readable code; see, for example, the code in @Li-aung Yip's answer.
