How to get all overlapping matches in python regex that may start at the same location in a string?

前端 未结 2 1455
栀梦
栀梦 2020-12-20 00:06

How do I get all possible overlapping matches in a string in Python with multiple starting and ending points.

I\'ve tried using regex module, instead of default re m

相关标签:
2条回答
  • 2020-12-20 00:39

    Regex are not the proper tool here, I would recommend:

    • Identify all the indexes of the first letter in the input string
    • Identify all the indexes of the second letter in the input string
    • Build all the substrings based on those indexes

    code:

    def find(str, ch):
        for i, ltr in enumerate(str):
            if ltr == ch:
                yield i
    
    s = "axaybzb"
    startChar = 'a'
    endChar = 'b'
    
    startCharList = list(find(s,startChar))
    endCharList = list(find(s,endChar))
    
    output = []
    for u in startCharList:
        for v in endCharList:
               if u <= v:
                   output.append(s[u:v+1])
    print(output)
    

    output:

    $ python substring.py 
    ['axayb', 'axaybzb', 'ayb', 'aybzb']
    
    0 讨论(0)
  • 2020-12-20 00:51

    With simple patterns like yours, you may generate slices of all consecutive chars in a string and test them all against a specific regex for a full match:

    import re
    
    def findall_overlapped(r, s):
      res = []                     # Resulting list
      reg = r'^{}$'.format(r)      # Regex must match full string
      for q in range(len(s)):      # Iterate over all chars in a string
        for w in range(q,len(s)):  # Iterate over the rest of the chars to the right
            cur = s[q:w+1]         # Currently tested slice
            if re.match(reg, cur): # If there is a full slice match
                res.append(cur)    # Append it to the resulting list
      return res
    
    rex = r'a\w+b'
    print(findall_overlapped(rex, 'axaybzb'))
    # => ['axayb', 'axaybzb', 'ayb', 'aybzb']
    

    See the Python demo

    WARNING: Note this won't work if you have a pattern checking left- or right-hand contexts, with lookaheads or lookbehinds on either end of the pattern since this context will be lost when iterating over the string.

    0 讨论(0)
提交回复
热议问题