How to get all overlapping matches in python regex that may start at the same location in a string?

栀梦

How do I get all possible overlapping matches in a string in Python, including matches with different start and end points?

I've tried using the regex module instead of the default re m

2 Answers

旧时难觅i

2020-12-20 00:39

Regex is not the proper tool here. I would recommend:

1. Identify all the indexes of the first letter in the input string.
2. Identify all the indexes of the second letter in the input string.
3. Build all the substrings based on those indexes.

code:

def find(text, ch):
    """Yield every index at which ch occurs in text."""
    for i, ltr in enumerate(text):
        if ltr == ch:
            yield i

s = "axaybzb"
startChar = 'a'
endChar = 'b'

startCharList = list(find(s, startChar))
endCharList = list(find(s, endChar))

# Pair every start index with every end index at or after it
output = []
for u in startCharList:
    for v in endCharList:
        if u <= v:
            output.append(s[u:v+1])
print(output)

output:

$ python substring.py 
['axayb', 'axaybzb', 'ayb', 'aybzb']


孤城傲影

2020-12-20 00:51

With simple patterns like yours, you can generate every contiguous slice of the string and test each one against the regex for a full match:

import re

def findall_overlapped(r, s):
    res = []                          # Resulting list
    reg = re.compile(r)               # Compile once, reuse for every slice
    for q in range(len(s)):           # Iterate over all start positions
        for w in range(q, len(s)):    # Iterate over all end positions to the right
            cur = s[q:w+1]            # Currently tested slice
            if reg.fullmatch(cur):    # The regex must match the whole slice
                res.append(cur)       # Append it to the resulting list
    return res

rex = r'a\w+b'
print(findall_overlapped(rex, 'axaybzb'))
# => ['axayb', 'axaybzb', 'ayb', 'aybzb']

WARNING: This won't work if the pattern checks left- or right-hand context with a lookbehind or lookahead at either end, because each slice is matched in isolation and the surrounding context is lost.
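
One possible way to soften the left-hand half of that limitation is to avoid slicing and instead pass start and end offsets to a compiled pattern's fullmatch method. The sketch below is only an illustration: the name findall_overlapped_pos is hypothetical, and it assumes the standard re behaviour where a lookbehind can still see characters before pos. A lookahead at the right edge is, as far as I know, still cut off by endpos, so this is not a complete fix.

```python
import re

def findall_overlapped_pos(pattern, s):
    """Collect every substring s[q:w] that the pattern matches in full.

    Hypothetical helper: uses pos/endpos instead of slicing, so a
    leading lookbehind can still inspect the text to the left of q.
    """
    compiled = re.compile(pattern)
    res = []
    for q in range(len(s)):                  # every start offset
        for w in range(q + 1, len(s) + 1):   # every end offset (exclusive)
            # Restrict the match window to [q, w) without cutting the string
            if compiled.fullmatch(s, q, w):
                res.append(s[q:w])
    return res

# Same result as the slicing version for a context-free pattern
print(findall_overlapped_pos(r'a\w+b', 'axaybzb'))

# A leading lookbehind keeps working because the text before q is still visible
print(findall_overlapped_pos(r'(?<=x)a\w+b', 'axaybzb'))
```

Like the slicing version, this tests every (start, end) pair, so it is quadratic in the length of the string and best suited to short inputs.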
