How do I get all possible overlapping matches in a string in Python, with multiple starting and ending points?
I've tried using the regex module instead of the default re m
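Regular expressions are not the right tool here. I would recommend collecting the indices of the start and end characters and slicing between every valid pair: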
code:
def find(text, ch):
    # Yield every index at which ch occurs in text
    for i, ltr in enumerate(text):
        if ltr == ch:
            yield i

s = "axaybzb"
startChar = 'a'
endChar = 'b'
startCharList = list(find(s, startChar))
endCharList = list(find(s, endChar))

output = []
for u in startCharList:
    for v in endCharList:
        if u <= v:  # the end must not come before the start
            output.append(s[u:v+1])
print(output)
output:
$ python substring.py
['axayb', 'axaybzb', 'ayb', 'aybzb']
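The same idea can be wrapped in a reusable function. This is only a sketch: the name `substrings_between` and its parameters are made up for illustration, and it assumes the end index must come strictly after the start index (`u < v`), so a single character is never returned when the start and end characters are the same.

```python
def substrings_between(text, start_char, end_char):
    """Return every slice of text that begins with start_char and ends with end_char."""
    starts = [i for i, c in enumerate(text) if c == start_char]
    ends = [i for i, c in enumerate(text) if c == end_char]
    # Assumption: the end must come strictly after the start (u < v),
    # so one-character slices are excluded when start_char == end_char.
    return [text[u:v + 1] for u in starts for v in ends if u < v]


print(substrings_between("axaybzb", "a", "b"))
# => ['axayb', 'axaybzb', 'ayb', 'aybzb']
```

Like the loop above, this does work proportional to the number of start positions times the number of end positions, which is fine for short strings.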
With simple patterns like yours, you can generate every contiguous slice of the string and test each one against the regex for a full match:
import re

def findall_overlapped(r, s):
    res = []  # Resulting list
    reg = re.compile(r)  # Compile once; fullmatch below anchors both ends, even with alternation
    for q in range(len(s)):  # Iterate over all chars in a string
        for w in range(q, len(s)):  # Iterate over the rest of the chars to the right
            cur = s[q:w+1]  # Currently tested slice
            if reg.fullmatch(cur):  # If the whole slice matches
                res.append(cur)  # Append it to the resulting list
    return res

rex = r'a\w+b'
print(findall_overlapped(rex, 'axaybzb'))
# => ['axayb', 'axaybzb', 'ayb', 'aybzb']
WARNING: This won't work if your pattern checks left- or right-hand context with lookbehinds or lookaheads at either end. Each slice is matched in isolation, so the characters around it are not visible to the regex.
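To make the limitation concrete, here is a small sketch that runs a lookbehind pattern through the slice-based approach. The pattern and the input string are hypothetical, chosen only for illustration, and `findall_overlapped` is redefined in compact form so the snippet runs on its own.

```python
import re


def findall_overlapped(pattern, text):
    """Slice-based search: test every contiguous slice of text for a full match."""
    compiled = re.compile(pattern)
    return [
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
        if compiled.fullmatch(text[i:j])
    ]


s = "xaybzb"
lookbehind = r"(?<=x)a\w+b"  # hypothetical pattern: 'a' must be preceded by 'x'

# A normal scan over the whole string can see the 'x' to the left of 'a'.
print(re.findall(lookbehind, s))          # => ['aybzb']

# Each slice is matched in isolation, so the 'x' is never available as context.
print(findall_overlapped(lookbehind, s))  # => []
```

If the context is a fixed character, as here, one possible workaround is to drop the lookaround from the pattern and check the neighbouring character of each slice yourself.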