I have a string that is randomly generated:
polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"
I'd
Expanding on Ealdwulf's answer:
re.findall is documented in the Python standard library reference for the re module.
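import re

def getLongestSequenceSize(search_str, polymer_str):
    # Find every run of consecutive whole-word occurrences of search_str.
    matches = re.findall(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str)
    # Compare by length; a plain max() would compare the strings alphabetically.
    longest_match = max(matches, key=len) if matches else ''
    return longest_match.count(search_str)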
This could be written as one line, but it becomes less readable in that form.
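For illustration, a one-line form might look like the sketch below. The name longest_run is hypothetical, and the search unit "diNCO diamine" is an assumption, since the question does not say which sequence is being searched for. It also assumes Python 3.4 or later, because it relies on the default argument of max to handle the no-match case.

```python
import re

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"

def longest_run(search_str, polymer_str):
    # Hypothetical one-line variant: take the longest run by length,
    # falling back to an empty string (count 0) when nothing matches.
    return max(re.findall(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str), key=len, default='').count(search_str)

print(longest_run("diNCO diamine", polymer_str))  # three consecutive repeats
print(longest_run("triol", polymer_str))          # no occurrence at all
```

The line is long enough that the three-statement version is easier to follow and to debug.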
Alternative:
If polymer_str is huge, it will be more memory efficient to use re.finditer. Here's how you might go about it:
import re

def getLongestSequenceSize(search_str, polymer_str):
    longest_match = ''
    # finditer yields one Match at a time, so the full list of runs is never stored.
    for match in re.finditer(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str):
        if len(match.group(0)) > len(longest_match):
            longest_match = match.group(0)
    return longest_match.count(search_str)
The main difference between findall and finditer is that findall builds and returns a complete list of matched strings, while finditer returns an iterator that yields Match objects one at a time. The finditer approach may also be somewhat slower.
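To make the difference in return types concrete, here is a small sketch. The sample text and the hard-coded "diNCO diamine" unit are assumptions chosen for illustration, not taken from the question.

```python
import re

# Assumed sample input, shorter than the string in the question.
text = "diNCO diamine diNCO diamine diol diNCO diamine"
pattern = re.compile(r'(?:\bdiNCO diamine\b\s?)+')

# findall: the whole list of matched strings is built before it returns.
found = pattern.findall(text)
print(found)

# finditer: nothing is collected up front; each Match is produced on demand
# and also carries position information that findall discards.
lazy = pattern.finditer(text)
first = next(lazy)
print(first.group(0), first.span())
```

Because the pattern uses only a non-capturing group, findall returns the full matched text of each run rather than the contents of a group.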