I am looking for an efficient way to extract the shortest repeating substring. For example:
input1 = 'dabcdbcdbcdd'
output1 = 'bcd'
input2 = 'cbabababac'
^ matches at the start of the string, but in your example the repeating substrings don't start at the beginning. The same goes for $ at the end. Without ^ and $, however, the lazy pattern .*? can always match the empty string, so use .+? to require at least one character. Demo:
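import re

def srp(s):
    # (.+?) lazily captures a non-empty chunk; \1+ requires it to repeat immediately.
    m = re.search(r'(.+?)\1+', s)
    return m.group(1) if m else None  # None when nothing repeats

print(srp('dabcdbcdbcdd'))  # -> bcd
print(srp('cbabababac'))  # -> ba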
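Note, though, that it doesn't necessarily find the shortest repeating substring. re.search returns the leftmost match, so for 'dabcdbcdbcdd' it reports bcd even though the single character d repeats in the trailing dd.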
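If "shortest" is taken to mean the shortest substring that occurs at least twice back to back, one way to get it is to try candidate lengths in increasing order and return the first hit. The following is a simple brute-force sketch under that assumption (the name shortest_repeat is hypothetical). It is not tuned for speed: the slice comparisons make it roughly cubic in the worst case. Also note that under this definition the first example yields d, not the bcd the question expects, so the intended definition may be different.

```python
def shortest_repeat(s):
    """Return the shortest substring that appears at least twice in a row
    (e.g. 'ab' in 'xababy'), leftmost on ties, or None if there is none."""
    n = len(s)
    # Try lengths from shortest to longest, so the first hit is minimal.
    for length in range(1, n // 2 + 1):
        # A repeat starting at i means s[i:i+length] == s[i+length:i+2*length].
        for i in range(n - 2 * length + 1):
            if s[i:i + length] == s[i + length:i + 2 * length]:
                return s[i:i + length]
    return None

print(shortest_repeat('dabcdbcdbcdd'))  # -> d  (from the trailing 'dd')
print(shortest_repeat('cbabababac'))    # -> ba
print(shortest_repeat('abc'))           # -> None
```

Checking lengths in the outer loop (rather than positions) is what guarantees the result is the shortest; the inner loop only breaks ties by position.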