I'm trying to create the following regular expression: return a string between AUG and (UAG or UGA or UAA) from a follow
If you think in terms of 'intervals' rather than 'matches', I think you will find this easier. That is what @ionut-hulub has done. You can do it in a single pass, as I demonstrate below; however, you should probably use the simpler finditer() approach unless you have so many RNA strings (or they are so long) that you need to avoid the redundant passes over the string.
s = 'AGCCAUGUAGCUAACUCAGGUUACAUGGGGAUGACCCCGCGACUUGGAUUAGAGUCUCUUUUGGAAUAAGCCUGAAUGAUCCGAGUAGCAUCUCAG'

def intervals(s):
    # Start index of every AUG seen so far.
    starts = []
    # Stop two characters early so s[i:i+3] is always a full codon.
    for i in range(len(s) - 2):
        codon = s[i:i+3]
        if codon == 'AUG':
            starts.append(i)
        # A stop codon closes an interval with every AUG seen before it.
        if codon in ('UAG', 'UGA', 'UAA'):
            for b in starts:
                yield (b, i)

# Each interval is (index of AUG, index of stop codon).
for interval in intervals(s):
    print(interval)
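For comparison, here is a rough sketch of what the simpler finditer() approach mentioned above could look like. This is my own guess at it, not necessarily @ionut-hulub's exact code, and the function name `intervals_finditer` is made up for illustration. It scans the string once for starts and once for stops, which is the redundant work the single-pass version avoids.

```python
import re

def intervals_finditer(s):
    """Yield (start, stop) index pairs: each AUG paired with each later stop codon.

    Hypothetical helper for illustration; it ignores reading frames,
    just like the single-pass version.
    """
    # First pass: index of every AUG.
    starts = [m.start() for m in re.finditer(r'AUG', s)]
    # Second pass: index of every stop codon.
    stops = [m.start() for m in re.finditer(r'UAG|UGA|UAA', s)]
    for b in starts:
        for e in stops:
            if e > b:
                yield (b, e)

if __name__ == '__main__':
    rna = 'AUGCCUAGAUGUAA'
    for b, e in intervals_finditer(rna):
        # rna[b + 3:e] is the text strictly between the AUG and the stop codon.
        print((b, e, rna[b + 3:e]))
```

The pairs come out grouped by start rather than by stop, so the order differs from the single-pass generator even though the set of intervals should be the same.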