Overlapping regex matches

抹茶落季 2020-12-21 19:20

I'm trying to create the following regular expression: return a string between AUG and (UAG or UGA or UAA) from a follow

3 answers
  •  南方客
    南方客 (original poster)
    2020-12-21 19:56

    If you think in terms of 'intervals' rather than 'matches', I think you will find this easier. This is what @ionut-hulub has done. You can do it in a single pass, as I demonstrate below; however, you should probably use the simpler finditer() approach unless you have so many RNA strings (or they are so long) that you need to avoid the redundant passes over the string.

    s = 'AGCCAUGUAGCUAACUCAGGUUACAUGGGGAUGACCCCGCGACUUGGAUUAGAGUCUCUUUUGGAAUAAGCCUGAAUGAUCCGAGUAGCAUCUCAG'
    
    def intervals(s):
        # Indices of every AUG start codon seen so far.
        starts = []
        # Stop two short of the end so every slice is a full three-letter codon.
        for i in range(len(s) - 2):
            codon = s[i:i+3]
            if codon == 'AUG':
                starts.append(i)
            # A stop codon closes an interval with every start seen so far.
            elif codon in ('UAG', 'UGA', 'UAA'):
                for b in starts:
                    yield (b, i)
    
    for interval in intervals(s):
        print(interval)
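
For comparison, here is a rough sketch of what the simpler finditer() approach mentioned above might look like. I am not reproducing @ionut-hulub's code here; this is just one way to do it, assuming you want the same (start, stop) index pairs as the single-pass version. The name intervals_finditer is made up for this example. It scans the string twice (once for starts, once for stops), which is the redundant work the single-pass version avoids.

```python
import re

def intervals_finditer(s):
    """Yield (start, stop) index pairs, pairing each AUG with every later stop codon."""
    # First pass: index of every start codon.
    starts = [m.start() for m in re.finditer(r'AUG', s)]
    # Second pass: index of every stop codon.
    stops = [m.start() for m in re.finditer(r'UAG|UGA|UAA', s)]
    for stop in stops:
        for start in starts:
            # Only starts that occur before this stop form an interval.
            if start < stop:
                yield (start, stop)

for start, stop in intervals_finditer('AUGCCUAGAUGUAA'):
    print(start, stop, s_slice := 'AUGCCUAGAUGUAA'[start:stop])
```

Slicing with [start:stop] gives the sequence from the AUG up to, but not including, the stop codon.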
    
