Why is re.findall not being specific when finding triplet items in a string? (Python)

故里飘歌 2021-01-17 03:49

So I have four lines of code:

import re

seq = 'ATGGAAGTTGGATGAAAGTGGAGGTAAAGAGAAGACGTTTGA'

OR_0 = re.findall(r'ATG(?:...){9,}?(?:TAA|TAG|TGA)', seq)
4 Answers
  •  没有蜡笔的小新
    2021-01-17 04:25

    If you want your regex to stop matching at the first TAA|TAG|TGA, but still only succeed if there are at least nine three-letter chunks, the following may help:

    >>> import re
    >>> regexp = r'ATG(?:(?!TAA|TAG|TGA)...){9,}?(?:TAA|TAG|TGA)'
    >>> re.findall(regexp, 'ATGAAAAAAAAAAAAAAAAAAAAAAAAAAATAG')
    ['ATGAAAAAAAAAAAAAAAAAAAAAAAAAAATAG']
    >>> re.findall(regexp, 'ATGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATAG')
    ['ATGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATAG']
    >>> re.findall(regexp, 'ATGAAATAGAAAAAAAAAAAAAAAAAAAAATAG')
    []
    

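    This uses a negative lookahead, (?!TAA|TAG|TGA), to check that each three-character chunk is not TAA, TAG or TGA before the chunk is consumed. In the original pattern, the lazy {9,}? only makes the match end at the earliest stop codon after nine chunks; the first nine chunks themselves can still contain TAA, TAG or TGA.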
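As a quick check, here is a sketch (not from the original answer) that runs both patterns on the seq from the question. The codons helper is a hypothetical name introduced here; it only splits a string into three-character chunks so the reading frame is visible. With the question's pattern, the in-frame TGA (5th chunk) and TAA (9th chunk) are swallowed as ordinary chunks and the match runs on to the final TGA. With the lookahead version, the attempt from the first ATG fails at that TGA, and the second ATG has no room for a stop codon after nine chunks, so nothing is returned.

```python
import re

# Sequence from the question.
seq = 'ATGGAAGTTGGATGAAAGTGGAGGTAAAGAGAAGACGTTTGA'

# Pattern from the question: at least nine chunks of any three characters.
original = r'ATG(?:...){9,}?(?:TAA|TAG|TGA)'
# Pattern from this answer: each chunk must not be TAA, TAG or TGA.
fixed = r'ATG(?:(?!TAA|TAG|TGA)...){9,}?(?:TAA|TAG|TGA)'


def codons(s):
    """Split s into consecutive three-character chunks (hypothetical helper)."""
    return [s[i:i + 3] for i in range(0, len(s), 3)]


original_matches = re.findall(original, seq)
fixed_matches = re.findall(fixed, seq)

print(codons(seq))       # TGA is the 5th chunk, TAA the 9th
print(original_matches)  # the whole sequence: both stops were consumed as chunks
print(fixed_matches)     # []: no ATG is followed by 9+ stop-free chunks and then a stop
```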

    Note, though, that a TAA, TAG or TGA that does not fall on a three-character boundary will not stop the match, because the lookahead is only checked at the start of each chunk:

    >>> re.findall(regexp, 'ATGAAAATAGAAAAAAAAAAAAAAAAAAAATAG')
    ['ATGAAAATAGAAAAAAAAAAAAAAAAAAAATAG']
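To see why, a small sketch can split the match into chunks. The test string below is hypothetical: it has the same shape as the last example but is built with string multiplication so its length is easy to verify, and codons is again a helper introduced only for illustration. The first TAG starts at offset 7, which is not a multiple of 3, so it ends up split across the chunks ATA and GAA and the lookahead never sees it.

```python
import re

# Pattern from the answer above.
regexp = r'ATG(?:(?!TAA|TAG|TGA)...){9,}?(?:TAA|TAG|TGA)'

# Hypothetical string shaped like the last example: an out-of-frame TAG
# after ATG + 'AAAA', then 20 A's, then an in-frame TAG at the end.
s = 'ATG' + 'AAAA' + 'TAG' + 'A' * 20 + 'TAG'


def codons(x):
    """Split x into consecutive three-character chunks (hypothetical helper)."""
    return [x[i:i + 3] for i in range(0, len(x), 3)]


matches = re.findall(regexp, s)

print(matches == [s])          # True: the whole string still matches
print(s.find('TAG') % 3)       # 1: the first TAG does not start on a chunk boundary
print(codons(matches[0])[:4])  # ['ATG', 'AAA', 'ATA', 'GAA']: TAG is split across chunks
```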
    
