Why is re.findall not being specific when finding triplet items in a string? (Python)

故里飘歌 2021-01-17 03:49

So I have four lines of code

import re

seq = 'ATGGAAGTTGGATGAAAGTGGAGGTAAAGAGAAGACGTTTGA'
OR_0 = re.findall(r'ATG(?:...){9,}?(?:TAA|TAG|TGA)', seq)

4 Answers
  •  北荒 (original poster)
    2021-01-17 04:08

    If the length is not a requirement then it's pretty easy:

    >>> import re
    >>> seq= 'ATGGAAGTTGGATGAAAGTGGAGGTAAAGAGAAGACGTTTGA'
    >>> regex = re.compile(r'ATG(?:...)*?(?:TAA|TAG|TGA)')
    >>> regex.findall(seq)
    ['ATGGAAGTTGGATGA']
    

    That said, going by your explanation, I believe your original regex is already doing what you want: searching for matches of at least 30 characters that start with ATG and end with TGA.
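
    To see why the original pattern returns the whole sequence, it may help to split seq into triplets and check where the stop codons fall. This is only a rough sketch; the names codons and stops are made up for illustration and are not from the question:

```python
import re

seq = 'ATGGAAGTTGGATGAAAGTGGAGGTAAAGAGAAGACGTTTGA'

# Split the sequence into consecutive triplets, starting at the leading ATG.
codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]

# Indices of the in-frame stop codons (index 0 is the ATG start codon).
stops = [i for i, c in enumerate(codons) if c in ('TAA', 'TAG', 'TGA')]
print(stops)  # [4, 8, 13]

# {9,}? demands at least 9 triplets between ATG and the stop codon,
# so only a stop at codon index 10 or later can close the match.
match = re.findall(r'ATG(?:...){9,}?(?:TAA|TAG|TGA)', seq)
print(len(match[0]))  # 42, i.e. the whole sequence
```

    The stop codons at indices 4 and 8 come too early to satisfy {9,}, so the lazy quantifier keeps extending until it reaches the one at index 13.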

    In your question you first state that you need matches of at least 30 characters, which is why you used {9,}?, but then you expect every match to be returned regardless of length. You cannot have both; choose one. If length matters, then keep the regex you already have: the result you are getting is correct.
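
    If you want to switch between the two behaviours, one option is to make the minimum number of triplets a parameter. The helper find_orfs and its min_codons argument below are hypothetical names of my own, so treat this as a sketch rather than the one right way to do it:

```python
import re

STOP = r'(?:TAA|TAG|TGA)'

def find_orfs(seq, min_codons=0):
    """Return non-overlapping ATG...stop matches that have at least
    min_codons triplets between the start and the stop codon.
    (Hypothetical helper, not part of the original question.)"""
    # {0,}? behaves like *?, so min_codons=0 means "no length requirement".
    pattern = re.compile(r'ATG(?:...){%d,}?%s' % (min_codons, STOP))
    return pattern.findall(seq)

seq = 'ATGGAAGTTGGATGAAAGTGGAGGTAAAGAGAAGACGTTTGA'

short = find_orfs(seq)                # no length requirement
long_ = find_orfs(seq, min_codons=9)  # same as the {9,}? in the question

print(short)  # ['ATGGAAGTTGGATGA']
print(long_)  # ['ATGGAAGTTGGATGAAAGTGGAGGTAAAGAGAAGACGTTTGA']
```

    Either call is correct for its own requirement; they just answer different questions about the same sequence.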
