Scan Reading frame [3] Python

Submitted by 拟墨画扇 on 2019-12-06 15:24:42
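sequence = 'TCATGAGGCTTTGGTAAATAT'

# index of the first start codon; assumes 'ATG' occurs in sequence
frame1 = sequence.find('ATG')

my_list = []

# step three bases at a time from the start codon, stopping at TAA
# (len(sequence) - 2 keeps only complete codons)
for start in range(frame1, len(sequence) - 2, 3):
    next_codon = sequence[start:start+3]
    my_list.append(next_codon)
    if next_codon == 'TAA':
        break

print(my_list)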

['ATG', 'AGG', 'CTT', 'TGG', 'TAA']

You could start by decomposing your sequence into a series of 3-element frames:

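sequence = 'TCATGAGGCTTTGGTAAATAT'
# read in the frame that contains the first ATG (offset 0, 1 or 2); starting
# at 0 would miss it here, since 'ATG' begins at index 2
offset = sequence.find('ATG') % 3
frames = [sequence[i:i+3] for i in range(offset, len(sequence) - 2, 3)]
print("Frames:", frames)
atg = frames.index("ATG")
frames_before_ATG, frames_after_ATG = frames[:atg], frames[atg+1:]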

Then iterate over the frames list until you find the first matching pattern.
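A minimal sketch of that iteration, assuming you want every codon from the first ATG up to and including the first in-frame TAA. The helper name `codons_until_stop` and its `start`/`stop` parameters are hypothetical. It builds the frames in the reading frame of the first ATG (`idx % 3`) rather than from position 0, because in this sequence 'ATG' begins at index 2 and would never appear as a frame that starts at 0.

```python
def codons_until_stop(sequence, start='ATG', stop='TAA'):
    """Return the codons from the first `start` up to and including the first in-frame `stop`.

    codons_until_stop is a hypothetical helper. If `start` is missing, an
    empty list is returned; if `stop` never appears in frame, every complete
    codon from `start` onwards is returned.
    """
    idx = sequence.find(start)
    if idx == -1:
        return []
    # Split into complete 3-base frames in the reading frame of `start`
    frames = [sequence[i:i + 3] for i in range(idx % 3, len(sequence) - 2, 3)]
    result = []
    # Walk the frames from the start codon until the stop codon is reached
    for codon in frames[frames.index(start):]:
        result.append(codon)
        if codon == stop:
            break
    return result


print(codons_until_stop('TCATGAGGCTTTGGTAAATAT'))
# ['ATG', 'AGG', 'CTT', 'TGG', 'TAA']
```

Only TAA is checked, as in the question; you could change the comparison to `codon in stops` to accept several stop codons.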

To find the first position of ATG in sequence, by far the easiest way is:

>>> sequence.find('ATG')

In your example, that gives 2, the index at which the pattern starts. Then just look for the second pattern after that position:

>>> idx_1 = sequence.find('ATG')
>>> idx_2 = sequence[idx_1:].find('TAA')

(sequence[idx_1:] returns the characters of sequence from position idx_1 onwards.)

Keep in mind that idx_2 is offset by idx_1; that is, the actual position of pattern 2 in the original string is idx_2 + idx_1. Note also that if a pattern cannot be found, the .find method returns -1, so you may want to add a test to handle that case.
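A minimal sketch of such a test, assuming you want absolute positions in the original string. `str.find` accepts an optional start argument, so `sequence.find('TAA', pos)` searches from `pos` but still reports the index relative to the whole string, which removes the need to add idx_1 back. The helper name `find_patterns` is hypothetical.

```python
def find_patterns(sequence, first='ATG', second='TAA'):
    """Return (idx_1, idx_2) as absolute positions, or None if either pattern is missing.

    find_patterns is a hypothetical helper, not part of any library.
    """
    idx_1 = sequence.find(first)
    if idx_1 == -1:
        return None
    # Search after the end of the first match; the result is still an
    # index into the full string, so no offset correction is needed.
    idx_2 = sequence.find(second, idx_1 + len(first))
    if idx_2 == -1:
        return None
    return idx_1, idx_2


print(find_patterns('TCATGAGGCTTTGGTAAATAT'))  # (2, 14)
print(find_patterns('TCATGAGG'))               # None
```

Here 14 equals idx_2 + idx_1 (12 + 2) from the slicing approach.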

Once you have found the two patterns, you can build the list of intermediate codons as:

>>> subsequence = sequence[idx_1:idx_2+idx_1]
>>> [subsequence[i:i+3] for i in range(0, len(subsequence), 3)]

You could easily iterate over a list of patterns following that example.
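One way to read this, assuming the patterns must appear in the given order, each one searched for after the end of the previous match. The helper name `find_in_order` is hypothetical.

```python
def find_in_order(sequence, patterns):
    """Return the absolute position of each pattern, searched in order, or None.

    find_in_order is a hypothetical helper: each pattern is looked for only
    after the end of the previous match, and None is returned as soon as
    one of them is missing (find() gave -1).
    """
    positions = []
    start = 0
    for pattern in patterns:
        idx = sequence.find(pattern, start)
        if idx == -1:
            return None
        positions.append(idx)
        start = idx + len(pattern)
    return positions


print(find_in_order('TCATGAGGCTTTGGTAAATAT', ['ATG', 'CTT', 'TAA']))  # [2, 8, 14]
```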

You may want to check whether idx_1 % 3 == 0, that is, whether idx_1 is a multiple of three (assuming the first frame starts at 0). If not, you at least know that the first idx_1 % 3 bases of your sequence fall outside the reading frame and should be discarded.
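A related check, sketched under the assumption that the stop codon must lie in the same reading frame as the start codon: a plain `find` can return a match that straddles two codons, so this loop skips any hit whose distance from idx_1 is not a multiple of three. The helper name `in_frame_stop` is hypothetical.

```python
def in_frame_stop(sequence, idx_1, stop='TAA'):
    """Return the position of the first `stop` in the same reading frame as idx_1, or -1.

    in_frame_stop is a hypothetical helper; off-frame matches of `stop`
    are skipped by searching again from the next position.
    """
    pos = sequence.find(stop, idx_1)
    while pos != -1 and (pos - idx_1) % 3 != 0:
        pos = sequence.find(stop, pos + 1)
    return pos


sequence = 'TCATGAGGCTTTGGTAAATAT'
idx_1 = sequence.find('ATG')
print(idx_1 % 3)                       # 2: the reading frame does not start at 0
print(in_frame_stop(sequence, idx_1))  # 14

# A plain find() returns 4 here, but that 'TAA' is off-frame
print(in_frame_stop('ATGGTAAGGTAA', 0))  # 9
```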
