Scan Reading frame [3] Python

蓝咒 submitted on 2019-12-22 12:26:36

Question


I'm trying to write a script that uses a reading frame of 3 to detect a certain pattern and then, starting from that position, moves in steps of 3 to find another pattern.

sequence = 'TCATGAGGCTTTGGTAAATAT'

I need it to:

...scan with a reading frame of 3 until it finds a desired pattern (i.e. 'ATG')

...mark the position where the first pattern ('ATG') starts in the original sequence and the position where the second pattern ('TAA') starts. In this case, that would be position 3 for 'ATG' and 15 for 'TAA' (counting from 1).

...create a list of each triplet from the first pattern up to and including the second pattern 'TAA' (i.e. 'ATG', 'AGG', 'CTT', 'TGG', 'TAA')
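A minimal sketch of all three steps, assuming positions are reported 1-based as in the example above and that the second pattern only counts when it lies in the same reading frame as the first. The function name `scan_frame` is hypothetical:

```python
def scan_frame(sequence, start='ATG', stop='TAA'):
    """Return (start_pos, stop_pos, codons) with 1-based positions, or None."""
    begin = sequence.find(start)
    if begin == -1:
        return None  # start pattern not present at all
    codons = []
    # Step through the sequence three bases at a time from the start pattern,
    # stopping before an incomplete trailing triplet
    for pos in range(begin, len(sequence) - 2, 3):
        codon = sequence[pos:pos + 3]
        codons.append(codon)
        if codon == stop:
            # Convert 0-based string indices to the 1-based positions used above
            return begin + 1, pos + 1, codons
    return None  # no in-frame stop pattern


print(scan_frame('TCATGAGGCTTTGGTAAATAT'))
# (3, 15, ['ATG', 'AGG', 'CTT', 'TGG', 'TAA'])
```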

How do I construct a reading frame that reads the sequence in sets of 3? I know that once I find a way to get the reading frame, I can write an if statement like:

reading_frame = []

for k in sequence:  # iterating over a string yields single characters, so k is never 'ATG'
    if k == 'ATG':
        reading_frame.append(k)

First, I need the reading frame.
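One way to get "the reading frame" is to slice the string into consecutive triplets starting at offset 0, 1 or 2; each offset gives one of the three forward reading frames. A sketch, where the helper name `codons` is hypothetical and any incomplete triplet at the end is dropped:

```python
def codons(sequence, frame=0):
    """Split sequence into triplets, starting at offset `frame` (0, 1 or 2)."""
    # Stop at len - 2 so that every slice is a full 3-letter codon
    return [sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3)]


sequence = 'TCATGAGGCTTTGGTAAATAT'
print(codons(sequence, 0))  # ['TCA', 'TGA', 'GGC', 'TTT', 'GGT', 'AAA', 'TAT']
print(codons(sequence, 2))  # ['ATG', 'AGG', 'CTT', 'TGG', 'TAA', 'ATA']
```

Looping over `codons(sequence, 2)` instead of `sequence` makes the `if k == 'ATG'` test compare whole triplets rather than single characters.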


Answer 1:


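sequence = 'TCATGAGGCTTTGGTAAATAT'

# Index of the first 'ATG' (str.find returns -1 if it is absent)
frame1 = sequence.find('ATG')

my_list = []

if frame1 != -1:
    # Walk forward in steps of 3, stopping before an incomplete trailing codon
    for pos in range(frame1, len(sequence) - 2, 3):
        next_codon = sequence[pos:pos+3]
        my_list.append(next_codon)
        if next_codon == 'TAA':
            break

print(my_list)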

['ATG', 'AGG', 'CTT', 'TGG', 'TAA']




Answer 2:


You could start by decomposing your sequence into a series of 3-letter frames:

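sequence = 'TCATGAGGCTTTGGTAAATAT'
# Align the frames with the first 'ATG' (index 2 here); frames starting at 0 would split it
offset = sequence.find('ATG') % 3
frames = [sequence[i:i+3] for i in range(offset, len(sequence) - 2, 3)]
print("Frames:", frames)
start = frames.index("ATG")
frames_before_ATG, frames_after_ATG = frames[:start], frames[start+1:]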

Then iterate over the frames list until you find the first pattern.
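A minimal sketch of that loop, assuming the second pattern is 'TAA' and building the frames in the reading frame of the first 'ATG' (otherwise 'ATG' can straddle two frames, as it does in this sequence at index 2):

```python
sequence = 'TCATGAGGCTTTGGTAAATAT'
# Build frames aligned on the first 'ATG' so it is not split across two frames
offset = sequence.find('ATG') % 3
frames = [sequence[i:i + 3] for i in range(offset, len(sequence) - 2, 3)]

collected = []
collecting = False
for frame in frames:
    if frame == 'ATG':
        collecting = True  # first pattern found: start recording
    if collecting:
        collected.append(frame)
        if frame == 'TAA':
            break  # second pattern reached: stop
print(collected)  # ['ATG', 'AGG', 'CTT', 'TGG', 'TAA']
```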




Answer 3:


By far the easiest way to find the first position of 'ATG' in sequence is:

>>> sequence.find('ATG')

In your example, that gives 2, the (0-based) index where the pattern starts. Then just look for the second pattern after that position:

>>> idx_1 = sequence.find('ATG')
>>> idx_2 = sequence[idx_1:].find('TAA')

(sequence[idx_1:] returns the part of sequence from position idx_1 onwards.)

Keep in mind that idx_2 is offset by idx_1 (that is, the actual position of pattern 2 in the original string is idx_2+idx_1). Note that if a pattern cannot be found, the .find method returns -1. You may want to add a test to deal with that case.
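A sketch of those checks. It uses the optional start argument of `str.find`, which searches from that index but returns a position in the whole string, so no offset has to be added back. Raising `ValueError` is just one possible way to handle a missing pattern:

```python
sequence = 'TCATGAGGCTTTGGTAAATAT'

idx_1 = sequence.find('ATG')
if idx_1 == -1:
    raise ValueError("start pattern 'ATG' not found")

# Search for 'TAA' starting at idx_1; the result indexes the original string
idx_2 = sequence.find('TAA', idx_1)
if idx_2 == -1:
    raise ValueError("stop pattern 'TAA' not found after 'ATG'")

print(idx_1, idx_2)  # 2 14
```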

Once you have found the two patterns, you can build the list of codons from 'ATG' up to (but not including) 'TAA' as:

>>> subsequence = sequence[idx_1:idx_2+idx_1]
>>> [subsequence[i:i+3] for i in range(0, len(subsequence), 3)]

You could easily iterate over a list of patterns following that example.
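For example, a sketch that searches for each pattern in turn, starting each search where the previous pattern was found; the helper name `find_in_order` is hypothetical, and like the snippet above it does not check the reading frame:

```python
def find_in_order(sequence, patterns):
    """Return the index of each pattern, each searched from where the previous one was found."""
    positions = []
    pos = 0
    for pattern in patterns:
        pos = sequence.find(pattern, pos)
        if pos == -1:
            return None  # a pattern is missing: give up
        positions.append(pos)
    return positions


print(find_in_order('TCATGAGGCTTTGGTAAATAT', ['ATG', 'TAA']))  # [2, 14]
```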

You may want to check whether idx_1%3 == 0, that is, whether idx_1 is a multiple of three (assuming that the first frame starts at 0). If not, you at least know that the beginning of your sequence has to be discarded.
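The same concern applies to the second pattern: `find` returns the first 'TAA' anywhere after idx_1, even one that is not in the same reading frame as 'ATG'. A sketch of a frame-aware search that keeps looking until the distance from the start is a multiple of 3; the helper name `find_in_frame` is hypothetical:

```python
def find_in_frame(sequence, pattern, start):
    """Find the first `pattern` at start, start+3, start+6, ... (same reading frame)."""
    pos = sequence.find(pattern, start)
    # Skip matches whose distance from `start` is not a multiple of 3
    while pos != -1 and (pos - start) % 3 != 0:
        pos = sequence.find(pattern, pos + 1)
    return pos  # -1 if there is no in-frame match


sequence = 'TCATGAGGCTTTGGTAAATAT'
idx_1 = sequence.find('ATG')
idx_2 = find_in_frame(sequence, 'TAA', idx_1)
print(idx_1 % 3, idx_2, (idx_2 - idx_1) % 3)  # 2 14 0
```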



Source: https://stackoverflow.com/questions/12521650/scan-reading-frame-3-python