Question
I'm trying to write a script that uses a reading frame of 3 to detect a certain pattern and then, starting from that position, moves in multiples of 3 to find another pattern.
sequence = 'TCATGAGGCTTTGGTAAATAT'
I need it to:
...scan with a reading frame of 3 until it finds a desired pattern (e.g. 'ATG')
...mark the location where the first pattern ('ATG') starts in the original sequence and the position where the second pattern ('TAA') starts. In this case, counting from 1, it would be position 3 for 'ATG' and 15 for 'TAA'.
...create a list with each triplet from the first pattern up to and including the second pattern 'TAA' (i.e. 'ATG', 'AGG', 'CTT', 'TGG', 'TAA')
How do I construct a reading frame to read it in sets of 3? I know that once I find a way to get the reading frame, I can create an if statement like this:
reading_frame = []
for frame in sequence:
    if frame == 'ATG':
        reading_frame.append(frame)
But first I need the reading frame.
Answer 1:
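sequence = 'TCATGAGGCTTTGGTAAATAT'
frame1 = sequence.find('ATG')  # 0-based index of the first 'ATG' (2 here), or -1
my_list = []
# Step through the sequence three bases at a time, starting at 'ATG'.
# The guard stops at the end of the sequence and skips the loop if 'ATG' is missing.
while 0 <= frame1 <= len(sequence) - 3:
    next_codon = sequence[frame1:frame1+3]
    my_list.append(next_codon)
    frame1 += 3
    if next_codon == 'TAA':
        break
print(my_list)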
['ATG', 'AGG', 'CTT', 'TGG', 'TAA']
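The question also asked for the positions of both patterns, which the loop above does not report. Here is a rough sketch of the same walk wrapped in a function that returns them as well. The name `scan_orf` and its parameters are made up for illustration, and the indices are 0-based (add 1 to get the 3 and 15 from the question).

```python
def scan_orf(sequence, start='ATG', stop='TAA'):
    """Return (start_index, stop_index, codons), or None if there is no in-frame hit.

    Hypothetical helper for illustration. Indices are 0-based.
    """
    first = sequence.find(start)
    if first == -1:
        return None  # start pattern is not present at all
    codons = []
    # Walk in steps of 3 from the start pattern; only full triplets are read.
    for i in range(first, len(sequence) - 2, 3):
        codon = sequence[i:i + 3]
        codons.append(codon)
        if codon == stop:
            return first, i, codons
    return None  # reached the end without meeting the stop pattern in frame


result = scan_orf('TCATGAGGCTTTGGTAAATAT')
print(result)
```

For the example sequence this should give 2 and 14 as the two indices, along with the five triplets.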
Answer 2:
You could start by decomposing your sequence into a series of 3-element frames:
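sequence = 'TCATGAGGCTTTGGTAAATAT'
# Starting at 0 gives TCA, TGA, ... and never yields 'ATG', so try offsets 0, 1 and 2.
for offset in range(3):
    frames = [sequence[i:i+3] for i in range(offset, len(sequence), 3)]
    if "ATG" in frames:
        break
print("Frames:", frames)
frames_before_ATG, frames_after_ATG = frames[:frames.index("ATG")], frames[frames.index("ATG")+1:]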
Then iterate over the frames list until you find the first pattern.
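As a sketch of that iteration: the snippet below builds the frame list for each of the three possible offsets and collects the triplets from 'ATG' up to and including 'TAA'. The helper names `frames_at` and `codons_between` are invented for this example, and returning an empty list when no complete start-to-stop run exists is just one possible convention.

```python
sequence = 'TCATGAGGCTTTGGTAAATAT'


def frames_at(sequence, offset):
    """Split sequence into triplets starting at offset (the last one may be short)."""
    return [sequence[i:i + 3] for i in range(offset, len(sequence), 3)]


def codons_between(frames, start='ATG', stop='TAA'):
    """Collect frames from the first start up to and including the next stop.

    Returns [] if start is missing or no stop follows it.
    """
    collected = []
    inside = False
    for frame in frames:
        if not inside and frame == start:
            inside = True
        if inside:
            collected.append(frame)
            if frame == stop:
                return collected
    return []


# One frame list per offset, then one result per offset.
all_frames = {offset: frames_at(sequence, offset) for offset in range(3)}
hits = {offset: codons_between(frames) for offset, frames in all_frames.items()}
print(hits)
```

Only offset 2 lines up with 'ATG' in this sequence, so the other two offsets come back empty.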
Answer 3:
To find the first position of ATG in sequence, the easiest way by far is:
>>> sequence.find('ATG')
In your example, that gives 2, the 0-based index where the pattern starts. Then, just look for the second pattern after that position:
>>> idx_1 = sequence.find('ATG')
>>> idx_2 = sequence[idx_1:].find('TAA')
(sequence[idx_1:] returns the elements of sequence from position idx_1 onward.)
Keep in mind that idx_2 is offset by idx_1 (that is, the actual position of pattern 2 in the original sequence is idx_2+idx_1). Note that if a pattern cannot be found, the .find method will return -1. You may want to add a test to deal with that case.
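One way to sidestep the offset bookkeeping is the optional start argument of str.find, which searches from a given index but still returns a position in the full string. A minimal sketch with the -1 checks added; the name `idx_2_abs` and the choice to raise ValueError are my own, not part of the approach above.

```python
sequence = 'TCATGAGGCTTTGGTAAATAT'

idx_1 = sequence.find('ATG')
if idx_1 == -1:
    raise ValueError("first pattern 'ATG' not found")

# Passing a start index searches from idx_1 onward but still returns
# an index into the full sequence, so no offset has to be added back.
idx_2_abs = sequence.find('TAA', idx_1)
if idx_2_abs == -1:
    raise ValueError("second pattern 'TAA' not found after 'ATG'")

print(idx_1, idx_2_abs)  # 0-based indices into the original sequence
```

Here `idx_2_abs` equals idx_1 plus the value you would get from slicing first.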
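Once you have found the two patterns, you can construct the list of triplets between them (the slice below stops just before the second pattern; add 3 to the end index to include it):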
>>> subsequence = sequence[idx_1:idx_2+idx_1]
>>> [subsequence[i:i+3] for i in range(0, len(subsequence), 3)]
You could easily iterate over a list of patterns following that example.
You may want to check whether idx_1%3 == 0, that is, whether idx_1 is a multiple of three (assuming that the first frame starts at 0). If not, at least you know that the beginning of your sequence is to be discarded.
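The same modulo check is useful for the second pattern: a plain .find returns the first 'TAA' anywhere after the start, even if it straddles two triplets. Below is a sketch of a search that only accepts hits a multiple of 3 away from the frame start. The function name `find_in_frame` and the second test sequence are made up for illustration.

```python
def find_in_frame(sequence, pattern, frame_start):
    """Return the first index >= frame_start where pattern sits in frame, else -1.

    "In frame" means (index - frame_start) is a multiple of 3.
    """
    idx = sequence.find(pattern, frame_start)
    while idx != -1:
        if (idx - frame_start) % 3 == 0:
            return idx
        # Out of frame: keep looking from the next character.
        idx = sequence.find(pattern, idx + 1)
    return -1


# Illustrative sequence: the first 'TAA' (index 4) straddles two triplets.
tricky = 'ATGCTAAGGTAACCC'
naive = tricky.find('TAA', 0)
in_frame = find_in_frame(tricky, 'TAA', 0)
print(naive, in_frame)

# The sequence from the question, with the frame starting at 'ATG'.
sequence = 'TCATGAGGCTTTGGTAAATAT'
print(find_in_frame(sequence, 'TAA', sequence.find('ATG')))
```

In the question's sequence the first 'TAA' already happens to be in frame, so both approaches agree there.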
Source: https://stackoverflow.com/questions/12521650/scan-reading-frame-3-python