I want to calculate how often each letter occurs in every column. For example, I have these three sequences:
seq1=AATC
seq2=GCCT
seq3=ATCA
Here, in the first column the frequency of 'A' is 2 and of 'G' is 1. In the second column, the frequency of 'A' is 1, 'C' is 1, and 'T' is 1 (and likewise for the remaining columns). First, I tried to write code that calculates the frequency for a single sequence.
For example:
s='AATC'
dic={}
for x in s:
    dic[x] = s.count(x)
This gives: {'A': 2, 'T': 1, 'C': 1}
Now I want to apply this to the columns. For that, I use this instruction:
f=list(zip(seq1,seq2,seq3))
This gives:
[('A', 'G', 'A'), ('A', 'C', 'T'), ('T', 'C', 'C'), ('C', 'T', 'A')]
So what I want is to calculate the frequency of the letters within each tuple. How can I do this?
Also, if I work with a file of sequences, how can I apply this code to the sequences in the file? For example, my file contains 100 sequences, and each time I take three sequences and apply this code.
As with my answer to your last question, you should wrap your functionality in a function:
def lettercount(pos):
    return {c: pos.count(c) for c in pos}
Then you can easily apply it to the tuples from zip:
counts = [lettercount(t) for t in zip(seq1, seq2, seq3)]
Or combine it into the existing loop:
...
counts = []
for position in zip(seq1, seq2, seq3):  # sets at same position
    counts.append(lettercount(position))
    for pair in combinations(position, 2):  # pairs within set
        ...
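As a possible alternative, the standard library's collections.Counter does the same per-column counting in a single pass over each column, instead of calling count once per letter. A minimal sketch, where column_counts is a hypothetical helper name that does not appear in the original code:

```python
from collections import Counter

def column_counts(*seqs):
    """Return one letter -> count dict per column of the given sequences."""
    # zip stops at the shortest sequence, so equal lengths are assumed here.
    return [dict(Counter(column)) for column in zip(*seqs)]

counts = column_counts('AATC', 'GCCT', 'ATCA')
print(counts[0])  # {'A': 2, 'G': 1}
```

Converting each Counter to a plain dict keeps the result in the same shape as the lettercount version, so either can be dropped into the loop.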
Here:
sequences = ['AATC',
             'GCCT',
             'ATCA']
f = zip(*sequences)
counts = [{letter: column.count(letter) for letter in column} for column in f]
print(counts)
Output (reformatted):
[{'A': 2, 'G': 1},
{'A': 1, 'C': 1, 'T': 1},
{'C': 2, 'T': 1},
{'A': 1, 'C': 1, 'T': 1}]
Salient features:
- Rather than explicitly naming seq1, seq2, etc., we put them into a list.
- We unpack the list with the * operator.
- We use a dict comprehension inside a list comprehension to generate the counts for each letter in each column. It's basically what you did for the one-sequence case, but more readable (IMO).
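For the second part of the question (a file of sequences processed three at a time), the same comprehension can be applied to consecutive groups of lines. This is only a sketch under some assumptions: the file holds one sequence per line, counts_by_group is a hypothetical helper name, and demo_lines stands in for a real file so the example runs on its own.

```python
from itertools import islice

def counts_by_group(lines, size=3):
    """Yield per-column letter counts for each consecutive group of `size` sequences."""
    # Assumption: one sequence per line; blank lines are skipped.
    seqs = (line.strip() for line in lines if line.strip())
    while True:
        group = list(islice(seqs, size))
        if not group:
            break
        # The final group may hold fewer than `size` sequences.
        yield [{c: col.count(c) for c in col} for col in zip(*group)]

# In-memory stand-in for a file. With a real file, pass the open file object
# instead, e.g. inside `with open('sequences.txt') as fh:` (hypothetical name).
demo_lines = ['AATC\n', 'GCCT\n', 'ATCA\n', 'TTGA\n', 'TAGC\n', 'CAGG\n']
for group_counts in counts_by_group(demo_lines):
    print(group_counts)
```

Because the helper is a generator that reads lazily, a file of 100 sequences is handled one group at a time; the last group would contain a single sequence, since 100 is not a multiple of three.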
Source: https://stackoverflow.com/questions/21103320/frequency-of-letters-in-column-python