Counting multiple letter groups in a string

北荒 2020-12-21 19:28

I've been trying to adapt my Python function to count groups of letters instead of single letters, and I'm having a bit of trouble. Here's the code I have to count individ…

2 Answers
  • 2020-12-21 20:17

    This can be done pretty quickly using collections.Counter.

    from collections import Counter
    
    s = "CTAACAAC"
    
    def chunk_string(s, n):
        return [s[i:i+n] for i in range(len(s)-n+1)]
    
    counter = Counter(chunk_string(s, 3))
    # Counter({'AAC': 2, 'CTA': 1, 'TAA': 1, 'ACA': 1, 'CAA': 1})
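
    As a small variation on the snippet above (not part of the original answer), the slices can be handed to Counter as a generator expression, so no intermediate list is built, and Counter's most_common method then ranks the groups. A minimal sketch using the same example string:

```python
from collections import Counter

s = "CTAACAAC"
n = 3  # group size, same as in the example above

# Feed the overlapping slices straight into Counter via a generator expression.
counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))

# most_common(k) returns the k most frequent groups as (group, count) pairs.
print(counts.most_common(1))  # [('AAC', 2)]
print(counts["AAC"])          # 2
# Looking up a group that never occurs returns 0 instead of raising KeyError.
print(counts["GGG"])          # 0
```

    Counter is a dict subclass, so the result can still be used anywhere a plain dict of counts is expected.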
    

    Edit: To elaborate on chunk_string:

    It takes a string s and a chunk size n as arguments. Each s[i:i+n] is a slice of the string that is n characters long. The comprehension iterates over the valid start indices, 0 through len(s)-n inclusive (hence range(len(s)-n+1)), and collects all of the slices into a list. Because each slice starts one character after the previous one, the chunks overlap. An equivalent method is:

    def chunk_string(s, n):
        chunks = []
        last_index = len(s) - n
        for i in range(0, last_index + 1):
            chunks.append(s[i:i+n])
        return chunks
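
    To make the index range concrete, here is a short trace on the example string from the first snippet, printing each start index and the slice it produces. It also shows that a string shorter than n yields an empty list, because range then receives a non-positive argument.

```python
def chunk_string(s, n):
    # Overlapping windows: start indices 0 .. len(s) - n, inclusive.
    return [s[i:i + n] for i in range(len(s) - n + 1)]

s = "CTAACAAC"           # len(s) == 8
n = 3
last_index = len(s) - n  # 5, so the start indices are 0, 1, 2, 3, 4, 5

for i in range(last_index + 1):
    print(i, s[i:i + n])
# 0 CTA
# 1 TAA
# 2 AAC
# 3 ACA
# 4 CAA
# 5 AAC

print(chunk_string(s, n))     # ['CTA', 'TAA', 'AAC', 'ACA', 'CAA', 'AAC']
print(chunk_string("AC", 3))  # [] (string shorter than n)
```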
    
  • 2020-12-21 20:25

    This is basically the same as the first answer, by Jared Goguen, but in reply to the OP's comment, it shows a possible way to do it without importing a module:

    >>> m
    'CTAAAGTCAACCTTCGGTTGACCTTGAGGGTTCCCTAAGGGTTGGGGATGACCCTTGGGTCTAAAGTCAACCTTCGGTTGACCTTGAGGGTTCCCTAAGGGTT'
    >>> l = [m[i:i+3] for i in range(len(m)-2)]
    >>> d = {}
    >>> for k in set(l):
    ...     d[k] = l.count(k)
    ...
    >>> d
    {'AAG': 4, 'GGA': 1, 'AAA': 2, 'TAA': 4, 'AGG': 4, 'AGT': 2, 'GGG': 7, 'ACC': 5, 'CGG': 2, 'GGT': 7, 'TCC': 2, 'TGA': 5, 'CAA': 2, 'TGG': 2, 'GTC': 3, 'AAC': 2, 'ATG': 1, 'CTT': 5, 'TCA': 2, 'CCT': 7, 'CCC': 3, 'GTT': 6, 'TTG': 6, 'GAT': 1, 'GAC': 3, 'TCG': 2, 'GAG': 2, 'CTA': 4, 'TTC': 4, 'TCT': 1}
    

    Or, if you are a fan of one-liners:

    >>> d = {k:l.count(k) for k in set(l)}
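
    Both versions above call l.count(k) once per distinct group, and each call rescans the whole list, so the work grows with (number of distinct groups) × (length of the list). A single pass using dict.get, still without importing anything, avoids the rescanning. A minimal sketch; count_groups is a hypothetical helper name, and the fixed window of 3 is generalized to a parameter n:

```python
def count_groups(s, n):
    """Count overlapping length-n groups of s in a single pass (no imports).

    count_groups is a hypothetical name used only for this sketch.
    """
    counts = {}
    for i in range(len(s) - n + 1):
        group = s[i:i + n]
        # dict.get returns the default 0 the first time a group is seen.
        counts[group] = counts.get(group, 0) + 1
    return counts

print(count_groups("CTAACAAC", 3))
# {'CTA': 1, 'TAA': 1, 'AAC': 2, 'ACA': 1, 'CAA': 1}
```

    The resulting counts match those of the l.count version; only the amount of scanning differs.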
    