Finding the most frequent character in a string

执笔经年 2020-12-03 21:54

I found this programming problem while looking at a job posting on SO. I thought it was pretty interesting and as a beginner Python programmer I attempted to tackle it. Howe

10 answers
  •  旧巷少年郎
    2020-12-03 22:17

    If you want all the characters that share the maximum count, you can use a variation on one of the two ideas proposed so far:

    import heapq  # Helps find the entries with the largest counts
    import collections
    
    def find_max_counts(sequence):
        """
        Returns an iterator that produces the (element, count)s with the
        highest number of occurrences in the given sequence.
    
        Elements tied for the highest count are produced in ascending order.
        """
    
        if len(sequence) == 0:
            return  # Ends the generator; raising StopIteration here is an error since PEP 479
    
        counter = collections.defaultdict(int)
        for elmt in sequence:
            counter[elmt] += 1
    
        counts_heap = [
            (-count, elmt)  # Negated, so the largest counts are the smallest heap entries
            for (elmt, count) in counter.items()]
    
        heapq.heapify(counts_heap)
    
        highest_count = counts_heap[0][0]
    
        while True:
    
            try:
                (opp_count, elmt) = heapq.heappop(counts_heap)
            except IndexError:
                return  # Heap exhausted: every element had the highest count
    
            if opp_count != highest_count:
                return  # All remaining elements have lower counts
    
            yield (elmt, -opp_count)
    
    for (letter, count) in find_max_counts('balloon'):
        print((letter, count))
    
    for (word, count) in find_max_counts(['he', 'lkj', 'he', 'll', 'll']):
        print((word, count))
    

    This yields, for instance:

    lebigot@weinberg /tmp % python count.py
    ('l', 2)
    ('o', 2)
    ('he', 2)
    ('ll', 2)
    

    This works with any sequence of hashable elements: the characters of a string, but also a list of words such as ['hello', 'hello', 'bonjour'], for instance.

    The heapq structure is very efficient at finding the smallest elements of a sequence without sorting it completely. On the other hand, since there are not that many letters in the alphabet, you can probably also sort the counts and walk down the sorted list until the count drops below the maximum, without incurring any serious speed loss.
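
    The sorted-list alternative mentioned above could look roughly like the sketch below. The function name find_max_counts_sorted is made up for illustration, and collections.Counter is swapped in for the defaultdict used earlier; it is one possible way to write the idea, not the only one.

```python
import collections

def find_max_counts_sorted(sequence):
    """Return the (element, count) pairs that share the highest count.

    Hypothetical helper name; sorts all counts instead of using a heap.
    """
    counts = collections.Counter(sequence)
    # Sort by descending count, then by element, to mirror the heap order.
    ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    result = []
    for elmt, count in ordered:
        if count != ordered[0][1]:
            break  # Counts only decrease from here on.
        result.append((elmt, count))
    return result

print(find_max_counts_sorted('balloon'))
```

    For 'balloon' this prints [('l', 2), ('o', 2)], the same pairs as the heap version. Sorting costs O(k log k) for k distinct elements, which is negligible when the elements are letters.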
