Finding the most frequent character in a string

北战南征 提交于 2019-12-28 06:46:37

问题


I found this programming problem while looking at a job posting on SO. I thought it was pretty interesting and as a beginner Python programmer I attempted to tackle it. However I feel my solution is quite...messy...can anyone make any suggestions to optimize it or make it cleaner? I know it's pretty trivial, but I had fun writing it. Note: Python 2.6

The problem:

Write pseudo-code (or actual code) for a function that takes in a string and returns the letter that appears the most in that string.

My attempt:

import string

def find_max_letter_count(word):

    alphabet = string.ascii_lowercase
    dictionary = {}

    for letters in alphabet:
        dictionary[letters] = 0

    for letters in word:
        dictionary[letters] += 1

    dictionary = sorted(dictionary.items(), 
                        reverse=True, 
                        key=lambda x: x[1])

    for position in range(0, 26):
        print dictionary[position]
        if position != len(dictionary) - 1:
            if dictionary[position + 1][1] < dictionary[position][1]:
                break

find_max_letter_count("helloworld")

Output:

>>> 
('l', 3)

Updated example:

find_max_letter_count("balloon") 
>>>
('l', 2)
('o', 2)

回答1:


There are many ways to do this shorter. For example, you can use the Counter class (in Python 2.7 or later):

import collections
s = "helloworld"
print(collections.Counter(s).most_common(1)[0])

If you don't have that, you can do the tally manually (2.5 or later has defaultdict):

d = collections.defaultdict(int)
for c in s:
    d[c] += 1
print(sorted(d.items(), key=lambda x: x[1], reverse=True)[0])

Having said that, there's nothing too terribly wrong with your implementation.




回答2:


If you are using Python 2.7, you can quickly do this by using collections module. collections is a hight performance data structures module. Read more at http://docs.python.org/library/collections.html#counter-objects

>>> from collections import Counter
>>> x = Counter("balloon")
>>> x
Counter({'o': 2, 'a': 1, 'b': 1, 'l': 2, 'n': 1})
>>> x['o']
2



回答3:


Here is way to find the most common character using a dictionary

message = "hello world"
d = {}
letters = set(message)
for l in letters:
    d[message.count(l)] = l

print d[d.keys()[-1]], d.keys()[-1]



回答4:


If you want to have all the characters with the maximum number of counts, then you can do a variation on one of the two ideas proposed so far:

import heapq  # Helps finding the n largest counts
import collections

def find_max_counts(sequence):
    """
    Returns an iterator that produces the (element, count)s with the
    highest number of occurrences in the given sequence.

    In addition, the elements are sorted.
    """

    if len(sequence) == 0:
        raise StopIteration

    counter = collections.defaultdict(int)
    for elmt in sequence:
        counter[elmt] += 1

    counts_heap = [
        (-count, elmt)  # The largest elmt counts are the smallest elmts
        for (elmt, count) in counter.iteritems()]

    heapq.heapify(counts_heap)

    highest_count = counts_heap[0][0]

    while True:

        try:
            (opp_count, elmt) = heapq.heappop(counts_heap)
        except IndexError:
            raise StopIteration

        if opp_count != highest_count:
            raise StopIteration

        yield (elmt, -opp_count)

for (letter, count) in find_max_counts('balloon'):
    print (letter, count)

for (word, count) in find_max_counts(['he', 'lkj', 'he', 'll', 'll']):
    print (word, count)

This yields, for instance:

lebigot@weinberg /tmp % python count.py
('l', 2)
('o', 2)
('he', 2)
('ll', 2)

This works with any sequence: words, but also ['hello', 'hello', 'bonjour'], for instance.

The heapq structure is very efficient at finding the smallest elements of a sequence without sorting it completely. On the other hand, since there are not so many letter in the alphabet, you can probably also run through the sorted list of counts until the maximum count is not found anymore, without this incurring any serious speed loss.




回答5:


Question : Most frequent character in a string The maximum occurring character in an input string

Method 1 :

a = "GiniGinaProtijayi"

d ={}
chh = ''
max = 0 
for ch in a : d[ch] = d.get(ch,0) +1 
for val in sorted(d.items(),reverse=True , key = lambda ch : ch[1]):
    chh = ch
    max  = d.get(ch)


print(chh)  
print(max)  

Method 2 :

a = "GiniGinaProtijayi"

max = 0 
chh = ''
count = [0] * 256 
for ch in a : count[ord(ch)] += 1
for ch in a :
    if(count[ord(ch)] > max):
        max = count[ord(ch)] 
        chh = ch

print(chh)        

Method 3 :

import collections

a = "GiniGinaProtijayi"

aa = collections.Counter(a).most_common(1)[0]
print(aa)



回答6:


Here are a few things I'd do:

  • Use collections.defaultdict instead of the dict you initialise manually.
  • Use inbuilt sorting and max functions like max instead of working it out yourself - it's easier.

Here's my final result:

from collections import defaultdict

def find_max_letter_count(word):
    matches = defaultdict(int)  # makes the default value 0

    for char in word:
        matches[char] += 1

    return max(matches.iteritems(), key=lambda x: x[1])

find_max_letter_count('helloworld') == ('l', 3)



回答7:


def most_frequent(text):
    frequencies = [(c, text.count(c)) for c in set(text)]
    return max(frequencies, key=lambda x: x[1])[0]

s = 'ABBCCCDDDD'
print(most_frequent(s))

frequencies is a list of tuples that count the characters as (character, count). We apply max to the tuples using count's and return that tuple's character. In the event of a tie, this solution will pick only one.




回答8:


I noticed that most of the answers only come back with one item even if there is an equal amount of characters most commonly used. For example "iii 444 yyy 999". There are an equal amount of spaces, i's, 4's, y's, and 9's. The solution should come back with everything, not just the letter i:

sentence = "iii 444 yyy 999"

# Returns the first items value in the list of tuples (i.e) the largest number
# from Counter().most_common()
largest_count: int = Counter(sentence).most_common()[0][1]

# If the tuples value is equal to the largest value, append it to the list
most_common_list: list = [(x, y)
                         for x, y in Counter(sentence).items() if y == largest_count]

print(most_common_count)

# RETURNS
[('i', 3), (' ', 3), ('4', 3), ('y', 3), ('9', 3)]



回答9:


#file:filename
#quant:no of frequent words you want

def frequent_letters(file,quant):
    file = open(file)
    file = file.read()
    cnt = Counter
    op = cnt(file).most_common(quant)
    return op   


来源:https://stackoverflow.com/questions/4131123/finding-the-most-frequent-character-in-a-string

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!