Question
Can I do the ranking/sorting with Counter.most_common() alone, and so avoid this line: d = sorted(d.items(), key=lambda x: (-x[1], x[0]), reverse=False)?
Challenge: You are given a string. The string contains only lowercase English alphabet characters. Your task is to find the top three most common characters in the string.
Output Format: Print the three most common characters along with their occurrence count each on a separate line. Sort output in descending order of occurrence count. If the occurrence count is the same, sort the characters in ascending order.
To complete this I used dict, Counter, and sorted in order to satisfy the rule "If the occurrence count is the same, sort the characters in ascending order". Python's built-in sorted, with the key shown above, orders by count and then alphabetically. I'm curious whether there is a way to override the default arbitrary ordering of Counter.most_common(), as it seems to disregard the lexicographical order of the results when picking the top 3.
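import sys
from collections import Counter

string = sys.stdin.readline().strip()
# Take the three most common characters (ties are broken arbitrarily here).
d = dict(Counter(string).most_common(3))
# Re-sort by descending count, then by ascending character.
d = sorted(d.items(), key=lambda x: (-x[1], x[0]), reverse=False)
for letter, count in d[:3]:
    print(letter, count)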
Answer 1:
Yes, the documentation explicitly says that the order Counter.most_common() uses for elements with equal counts (the tie-breaker) is arbitrary.
- UPDATE: PM2Ring told me that Counter inherits dict's ordering. Insertion ordering only appears in CPython 3.6+ and is only guaranteed from 3.7, so it's possible the documentation is lagging.
- In CPython 3.6+ ties fall back on original insertion order (see bottom), but don't rely on that implementation detail, because per the spec it is not defined behavior. It is best to do your own sort, as you say, if you want fully deterministic behavior.
- I show at bottom how you can monkey-patch Counter.most_common with your own sort function like the one you show, but that's frowned on. (Code you write might accidentally rely on the patch and hence break when it is not applied.)
- You could subclass Counter as MyCounter so you can override its most_common. Painful and not really portable.
- Really the best approach is just to write code and tests that don't rely on the arbitrary tie-breaker order from most_common().
- I agree that the ordering in most_common() should not have been hardwired, and that we should be able to pass a comparison key or sort function into __init__().
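The subclassing option from the list above can be sketched roughly as follows. This is a minimal sketch rather than a recommended design: it assumes all keys are mutually comparable (true for single characters) and reuses the sort key from the question.

```python
from collections import Counter


class MyCounter(Counter):
    """Counter whose most_common() breaks ties by ascending key."""

    def most_common(self, n=None):
        # Descending count first, then ascending key for equal counts.
        ordered = sorted(self.items(), key=lambda kv: (-kv[1], kv[0]))
        # Keep the optional n argument of the original method.
        return ordered if n is None else ordered[:n]


top_three = MyCounter("ccbaabd").most_common(3)
print(top_three)
```

Only code that explicitly constructs MyCounter gets the deterministic order; plain Counter objects created elsewhere are unaffected.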
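Monkey-patching Counter.most_common():
import collections

def patched_most_common(self, n=None):
    # Sort by descending count, then ascending key; keep the optional n of the original method.
    ordered = sorted(self.items(), key=lambda x: (-x[1], x[0]))
    return ordered if n is None else ordered[:n]

collections.Counter.most_common = patched_most_common
collections.Counter('ccbaab')
Counter({'a': 2, 'b': 2, 'c': 2})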
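If a patch like this is used at all, one way to limit the risk mentioned earlier is to scope it so that it is undone automatically. The sketch below assumes the standard library's unittest.mock.patch.object is acceptable for this purpose; sorted_most_common is a hypothetical name.

```python
from collections import Counter
from unittest import mock


def sorted_most_common(self, n=None):
    # Descending count, then ascending key for ties.
    ordered = sorted(self.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered if n is None else ordered[:n]


# The replacement is active only inside the with-block.
with mock.patch.object(Counter, "most_common", sorted_most_common):
    inside = Counter("ccbaab").most_common()

# Here the original method has been restored.
outside = Counter("ccbaab").most_common()
print(inside)
print(outside)
```

Code outside the with-block still sees the built-in tie-breaking, so nothing can come to depend on the patch by accident.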
Demonstrating that in CPython 3.7 the arbitrary order is the order of insertion (the first insertion of each character):
Counter('abccba').most_common()
[('a', 2), ('b', 2), ('c', 2)]
Counter('ccbaab').most_common()
[('c', 2), ('b', 2), ('a', 2)]
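For the original challenge, the "do your own sort" advice might look like the sketch below. top_n_deterministic is a hypothetical helper name. It sorts all items before slicing, because sorting only the output of most_common(3) can drop a tied character that should have made the top three.

```python
from collections import Counter


def top_n_deterministic(text, n=3):
    """Return the n most common characters, ties broken by ascending character."""
    counts = Counter(text)
    # Sort every (char, count) pair first, then slice, so ties are resolved
    # before anything is discarded.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


for letter, count in top_n_deterministic("ddccbbaa"):
    print(letter, count)
```

For very large alphabets, heapq.nsmallest with the same key would avoid a full sort, but for 26 lowercase letters the difference is negligible.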
Source: https://stackoverflow.com/questions/43076195/counter-most-commonn-how-to-override-arbitrary-ordering