I have a list of tuples as shown below. I have to count how many items have a number greater than 1. The code that I have written so far is very slow. Even if there are arou
You've got the right idea extracting the first item from each tuple. You can make your code more concise using a list/generator comprehension, as I show you below.
From that point on, the most idiomatic way to find frequency counts of elements is a collections.Counter object: pass the extracted first elements to Counter, then query the count of example.
from collections import Counter
counts = Counter(x[0] for x in b_data)
print(counts['example'])
Sure, you can use list.count if it’s only one item you want to find frequency counts for, but in the general case, a Counter is the way to go.
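As a rough sketch of that trade-off: the b_data below is made-up sample data (the real list from the question isn't shown), and first_items is just an illustrative name. list.count rescans the list on every call, while a Counter is built once and can then answer any query.

```python
from collections import Counter

# Hypothetical sample data standing in for the question's b_data.
b_data = [("example", 123), ("foo", 7), ("example", 42), ("bar", 1), ("example", 9)]

# Extract the first item of each tuple.
first_items = [x[0] for x in b_data]

# list.count scans the whole list each time it is called.
count_via_list = first_items.count("example")

# Counter tallies every element in a single pass.
counts = Counter(first_items)

print(count_via_list)     # 3
print(counts["example"])  # 3
```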
The advantage of a Counter is that it performs frequency counts of all elements (not just example) in linear (O(N)) time. Say you also wanted to query the count of another element, say foo. That would be done with:
print(counts['foo'])
If 'foo' doesn’t exist in the list, 0 is returned.
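A small illustration of that behaviour, using a throwaway list rather than the question's data: looking up a missing key gives 0 instead of raising KeyError, and the lookup does not add the key to the Counter.

```python
from collections import Counter

# Throwaway sample data, not taken from the question.
counts = Counter(["example", "foo", "example"])

present = counts["foo"]      # key that exists
absent = counts["missing"]   # key that was never counted

print(present)               # 1
print(absent)                # 0, no KeyError
print("missing" in counts)   # False: the lookup did not insert the key
```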
If you want to find the most common elements, call counts.most_common:
print(counts.most_common(n))
Where n is the number of elements you want to display. If you want to see everything, don't pass n.
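For instance, with a small made-up list (the names top_two and everything are only for illustration), passing n limits the result, while omitting it returns every (element, count) pair in descending order of count:

```python
from collections import Counter

# Made-up sample: 'a' appears 3 times, 'b' twice, 'c' once.
counts = Counter(["a", "a", "a", "b", "b", "c"])

top_two = counts.most_common(2)    # only the two most frequent elements
everything = counts.most_common()  # all elements, most frequent first

print(top_two)     # [('a', 3), ('b', 2)]
print(everything)  # [('a', 3), ('b', 2), ('c', 1)]
```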
To retrieve the counts of the most common elements, one efficient way is to query most_common and then extract all elements with counts over 1 using itertools.takewhile.
from itertools import takewhile
l = [1, 1, 2, 2, 3, 3, 1, 1, 5, 4, 6, 7, 7, 8, 3, 3, 2, 1]
c = Counter(l)
list(takewhile(lambda x: x[-1] > 1, c.most_common()))
[(1, 5), (3, 4), (2, 3), (7, 2)]
(OP edit) Alternatively, use a list comprehension to get a list of items having count > 1:
[item[0] for item in counts.most_common() if item[-1] > 1]
Keep in mind that this isn’t as efficient as the itertools.takewhile solution. For example, if you have one item with count > 1 and a million items with count equal to 1, the comprehension checks all million and one entries when it doesn’t have to (because most_common returns frequency counts in descending order). With takewhile that isn’t the case, because iteration stops as soon as the condition count > 1 becomes false. The saving applies only to the filtering step, since most_common() has already sorted every entry by the time either version runs.
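To tie this back to the original question (how many items have a count greater than 1), here is a sketch that runs both approaches on the same sample list used above and checks that they agree. The names ranked, via_takewhile, and via_comprehension are only illustrative.

```python
from collections import Counter
from itertools import takewhile

l = [1, 1, 2, 2, 3, 3, 1, 1, 5, 4, 6, 7, 7, 8, 3, 3, 2, 1]
c = Counter(l)

# (element, count) pairs, sorted by count in descending order.
ranked = c.most_common()

# Stops at the first pair whose count is not greater than 1.
via_takewhile = [item for item, count in takewhile(lambda pair: pair[1] > 1, ranked)]

# Checks every pair, even after the counts have dropped to 1.
via_comprehension = [item for item, count in ranked if count > 1]

print(via_takewhile)                       # [1, 3, 2, 7]
print(via_takewhile == via_comprehension)  # True
print(len(via_takewhile))                  # 4 items occur more than once
```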