Count frequency of item in a list of tuples

梦谈多话 2020-12-17 15:30

I have a list of tuples as shown below. I have to count how many items have a number greater than 1. The code that I have written so far is very slow. Even if there are arou

4 answers
  •  抹茶落季
    2020-12-17 16:05

    You've got the right idea extracting the first item from each tuple. You can make your code more concise using a list/generator comprehension, as I show you below.

    From that point on, the idiomatic way to count element frequencies is a collections.Counter object.

    1. Extract the first elements from your list of tuples (using a comprehension)
    2. Pass this to Counter
    3. Query the count of 'example'

    from collections import Counter
    
    counts = Counter(x[0] for x in b_data)
    print(counts['example'])
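
As a minimal end-to-end sketch of the three steps, assuming `b_data` is a list of tuples whose first element is the item to count. The sample data is made up for illustration, since the question's actual data is not shown here.

```python
from collections import Counter

# Hypothetical sample data; the real b_data from the question is not shown.
b_data = [("example", 1), ("foo", 2), ("example", 3), ("bar", 4), ("example", 5)]

# Steps 1 and 2: pull out the first element of each tuple and count in one pass.
counts = Counter(x[0] for x in b_data)

# Step 3: look up the count of a single item.
example_count = counts["example"]
print(example_count)
```

The generator expression avoids building an intermediate list, which may matter when `b_data` is large.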
    

    Sure, you can use list.count if it’s only one item you want to find frequency counts for, but in the general case, a Counter is the way to go.
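
To make the comparison concrete, here is a small sketch with made-up data (`b_data` and `first_items` are illustrative names, not taken from the question). It shows both approaches giving the same answer for a single item.

```python
from collections import Counter

# Hypothetical sample data for illustration only.
b_data = [("example", 1), ("foo", 2), ("example", 3)]
first_items = [x[0] for x in b_data]

# list.count scans the whole list once per call.
single_count = first_items.count("example")

# Counter scans once and answers every later lookup from a dict.
counts = Counter(first_items)

print(single_count, counts["example"])
```

For one lookup the two are comparable. The Counter tends to pay off once you need counts for several different items.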


    The advantage of a Counter is that it computes frequency counts for all elements (not just example) in a single linear, O(N), pass. If you also wanted the count of another element, say foo, you would query it the same way:

    print(counts['foo'])
    

    If 'foo' doesn’t exist in the list, 0 is returned.
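
A short sketch of that behaviour, using an illustrative Counter built from a literal list. Looking up a missing key returns 0 and does not add the key to the Counter.

```python
from collections import Counter

counts = Counter(["example", "example", "bar"])

# A missing key yields 0 instead of raising KeyError.
missing = counts["foo"]

# The lookup above did not insert 'foo' into the Counter.
still_absent = "foo" not in counts

print(missing, still_absent)
```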

    If you want to find the most common elements, call counts.most_common -

    print(counts.most_common(n))
    

    Where n is the number of elements you want to display. If you want to see everything, don't pass n.
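
For instance, a minimal sketch with an illustrative string as input (Counter counts its characters):

```python
from collections import Counter

# Character counts: a=5, b=2, r=2, c=1, d=1
counts = Counter("abracadabra")

# The two entries with the highest counts, as (item, count) pairs.
top_two = counts.most_common(2)

# With no argument, every entry is returned, highest count first.
everything = counts.most_common()

print(top_two)
print(len(everything))
```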


    To retrieve only the elements that occur more than once, one efficient approach is to call most_common and then take entries from the front for as long as their count exceeds 1, using itertools.takewhile.

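    from collections import Counter
    from itertools import takewhile
    
    l = [1, 1, 2, 2, 3, 3, 1, 1, 5, 4, 6, 7, 7, 8, 3, 3, 2, 1]
    c = Counter(l)
    
    # most_common() is sorted by count, so stop at the first count <= 1
    list(takewhile(lambda x: x[-1] > 1, c.most_common()))
    # [(1, 5), (3, 4), (2, 3), (7, 2)]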
    

    (OP edit) Alternatively, use a list comprehension to get a list of items having count > 1 -

    [item[0] for item in counts.most_common() if item[-1] > 1]
    

    Keep in mind that this isn’t as efficient as the itertools.takewhile solution. For example, if you have one item with count > 1 and a million items with count equal to 1, the comprehension checks all million and one entries when it doesn’t have to (because most_common returns frequency counts in descending order). With takewhile that isn’t the case, because iteration stops as soon as the condition count > 1 becomes false.
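
The difference can be observed by tallying how many entries each approach inspects. This is a rough sketch with made-up data; `over_one` and `checked` are hypothetical helpers introduced only for the demonstration.

```python
from collections import Counter
from itertools import takewhile

# Hypothetical data: one repeated item plus 1,000 items that occur once.
data = ["dup", "dup"] + [f"item{i}" for i in range(1000)]
ranked = Counter(data).most_common()  # sorted by count, descending

checked = {"takewhile": 0, "comprehension": 0}

def over_one(pair, label):
    """Return True when the count exceeds 1, tallying each call."""
    checked[label] += 1
    return pair[1] > 1

# takewhile stops at the first entry whose count is not above 1.
with_takewhile = list(takewhile(lambda p: over_one(p, "takewhile"), ranked))

# The comprehension tests every entry in the ranked list.
with_comprehension = [p for p in ranked if over_one(p, "comprehension")]

print(checked)
```

Both produce the same result. Note that most_common() itself still sorts every entry, so the saving applies to the filtering pass only.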
