Can Python's list comprehensions (ideally) do the equivalent of 'count(*)…group by…' in SQL?

Question

I think list comprehensions may be able to do this, but I'm not sure. Are there any elegant solutions in Python (2.6) for selecting the unique objects in a list and counting how many times each one occurs?

(I've defined an __eq__ method on my class to determine uniqueness.)

So in RDBMS-land, something like this:

CREATE TABLE x(n NUMBER(1));
INSERT INTO x VALUES(1);
INSERT INTO x VALUES(1);
INSERT INTO x VALUES(1);
INSERT INTO x VALUES(2);

SELECT COUNT(*), n FROM x
GROUP BY n;

Which gives:

COUNT(*) n
==========
3        1
1        2

So, here's the equivalent list in Python:

[1,1,1,2]

And I want the same output as the SQL SELECT gives above.

EDIT: The example above is simplified; I'm actually processing lists of instances of a user-defined class. For completeness, here is the extra code I needed to get the whole thing to work:

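import hashlib

def __hash__(self):
    # Method on the user-defined class, alongside __eq__:
    # feed every string in my_list_of_stuff into a single MD5 digest.
    md5 = hashlib.md5()
    for item in self.my_list_of_stuff:
        md5.update(item)
    return int(md5.hexdigest(), 16)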

The __hash__ method was needed to make the conversion to a set work. I opted for the list-comprehension approach, which works in 2.6, even though I learnt that it involves an inefficiency (see comments); my data set is small enough for that not to be an issue. my_list_of_stuff above is a list of strings held by each object.
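What makes the set conversion work is that __hash__ agrees with __eq__: objects that compare equal must produce equal hashes, otherwise set() keeps them apart. A minimal sketch, assuming a hypothetical class Item whose identity is its list of strings my_list_of_stuff, and a hypothetical helper count_items that applies the set-plus-count comprehension from Answer 1. It hashes a tuple of the strings instead of building an MD5 digest; that is a swapped-in alternative, not the poster's code. It should also run on Python 2.6.

```python
class Item(object):
    """Hypothetical user-defined class; equality is based on my_list_of_stuff."""

    def __init__(self, my_list_of_stuff):
        self.my_list_of_stuff = list(my_list_of_stuff)

    def __eq__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        return self.my_list_of_stuff == other.my_list_of_stuff

    def __ne__(self, other):
        # Python 2 does not derive != from __eq__, so define it explicitly.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

    def __hash__(self):
        # Equal objects have equal lists, hence equal tuples and equal hashes.
        return hash(tuple(self.my_list_of_stuff))

    def __repr__(self):
        return "Item(%r)" % (self.my_list_of_stuff,)


def count_items(items):
    """Hypothetical helper: one (item, count) pair per distinct item."""
    # set() relies on __hash__/__eq__; list.count() relies on __eq__.
    return [(x, items.count(x)) for x in set(items)]


items = [Item(["a", "b"]), Item(["a", "b"]), Item(["c"])]
for item, n in count_items(items):
    print("%d %r" % (n, item))
```

The order of the printed rows follows set iteration order, which is arbitrary; sort the pairs if you need a stable order.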

Answer 1:

Lennart Regebro provided a nice one-liner that does what you want:

>>> values = [1,1,1,2]
>>> print [(x,values.count(x)) for x in set(values)]
[(1, 3), (2, 1)]

As S.Lott mentions, a defaultdict can do the same thing.
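The one-liner returns (value, count) pairs in set order. A minimal sketch, assuming you want rows laid out exactly like the SQL result: it swaps each pair to (count, value), iterates over sorted(set(values)) so the order is deterministic, and prints a COUNT(*) / n table. Note that values.count(x) rescans the whole list for every distinct value, so the cost grows with n * k for n items and k distinct values; that is fine for small data.

```python
values = [1, 1, 1, 2]

# One row per distinct value, count first, like SELECT COUNT(*), n ... GROUP BY n.
rows = [(values.count(x), x) for x in sorted(set(values))]
print(rows)  # [(3, 1), (1, 2)]

# Print the rows as a table in the same layout as the SQL output.
print("COUNT(*) n")
print("==========")
for count, n in rows:
    print("%-8d %d" % (count, n))
```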

Answer 2:

>>> from collections import Counter
>>> Counter([1,1,1,2])
Counter({1: 3, 2: 1})

Counter is a dict subclass. It is available only in Python 2.7+ and 3.1+, so not in 2.6.
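On Python 2.7+ or 3.1+, Counter gets you the SQL-style rows with one more step: sort its (value, count) items and swap each pair. A minimal sketch; most_common() is a standard Counter method that returns pairs ordered by descending count.

```python
from collections import Counter

values = [1, 1, 1, 2]
counts = Counter(values)  # dict subclass mapping value -> number of occurrences

# (COUNT(*), n) rows ordered by n, matching the SQL result.
rows = [(count, n) for n, count in sorted(counts.items())]
print(rows)                   # [(3, 1), (1, 2)]
print(counts.most_common(1))  # [(1, 3)]: the most frequent value and its count
```

Looking up a value that never occurred, such as counts[99], returns 0 rather than raising KeyError.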

Answer 3:

Not easily doable as a list comprehension.

from collections import defaultdict
def group_by( someList ):
    counts = defaultdict(int)
    for value in someList:
        counts[value.aKey] += 1
    return counts

This is a very Pythonic solution, but not a list comprehension.
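The function above counts on a fixed attribute (aKey is the answerer's placeholder). A minimal sketch of the same defaultdict technique with the grouping key passed in as a function, so it mirrors GROUP BY on an arbitrary expression; group_count is a hypothetical name, not a library function. It uses nothing newer than defaultdict, so it should also run on Python 2.5+.

```python
from collections import defaultdict


def group_count(iterable, key=None):
    """Hypothetical helper: count items per key, like COUNT(*) ... GROUP BY key.

    key is an optional function; by default each item is its own key.
    """
    counts = defaultdict(int)  # a missing key starts at int() == 0
    for item in iterable:
        k = item if key is None else key(item)
        counts[k] += 1
    return dict(counts)  # return a plain dict so lookups of absent keys raise KeyError


print(sorted(group_count([1, 1, 1, 2]).items()))  # [(1, 3), (2, 1)]
# Group strings by their first letter.
print(sorted(group_count(["apple", "avocado", "banana"], key=lambda s: s[0]).items()))
# [('a', 2), ('b', 1)]
```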

Answer 4:

You can use groupby from the itertools module:

Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function.

>>> import itertools
>>> a = [1,1,1,2]
>>> [(len(list(v)), key) for (key, v) in itertools.groupby(sorted(a))]
[(3, 1), (1, 2)]

I would assume its runtime is worse than the dict-based solutions by SilentGhost or S.Lott, since it has to sort the input sequence (O(n log n), versus a single O(n) pass for counting with a dict), but you should time that yourself. It is a list comprehension, though. It should be faster than Adam Bernier's solution, since it doesn't have to do repeated linear scans of the input sequence. If needed, the sorted() call, which builds a sorted copy, can be avoided by sorting the input sequence in place with a.sort(), at the cost of modifying a.
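Two details of this answer can be sketched: counting each group with sum(1 for _ in group) avoids building a throwaway list per group, and a.sort() replaces the sorted() copy. The second comprehension shows why sorting matters at all: groupby only merges consecutive equal elements, so an unsorted input splits one value into several groups. A minimal sketch; it should also run on Python 2.4+.

```python
import itertools

a = [2, 1, 1, 1]
a.sort()  # sort in place so equal values become adjacent; note this mutates a

# sum(1 for _ in group) counts each run without materializing it as a list.
rows = [(sum(1 for _ in group), key) for key, group in itertools.groupby(a)]
print(rows)  # [(3, 1), (1, 2)]

# Without sorting, the two 1s are not adjacent and end up in separate groups.
unsorted = [1, 2, 1]
print([(sum(1 for _ in g), k) for k, g in itertools.groupby(unsorted)])
# [(1, 1), (1, 2), (1, 1)]
```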

Answer 5:

The following works in Python 2.4 and should therefore work in Python 2.6:

lst = [1,1,2,2,3,4,5,6,5]
lst_tmp = []   # each distinct value, in first-seen order
lst_dups = []  # every repeated occurrence

for item in lst:
    if item in lst_tmp:  # linear scan of lst_tmp
        lst_dups.append(item)
    else:
        lst_tmp.append(item)

if lst_dups:
    # report each duplicated value once, in sorted order, with its total count
    for item in sorted(set(lst_dups)):
        print lst.count(item), "instances of", item
else:
    print "list is unique"
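The loop above tests membership in a list (a linear scan per item) and then calls lst.count() for each duplicated value. A minimal single-pass alternative, assuming only duplicated values should be reported (the SQL analogue is GROUP BY n HAVING COUNT(*) > 1), counts with a plain dict and filters afterwards. It uses nothing newer than sorted(), so it should also run on Python 2.4+.

```python
lst = [1, 1, 2, 2, 3, 4, 5, 6, 5]

# One pass: count every value with a plain dict.
counts = {}
for item in lst:
    counts[item] = counts.get(item, 0) + 1

# Keep only values seen more than once, in sorted order (like HAVING COUNT(*) > 1).
dups = [(item, n) for item, n in sorted(counts.items()) if n > 1]

if dups:
    for item, n in dups:
        print("%d instances of %s" % (n, item))
else:
    print("list is unique")
```

This prints the same lines as the original for this input: "2 instances of 1", "2 instances of 2" and "2 instances of 5".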

Source: https://stackoverflow.com/questions/2148480/can-pythons-list-comprehensions-ideally-do-the-equivalent-of-count-grou

Tags: python, list, count, python-2.6