How to group a list of tuples?

Question

Note: I know I can do this with an explicit for loop, but I am looking for a solution that is more readable.

If possible, I'd like to solve this using built-in functionality. Ideally, it would look something like

result = [ *groupby logic* ]

Assuming the following list:

import numpy as np
np.random.seed(42)

N = 10

my_tuples = list(zip(np.random.choice(list('ABC'), size=N),
                     np.random.choice(range(100), size=N)))

where my_tuples is

[('C', 74),
 ('A', 74),
 ('C', 87),
 ('C', 99),
 ('A', 23),
 ('A', 2),
 ('C', 21),
 ('B', 52),
 ('C', 1),
 ('C', 87)]

How can I group the integer values (the element at index 1 of each tuple) by the labels A, B and C using groupby from itertools?

If I do something like this:

from itertools import groupby

#..

[(k,*v) for k, v in dict(groupby(my_tuples, lambda x: x[0])).items()]

I see that this delivers the wrong result.

The desired outcome should be

{
  'A': [74, 23, 2],
  # ..
}

Answer 1:

You should use collections.defaultdict for an O(n) solution; see @PatrickHaugh's answer.

Using itertools.groupby requires sorting before grouping, incurring O(n log n) complexity:

from itertools import groupby
from operator import itemgetter

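# groupby only merges adjacent items, so sort by label first
# (sorted is stable, so values keep their original relative order)
sorter = sorted(my_tuples, key=itemgetter(0))
grouper = groupby(sorter, key=itemgetter(0))

# for each label, keep only the integer at index 1 of every tuple
res = {k: list(map(itemgetter(1), v)) for k, v in grouper}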

print(res)

{'A': [74, 23, 2],
 'B': [52],
 'C': [74, 87, 99, 21, 1, 87]}

Answer 2:

The simplest solution is probably not to use groupby at all.

from collections import defaultdict

# missing keys are created with an empty list on first access
d = defaultdict(list)

for k, v in my_tuples:
    d[k].append(v)
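For reference, here is a self-contained run of the same loop. It is only a sketch: it hard-codes the my_tuples list shown in the question instead of regenerating it with NumPy, and the names label, value and result are mine, not from the original post.

```python
from collections import defaultdict

# The list from the question, written out literally (assumption: same data
# as produced by the seeded NumPy code above).
my_tuples = [('C', 74), ('A', 74), ('C', 87), ('C', 99), ('A', 23),
             ('A', 2), ('C', 21), ('B', 52), ('C', 1), ('C', 87)]

d = defaultdict(list)
for label, value in my_tuples:
    d[label].append(value)  # one pass over the data, no sorting needed

# Convert to a plain dict if the auto-creating behaviour is no longer wanted.
result = dict(d)
print(result)
# {'C': [74, 87, 99, 21, 1, 87], 'A': [74, 23, 2], 'B': [52]}
```

The keys appear in first-seen order ('C', 'A', 'B') rather than sorted order, because dictionaries preserve insertion order.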

I wouldn't use groupby here because groupby(iterable) only groups items that are adjacent in iterable and share the same key. So to get all of the 'C' values together, you would first have to sort your list. Unless you have some other reason to use groupby, it's unnecessary.
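To make the adjacency point concrete, this small sketch runs groupby on the unsorted list from the question (hard-coded here so the snippet stands alone; the name runs is mine). Each consecutive run of equal labels becomes a separate group, so 'C' and 'A' appear more than once.

```python
from itertools import groupby
from operator import itemgetter

# The list from the question, written out literally.
my_tuples = [('C', 74), ('A', 74), ('C', 87), ('C', 99), ('A', 23),
             ('A', 2), ('C', 21), ('B', 52), ('C', 1), ('C', 87)]

# No sorting: groupby starts a new group every time the label changes.
runs = [(k, [v for _, v in g]) for k, g in groupby(my_tuples, key=itemgetter(0))]
print(runs)
# [('C', [74]), ('A', [74]), ('C', [87, 99]), ('A', [23, 2]),
#  ('C', [21]), ('B', [52]), ('C', [1, 87])]
```

This is also why building a dict directly from the unsorted groupby output loses data: later groups with the same label overwrite earlier ones.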

Source: https://stackoverflow.com/questions/50624389/how-to-group-list-of-tuples

Tags

python

sorting

dictionary

grouping

itertools