Question
Note: I know how to do this with an explicit for loop, but I am looking for a solution that is a bit more readable.
If possible, I'd like to solve this using built-in functionality. The best case would be something like
result = [ *groupby logic* ]
Assuming the following list:
import numpy as np
np.random.seed(42)
N = 10
my_tuples = list(zip(np.random.choice(list('ABC'), size=N),
                     np.random.choice(range(100), size=N)))
where my_tuples is
[('C', 74),
('A', 74),
('C', 87),
('C', 99),
('A', 23),
('A', 2),
('C', 21),
('B', 52),
('C', 1),
('C', 87)]
How can I group the integers (the value at index 1 of each tuple) by the labels A, B and C using groupby from itertools?
If I do something like this:
from itertools import groupby
#..
[(k,*v) for k, v in dict(groupby(my_tuples, lambda x: x[0])).items()]
I see that this delivers the wrong result.
The desired outcome should be
{
'A': [74, 23, 2],
# ..
}
Answer 1:
You should use collections.defaultdict for an O(n) solution; see @PatrickHaugh's answer.
Using itertools.groupby requires sorting before grouping, incurring O(n log n) complexity:
from itertools import groupby
from operator import itemgetter
sorter = sorted(my_tuples, key=itemgetter(0))
grouper = groupby(sorter, key=itemgetter(0))
res = {k: list(map(itemgetter(1), v)) for k, v in grouper}
print(res)
{'A': [74, 23, 2],
'B': [52],
'C': [74, 87, 99, 21, 1, 87]}
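Since the question asks for a single expression, the sort and the grouping above can be folded into one. This is only a sketch: group_values is a made-up helper name, and the list is written out literally so the snippet runs without NumPy. Because sorted() is stable, the values within each label keep their original relative order.

```python
from itertools import groupby
from operator import itemgetter

# Literal copy of the question's data (assumption: plain str/int instead of NumPy types).
my_tuples = [('C', 74), ('A', 74), ('C', 87), ('C', 99), ('A', 23),
             ('A', 2), ('C', 21), ('B', 52), ('C', 1), ('C', 87)]


def group_values(pairs):
    """Hypothetical helper: map each label to the list of its values."""
    label = itemgetter(0)
    # Sort by label only; sorted() is stable, so ties keep their input order.
    ordered = sorted(pairs, key=label)
    # Each group g is an iterator over (label, value) tuples; keep only the value.
    return {k: [v for _, v in g] for k, g in groupby(ordered, key=label)}


result = group_values(my_tuples)
print(result)
```

The keys come out in sorted order (A, B, C) because the grouping runs over the sorted list.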
Answer 2:
The simplest solution is probably not to use groupby at all.
from collections import defaultdict
d = defaultdict(list)
for k, v in my_tuples:
    d[k].append(v)
I wouldn't use groupby because groupby(iterable) only groups items in iterable that are adjacent. So to get all of the 'C' values together, you would first have to sort your list. Unless you have some specific reason to use groupby, it's unnecessary.
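To make the adjacency point concrete, here is a small sketch that runs groupby on the question's list without sorting it first. The names runs and last_run_only are just illustrative, and the list is written out literally so NumPy isn't needed.

```python
from itertools import groupby
from operator import itemgetter

# Literal copy of the question's data (assumption: plain str/int instead of NumPy types).
my_tuples = [('C', 74), ('A', 74), ('C', 87), ('C', 99), ('A', 23),
             ('A', 2), ('C', 21), ('B', 52), ('C', 1), ('C', 87)]

# Without sorting, every run of equal adjacent labels becomes its own group,
# so 'C' and 'A' each show up several times.
runs = [(k, [v for _, v in g]) for k, g in groupby(my_tuples, key=itemgetter(0))]
for label, values in runs:
    print(label, values)

# Turning those runs into a dict silently keeps only the last run per label.
last_run_only = dict(runs)
print(last_run_only)
```

On this data the unsorted list splits into seven runs instead of three groups, which is why the values for a label end up incomplete if the runs are collected into a dict.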
Source: https://stackoverflow.com/questions/50624389/how-to-group-list-of-tuples