Question
First, apologies if I describe the problem poorly; I couldn't find a better way to put it.
I found that applying list to an itertools.groupby result destroys the groups. See the code:
import itertools
import operator
log = '''\
hello world
hello there
hi guys
hi girls'''.split('\n')
data = [line.split() for line in log]
grouped = list(itertools.groupby(data, operator.itemgetter(0)))
for key, group in grouped:
    print key, group, list(group)
print '-'*80
grouped = itertools.groupby(data, operator.itemgetter(0))
for key, group in grouped:
    print key, group, list(group)
The result is:
hello <itertools._grouper object at 0x01A86050> []
hi <itertools._grouper object at 0x01A86070> [['hi', 'girls']]
--------------------------------------------------------------------------------
<itertools.groupby object at 0x01A824E0>
hello <itertools._grouper object at 0x01A860B0> [['hello', 'world'], ['hello', 'there']]
hi <itertools._grouper object at 0x01A7DFF0> [['hi', 'guys'], ['hi', 'girls']]
This is probably related to how groupby works internally. Nevertheless, it surprised me today.
Answer 1:
This is documented:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible.
When you do list(groupby(...)), you advance the groupby object all the way to the end, thus losing all groups except the last. If you need to save the groups, do as shown in the documentation and save each one as a list while iterating over the groupby object.
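As a rough sketch of that advice, the following reuses the data from the question (written out as a literal list, and in Python 3 syntax, which is an assumption since the question uses Python 2 print statements). Each group is turned into a list while it is still the current one. The variable names keys, groups, stale and first_items are only illustrative.

```python
import itertools
import operator

data = [['hello', 'world'], ['hello', 'there'],
        ['hi', 'guys'], ['hi', 'girls']]

# Save each group as a list before the groupby object moves on.
keys = []
groups = []
for key, group in itertools.groupby(data, operator.itemgetter(0)):
    keys.append(key)
    groups.append(list(group))

print(keys)    # ['hello', 'hi']
print(groups)  # [[['hello', 'world'], ['hello', 'there']], [['hi', 'guys'], ['hi', 'girls']]]

# For contrast: realizing the groupby object first advances it past
# the 'hello' group, so that group iterator has nothing left to yield.
stale = list(itertools.groupby(data, operator.itemgetter(0)))
first_key, first_group = stale[0]
first_items = list(first_group)
print(first_key, first_items)  # hello []
```

What the last stale group yields can differ between Python versions, so it is best not to rely on it either.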
Answer 2:
The example in the documentation is not as nice as:
list((key, list(group)) for key, group in itertools.groupby(...))
This turns the iterator into a list of tuples, each holding a key and the list of items in its group, [(key, [group])], if that is what is desired.
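Applied to the question's data, that one-liner might look like the sketch below (Python 3 syntax assumed; pairs, shuffled and regrouped are illustrative names). The second half is an addition not discussed in the thread: groupby only groups consecutive items, so input that is not already ordered by key usually needs sorting with the same key function first.

```python
import itertools
import operator

data = [['hello', 'world'], ['hello', 'there'],
        ['hi', 'guys'], ['hi', 'girls']]
first_word = operator.itemgetter(0)

# Each group is listed inside the comprehension, before groupby advances.
pairs = [(key, list(group)) for key, group in itertools.groupby(data, first_word)]
print(pairs)

# Unsorted input: sort by the same key so equal keys become adjacent.
shuffled = [['hi', 'guys'], ['hello', 'world'],
            ['hi', 'girls'], ['hello', 'there']]
regrouped = [(key, list(group))
             for key, group in itertools.groupby(sorted(shuffled, key=first_word), first_word)]
print(regrouped)
```

Because sorted is stable, the items within each group keep their original relative order.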
Source: https://stackoverflow.com/questions/22706606/weirdness-of-itertools-groupby-in-python-when-realizing-the-groupby-result-early