问题
I was trying to use itertools.groupby to help me group a list of integers by positive or negative property, for example:
input
[1,2,3, -1,-2,-3, 1,2,3, -1,-2,-3]
will return
[[1,2,3],[-1,-2,-3],[1,2,3],[-1,-2,-3]]
However if I:
import itertools
nums = [1,2,3, -1,-2,-3, 1,2,3, -1,-2,-3]
group_list = list(itertools.groupby(nums, key=lambda x: x>=0))
print(group_list)
for k, v in group_list:
print(list(v))
>>>
[]
[-3]
[]
[]
But if I don't list()
the groupby object, it will work fine:
nums = [1,2,3, -1,-2,-3, 1,2,3, -1,-2,-3]
group_list = itertools.groupby(nums, key=lambda x: x>=0)
for k, v in group_list:
print(list(v))
>>>
[1, 2, 3]
[-1, -2, -3]
[1, 2, 3]
[-1, -2, -3]
What I don't understand is, a groupby object is a iterator composed by a pair of key and _grouper
object, a call of list()
of a groupby object should not consume the _grouper
object?
And even if it did consume, how did I get [-3]
from the second element?
回答1:
Per the docs, it is explicitly noted that advancing the groupby
object renders the previous group unusable (in practice, empty):
The returned group is itself an iterator that shares the underlying iterable with
groupby()
. Because the source is shared, when thegroupby()
object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.
Basically, instead of list
-ifying directly with the list
constructor, you'd need a listcomp that converts from group iterators to list
s before advancing the groupby
object, replacing:
group_list = list(itertools.groupby(nums, key=lambda x: x>=0))
with:
group_list = [(k, list(g)) for k, g in itertools.groupby(nums, key=lambda x: x>=0)]
The design of most itertools
module types is intended to avoid storing data implicitly, because they're intended to be used with potentially huge inputs. If all the groupers stored copies of all the data from the input (and the groupby
object had to be sure to retroactively populate them), it would get ugly, and potentially blow memory by accident. By forcing you to make storing the values explicit, you don't accidentally store unbounded amounts of data unintentionally, per the Zen of Python:
Explicit is better than implicit.
来源:https://stackoverflow.com/questions/48655138/itertools-groupby-object-not-outputting-correctly