itertools groupby object not outputting correctly

浪子不回头ぞ posted on 2019-12-24 11:29:34

Question


I was trying to use itertools.groupby to group a list of integers into consecutive runs by sign (non-negative or negative). For example, the

input

[1,2,3, -1,-2,-3, 1,2,3, -1,-2,-3] 

should return

[[1,2,3],[-1,-2,-3],[1,2,3],[-1,-2,-3]]

However, if I run:

import itertools

nums = [1,2,3, -1,-2,-3, 1,2,3, -1,-2,-3]
group_list = list(itertools.groupby(nums, key=lambda x: x>=0))
print(group_list)
for k, v in group_list:
    print(list(v))
>>>
[]
[-3]
[]
[]

But if I don't call list() on the groupby object, it works fine:

nums = [1,2,3, -1,-2,-3, 1,2,3, -1,-2,-3]
group_list = itertools.groupby(nums, key=lambda x: x>=0)
for k, v in group_list:
    print(list(v))
>>>
[1, 2, 3]
[-1, -2, -3]
[1, 2, 3]
[-1, -2, -3]

What I don't understand is this: a groupby object is an iterator of (key, _grouper) pairs, so shouldn't calling list() on the groupby object leave the _grouper objects unconsumed?

And even if it did consume them, how did I get [-3] from the second element?


Answer 1:


The docs explicitly note that advancing the groupby object renders the previous group unusable (in practice, empty):

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.
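The behaviour described in that quote can be seen with a small sketch. The variable names (`grouped`, `first`, `second`, `head`, `leftover`) are just illustrative; the only library calls used are `itertools.groupby` and the built-in `next` and `list`. It takes one item from the first group, advances the groupby object, and then checks what is left in the first group:

```python
import itertools

nums = [1, 2, 3, -1, -2, -3]
grouped = itertools.groupby(nums, key=lambda x: x >= 0)

# Fetch the first group and read only one item from it.
key1, first = next(grouped)
head = next(first)

# Advancing the groupby object skips the rest of the first group (2 and 3).
key2, second = next(grouped)

# The first group is now stale, so nothing more comes out of it.
leftover = list(first)

# The current group still works as expected.
current = list(second)

print(head, leftover, current)
```

Once `next(grouped)` is called the second time, the remaining items of the first group are gone for good, which is why `list(itertools.groupby(...))` leaves you with groups that can no longer be read.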

Basically, instead of list-ifying directly with the list constructor, you need a list comprehension that converts each group iterator to a list before the groupby object advances. Replace:

group_list = list(itertools.groupby(nums, key=lambda x: x>=0))

with:

group_list = [(k, list(g)) for k, g in itertools.groupby(nums, key=lambda x: x>=0)]
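As a self-contained sketch of that fix applied to the question's input: if only the grouped values are wanted (as in the expected output from the question), the key can be dropped. The name `groups` is just an illustrative choice.

```python
import itertools

nums = [1, 2, 3, -1, -2, -3, 1, 2, 3, -1, -2, -3]

# Convert each group to a list while it is still the current group,
# i.e. before the groupby object moves on to the next key.
group_list = [(k, list(g)) for k, g in itertools.groupby(nums, key=lambda x: x >= 0)]

# Keep only the values if the keys are not needed.
groups = [values for _, values in group_list]

print(groups)  # [[1, 2, 3], [-1, -2, -3], [1, 2, 3], [-1, -2, -3]]
```

Because every group has been copied into its own list, `group_list` can be iterated as many times as needed afterwards.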

Most itertools types are designed to avoid storing data implicitly, because they are meant to work with potentially huge inputs. If every grouper stored a copy of its data from the input (and the groupby object had to retroactively populate them), it would get ugly and could blow up memory by accident. Forcing you to store the values explicitly means you don't hold unbounded amounts of data unintentionally, per the Zen of Python:

Explicit is better than implicit.
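When the groups themselves don't need to be kept, each one can be consumed as it is produced, so nothing beyond the current group is ever held. A minimal sketch, assuming only a per-group sum is wanted (the `totals` name and the choice of `sum` are illustrative, not part of the original answer):

```python
import itertools

nums = [1, 2, 3, -1, -2, -3, 1, 2, 3, -1, -2, -3]

# Each group is reduced to a single number before groupby advances,
# so no group is ever stored as a list.
totals = [
    (is_non_negative, sum(group))
    for is_non_negative, group in itertools.groupby(nums, key=lambda x: x >= 0)
]

print(totals)  # [(True, 6), (False, -6), (True, 6), (False, -6)]
```

This is the usage pattern the lazy design favours: storing values is an explicit choice made only when the data is needed later.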



Source: https://stackoverflow.com/questions/48655138/itertools-groupby-object-not-outputting-correctly
