Turning a generator of pairs into a pair of generators

Submitted by 自古美人都是妖i on 2019-12-05 22:22:37

Question


How would I turn a generator of pairs (tuples):

tuple_gen = (i for i in [(1, "a"), (2, "b"), (3, "c")])

Into two generators which would yield [1, 2, 3] and ["a", "b", "c"]?

I need to process separately the first and second elements of the tuples and the processing functions expect an iterable.

The generator is very large (millions of items) so I'd like to avoid having all items in memory at the same time unless there is no other solution.


Answer 1:


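You can create n independent iterators with the tee function from the itertools module, then iterate over them separately. Note that tee takes the count as a positional argument (it defaults to 2), and that it buffers every item one iterator has produced but the other has not yet consumed:

from itertools import tee

i1, i2 = tee(tuple_gen, 2)
firsts = (x[0] for x in i1)
seconds = (x[1] for x in i2)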
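
To make the buffering behaviour concrete, here is a minimal sketch of the tee approach. The function pair_source is a hypothetical stand-in for the real generator, which is assumed to yield 2-tuples.

```python
from itertools import tee


def pair_source():
    # Hypothetical stand-in for the large generator of pairs.
    for number, letter in [(1, "a"), (2, "b"), (3, "c")]:
        yield number, letter


i1, i2 = tee(pair_source())  # the count defaults to 2
firsts = (pair[0] for pair in i1)
seconds = (pair[1] for pair in i2)

# Draining `firsts` completely before touching `seconds` forces tee to
# buffer every pair internally, so memory use grows with the input size.
first_items = list(firsts)
second_items = list(seconds)

print(first_items)   # [1, 2, 3]
print(second_items)  # ['a', 'b', 'c']
```

With millions of items, this only stays cheap if the two iterators are advanced at roughly the same pace.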



Answer 2:


There's a fundamental problem here. Say you get your two iterators iter1 and iter2, and you pass iter1 to a function that eats the whole thing:

def consume(iterable):
    for thing in iterable:
        do_stuff_with(thing)

consume(iter1)

That has to iterate through all of tuple_gen to get the first items. What then happens to the second items? Unless you are okay with rerunning the generator to get them again, you have to store all of them, in memory unless you can persist them to disk or something similar. At that point you are not much better off than if you had just dumped tuple_gen into a list.


If you do this, you have to consume the iterators in parallel, or run the underlying generator twice, or spend a lot of memory saving the tuple elements you're not processing so the other iterator can go over them. Unfortunately, consuming the iterators in parallel will require either rewriting the consumer functions or running them in separate threads. Running the generator twice is simplest if you can do it, but not always an option.
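
As a rough sketch of the "separate threads" option, each consumer can be fed through a bounded queue so that only a limited number of items is in memory at once. The names split_and_process, _drain, and maxsize are hypothetical, and the sketch assumes both consumers read their input to the end; a consumer that stops early or raises would leave the producer blocked on a full queue.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def _drain(q):
    """Yield items from a queue until the sentinel arrives."""
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item


def split_and_process(pairs, process_first, process_second, maxsize=1000):
    """Feed first and second elements to two consumers running in threads."""
    q1 = queue.Queue(maxsize)
    q2 = queue.Queue(maxsize)
    results = [None, None]

    def worker(index, func, q):
        # Each consumer sees an ordinary iterable backed by its queue.
        results[index] = func(_drain(q))

    threads = [
        threading.Thread(target=worker, args=(0, process_first, q1)),
        threading.Thread(target=worker, args=(1, process_second, q2)),
    ]
    for t in threads:
        t.start()

    # Single pass over the source; put() blocks while a queue is full.
    for first, second in pairs:
        q1.put(first)
        q2.put(second)
    q1.put(_DONE)
    q2.put(_DONE)

    for t in threads:
        t.join()
    return tuple(results)


pairs = ((i, str(i)) for i in range(10_000))
total, text_len = split_and_process(
    pairs, sum, lambda items: sum(len(s) for s in items)
)
```

The consumer functions stay unchanged because each one still receives a plain iterable; the cost is the threading overhead and the need to handle early exits more carefully than this sketch does.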




Answer 3:


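You can use itertools as follows:

>>> from itertools import chain, izip, imap
>>> tuple_gen = (i for i in [(1, "a"), (2, "b"), (3, "c")])
>>> nums_gen, letters_gen = imap(lambda x: chain(x), izip(*tuple_gen))
>>> list(nums_gen)
[1, 2, 3]
>>> list(letters_gen)
['a', 'b', 'c']

Note:

This is Python 2 code. For Python 3, izip is just zip and imap is just map. Also be aware that the * in izip(*tuple_gen) unpacks the entire generator into arguments before izip is called, so all of the pairs are held in memory at once.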
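
In Python 3 the same idea could be sketched as follows. The example also checks that the source generator is already exhausted once zip has been called, because argument unpacking reads every pair up front.

```python
tuple_gen = (i for i in [(1, "a"), (2, "b"), (3, "c")])

# The * unpacking reads every pair from tuple_gen before zip() runs,
# so this is not lazy with respect to the source generator.
nums, letters = zip(*tuple_gen)

print(nums)     # (1, 2, 3)
print(letters)  # ('a', 'b', 'c')

# Nothing is left in the original generator afterwards.
leftover = next(tuple_gen, "exhausted")
print(leftover)  # exhausted
```

This is convenient for small inputs, but for millions of items it has the same memory cost as building a list.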




Answer 4:


Case 1

I don't know where [(1, "a"), (2, "b"), (3, "c")] comes from, but if it is produced by code like the following:

gen1 = (i for i in  [1,2,3])
gen2 = (i for i in ["a", "b", "c"])
tuple_gen = (i for i in zip(gen1, gen2))

You can use gen1 and gen2 directly.

Case 2

If you have already created the list [(1, "a"), (2, "b"), (3, "c")] and just don't want to create it twice, you can do the following:

lst = [(1, "a"), (2, "b"), (3, "c")]
gen1 = (i[0] for i in lst)
gen2 = (i[1] for i in lst)

Case 3

Otherwise, just create one list. This costs CPU time and memory to expand the generator, which is what you wanted to avoid.

tuple_gen = (i for i in [(1, "a"), (2, "b"), (3, "c")])
tmp = list(tuple_gen)
gen1 = (i[0] for i in tmp)
gen2 = (i[1] for i in tmp)

As far as I know, there is no way to reset a generator or iterator to its starting position.
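
Since a generator cannot be rewound, one workaround, when the source can cheaply be produced again, is to wrap its creation in a function and call it once per pass. In this minimal sketch, make_pairs is a hypothetical factory standing in for whatever builds the real generator.

```python
def make_pairs():
    """Hypothetical factory: returns a fresh generator of pairs on each call."""
    return ((i, chr(ord("a") + i)) for i in range(5))


# A generator is spent after one pass and cannot be rewound.
gen = make_pairs()
list(gen)
leftover = list(gen)  # already exhausted, so this is []

# Calling the factory twice gives two independent lazy passes.
firsts = (pair[0] for pair in make_pairs())
seconds = (pair[1] for pair in make_pairs())

first_result = list(firsts)    # [0, 1, 2, 3, 4]
second_result = list(seconds)  # ['a', 'b', 'c', 'd', 'e']
```

This keeps memory use flat but does the work of producing the pairs twice, so it only suits sources that are repeatable and not too expensive to regenerate.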



Source: https://stackoverflow.com/questions/47146905/turning-a-generator-of-pairs-into-a-pair-of-generators
