How to clone a Python generator object?

喜夏-厌秋 提交于 2019-11-26 22:37:12

You can use itertools.tee():

walk, walk2 = itertools.tee(walk)

Note that this might "need significant extra storage", as the documentation points out.

If you know you are going to iterate through the whole generator for every usage, you will probably get the best performance by unrolling the generator to a list and using the list multiple times.

walk = list(os.walk('/home'))

Define a function

 def walk_home():
     for r in os.walk('/home'):
         yield r

Or even this

def walk_home():
    return os.walk('/home')

Both are used like this:

for root, dirs, files in walk_home():
    for pathname in dirs+files:
        print os.path.join(root, pathname)

This is a good usecase for functools.partial() to make a quick generator-factory:

from functools import partial
import os

walk_factory = partial(os.walk, '/home')

walk1, walk2, walk3 = walk_factory(), walk_factory(), walk_factory()

What functools.partial() does is hard to describe with human-words, but this^ is what it's for.

It partially fills out function-params without executing that function. Consequently it acts as a function/generator factory.

This answer aims to extend/elaborate on what the other answers have expressed. The solution will necessarily vary depending on what exactly you aim to achieve.

If you want to iterate over the exact same result of os.walk multiple times, you will need to initialize a list from the os.walk iterable's items (i.e. walk = list(os.walk(path))).

If you must guarantee the data remains the same, that is probably your only option. However, there are several scenarios in which this is not possible or desirable.

  1. It will not be possible to list() an iterable if the output is of sufficient size (i.e. attempting to list() an entire filesystem may freeze your computer).
  2. It is not desirable to list() an iterable if you wish to acquire "fresh" data prior to each use.

In the event that list() is not suitable, you will need to run your generator on demand. Note that generators are extinguised after each use, so this poses a slight problem. In order to "rerun" your generator multiple times, you can use the following pattern:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

class WalkMaker:
    def __init__(self, path):
        self.path = path
    def __iter__(self):
        for root, dirs, files in os.walk(self.path):
            for pathname in dirs + files:
                yield os.path.join(root, pathname)

walk = WalkMaker('/home')

for path in walk:
    pass

# do something...

for path in walk:
    pass

The aforementioned design pattern will allow you to keep your code DRY.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!