Singleton Python generator? Or, pickle a Python generator?

心在旅途 2021-01-03 06:58

I am using the following code, with nested generators, to iterate over a text document and return training examples using get_train_minibatch(). I would like to

6 answers
  •  情歌与酒
    2021-01-03 07:39

    The following code should do more or less what you want. The first class defines an object that acts like a file but can be pickled: when you unpickle it, it reopens the file and seeks to the position it had when it was pickled. The second class is an iterator that generates word windows. Because its state lives in plain attributes rather than in a generator frame, it can be pickled as well.

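    class PickleableFile(object):
        """File-like wrapper that remembers its position across pickling."""
        def __init__(self, filename, mode='rb'):
            self.filename = filename
            self.mode = mode
            self.file = open(filename, mode)
        def __getstate__(self):
            # Store how to reopen the file, not the file object itself.
            state = dict(filename=self.filename, mode=self.mode,
                         closed=self.file.closed)
            if not self.file.closed:
                state['filepos'] = self.file.tell()
            return state
        def __setstate__(self, state):
            self.filename = state['filename']
            self.mode = state['mode']
            self.file = open(self.filename, self.mode)
            if state['closed']:
                self.file.close()
            else:
                self.file.seek(state['filepos'])
        def __getattr__(self, attr):
            # Only called for attributes not found normally; delegate to the
            # real file. Guard 'file' itself to avoid infinite recursion when
            # the wrapper has not been initialised yet (e.g. during copying).
            if attr == 'file':
                raise AttributeError(attr)
            return getattr(self.file, attr)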
    
    class WordWindowReader(object):
        def __init__(self, filenames, window_size):
            self.filenames = filenames
            self.window_size = window_size
            self.filenum = 0
            self.stream = None
            self.filepos = 0
            self.prevwords = []
            self.current_line = []
    
        def __iter__(self):
            return self
    
        def next(self):
            # Loop (rather than recurse) until a full window is available.
            while True:
                # Read through files until we have a non-empty current line.
                while not self.current_line:
                    if self.stream is None:
                        if self.filenum >= len(self.filenames):
                            raise StopIteration
                        self.stream = PickleableFile(self.filenames[self.filenum])
                        self.stream.seek(self.filepos)
                        self.prevwords = []
                    line = self.stream.readline()
                    self.filepos = self.stream.tell()
                    if not line:
                        # End of file ('' in text mode, b'' in binary mode).
                        self.stream.close()
                        self.stream = None
                        self.filenum += 1
                        self.filepos = 0
                    else:
                        # Reverse line so we can pop off words.
                        self.current_line = line.split()[::-1]

                # Get the first word of the current line, and add it to
                # prevwords.  Truncate prevwords when necessary.
                word = self.current_line.pop()
                self.prevwords.append(word)
                if len(self.prevwords) > self.window_size:
                    self.prevwords = self.prevwords[-self.window_size:]

                # If we have enough words, return a copy of the window so that
                # later calls cannot mutate it; otherwise go on to the next word.
                if len(self.prevwords) == self.window_size:
                    return list(self.prevwords)

        # Python 3 looks for __next__ rather than next.
        __next__ = next
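
    To see why the answer uses an iterator class instead of nested generators, here is a minimal sketch that is not part of the original code. The names count_up and CountUp are hypothetical stand-ins: the first is a generator function, the second an equivalent iterator class. The generator cannot be pickled, while the class-based iterator survives a pickle round trip and resumes where it left off.

```python
import pickle


def count_up(limit):
    """Generator version: its state lives in a frame that pickle cannot save."""
    n = 0
    while n < limit:
        yield n
        n += 1


class CountUp(object):
    """Iterator-class version: its state is ordinary attributes."""

    def __init__(self, limit):
        self.limit = limit
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.n >= self.limit:
            raise StopIteration
        value = self.n
        self.n += 1
        return value

    next = __next__  # Python 2 spelling of the same method


# Pickling a generator object raises TypeError.
gen = count_up(5)
next(gen)
try:
    pickle.dumps(gen)
    generator_pickled = True
except TypeError:
    generator_pickled = False

# The class-based iterator can be saved mid-iteration and resumed.
it = CountUp(5)
next(it)
next(it)
restored = pickle.loads(pickle.dumps(it))
remaining = list(restored)
```

    WordWindowReader follows the same pattern: everything it needs to resume (file index, file position, previous words, current line) is stored on the instance, so pickling the reader captures its progress.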
    
