Design of a Python pickleable object that describes a file

Submitted by 拥有回忆 on 2019-12-05 22:01:20

The question isn't entirely clear, but it looks like:

  • you have a third-party module which has picklable classes
  • those classes may hold references to open files, which makes their instances unpicklable, because open file objects can't be pickled.
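To see the problem concretely, here is a minimal sketch (Python 3). `Holder` is a hypothetical stand-in for a third-party class: its instances pickle fine until one of their attributes is an open file, at which point `pickle.dumps` raises `TypeError`.

```python
import os
import pickle
import tempfile


class Holder(object):
    """Hypothetical stand-in for a picklable third-party class."""

    def __init__(self, name, fileobj=None):
        self.name = name        # ordinary, picklable attribute
        self.fileobj = fileobj  # an open file here makes pickling fail


def pickle_error(obj):
    """Return the name of the exception pickle.dumps raises, or None."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return type(exc).__name__
    return None


fd, path = tempfile.mkstemp()
os.close(fd)
with open(path) as fh:
    print(pickle_error(Holder("no file")))        # None
    print(pickle_error(Holder("with file", fh)))  # TypeError
os.remove(path)
```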

Essentially, you want to make open files picklable. You can do this fairly easily, with certain caveats. Here's an incomplete but functional sample:

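import pickle


class PicklableFile(object):
    """Wrap an open file so it pickles as (name, mode, position)."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def __getattr__(self, key):
        # Only called when normal lookup fails: forward to the wrapped file.
        # Guard 'fileobj' itself so a half-built instance (before
        # __setstate__ runs) raises AttributeError instead of recursing forever.
        if key == 'fileobj':
            raise AttributeError(key)
        return getattr(self.fileobj, key)

    def __getstate__(self):
        # Replace the unpicklable file object with what is needed to reopen it.
        state = self.__dict__.copy()
        state['_file_name'] = self.fileobj.name
        state['_file_mode'] = self.fileobj.mode
        state['_file_pos'] = self.fileobj.tell()
        del state['fileobj']
        return state

    def __setstate__(self, state):
        # Reopen by name and mode, seek back to the saved position, then
        # restore any remaining attributes. Copy first so the caller's
        # dict is not mutated (and avoid shadowing the builtin dict).
        state = dict(state)
        self.fileobj = open(state.pop('_file_name'), state.pop('_file_mode'))
        self.fileobj.seek(state.pop('_file_pos'))
        self.__dict__.update(state)


f = PicklableFile(open("/tmp/blah"))  # /tmp/blah must already exist
print(f.readline())
data = pickle.dumps(f)
f2 = pickle.loads(data)
print(f2.read())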

Caveats and notes, some obvious, some less so:

  • This class should wrap the file object you got from open() directly. If you're using wrapper classes on files, like gzip.GzipFile, those should go above PicklableFile, not below it. Logically, treat this as a decorator class on top of the file object.
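  • If the file doesn't exist when you unpickle, unpickling fails with an exception (IOError in Python 2, FileNotFoundError in Python 3).
  • If a different file now exists at that path, the result may or may not make sense.
  • If the file mode can create the file (for example 'w+') and the file doesn't exist, it'll be created, but we don't know what file permissions to use, since those aren't stored in the pickle. If this is important (it probably shouldn't be), store the correct permissions in the object when you first create it. Also note that reopening in 'w' or 'w+' mode truncates an existing file, discarding anything written to it before it was pickled.
  • If the file isn't seekable (a pipe, for example), calling tell() when pickling or seek() when unpickling may raise IOError (OSError or a subclass of it in Python 3); if you're using a file like that, you'll need to decide how to handle it.
  • The file types differ between Python 2 and Python 3: there's no built-in file class in Python 3, where open() returns objects from the io module such as io.TextIOWrapper. Even if you're only using Python 2 right now, don't subclass file; wrap it instead, as PicklableFile does.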
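For the permissions caveat, here is one possible sketch of "storing the correct permissions", assuming a POSIX system. `capture_perms` and `reopen_with_perms` are hypothetical helper names, not part of any library: the idea is that `__getstate__` would also save the permission bits, and `__setstate__` would reopen through the helper so that a file which had to be recreated gets those bits back.

```python
import os
import stat
import tempfile


def capture_perms(path):
    """Return only the permission bits (e.g. 0o640) of an existing file."""
    return stat.S_IMODE(os.stat(path).st_mode)


def reopen_with_perms(path, mode, perms):
    """Hypothetical helper: reopen path and, if it had to be created,
    apply the saved permission bits instead of the umask default."""
    existed = os.path.exists(path)
    fileobj = open(path, mode)
    if not existed:
        os.chmod(path, perms)
    return fileobj


# Demo: record perms, delete the file, then "unpickle" by recreating it.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o640)
saved = capture_perms(path)      # what __getstate__ would also store
os.remove(path)                  # the file vanishes before unpickling
f = reopen_with_perms(path, "w+", saved)
print(oct(capture_perms(path)))  # 0o640 under Python 3 on POSIX
f.close()
os.remove(path)
```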

I'd steer away from doing this: pickled data that depends on external files staying unchanged and in the same place is brittle. It even makes relocating those files difficult, since the paths stored in your pickled data will no longer be valid.

If you open a pointer to a file, pickle it, then attempt to reconstitute it later, there is no guarantee that the file will still be available for opening.

To elaborate, the file pointer really represents a connection to the file. Just like a database connection, you can't "pickle" the other end of the connection, so this won't work.

Is it possible to keep the file pointer around in memory in its own process instead?

It sounds like you know you can't pickle the handle, and you're OK with that; you just want to pickle the part that can be pickled. As your object stands now, it can't be pickled because it holds the handle. Do I have that right? If so, read on.

The pickle module lets your class describe its own state for exactly these cases: define your own __getstate__ method. The pickler calls it to get the state to pickle; only if the method is missing does it fall back to the default of pickling all of the instance's attributes. If __getstate__ returns a dict and you don't define __setstate__, that dict simply becomes the restored instance's __dict__, so the excluded handle attribute will be missing after unpickling.
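A minimal sketch of that approach (Python 3). `LogSource` is a hypothetical class with picklable settings plus an open handle: __getstate__ drops the handle, and __setstate__ restores the rest and explicitly sets the handle to None rather than reopening anything.

```python
import os
import pickle
import tempfile


class LogSource(object):
    """Hypothetical class: picklable settings plus an unpicklable handle."""

    def __init__(self, path, label):
        self.path = path
        self.label = label
        self.handle = open(path)  # cannot be pickled

    def __getstate__(self):
        # Pickle everything except the handle.
        state = self.__dict__.copy()
        del state['handle']
        return state

    def __setstate__(self, state):
        # Restore the picklable part; the handle is deliberately not reopened.
        self.__dict__.update(state)
        self.handle = None


fd, path = tempfile.mkstemp()
os.close(fd)
src = LogSource(path, "demo")
restored = pickle.loads(pickle.dumps(src))
print(restored.label, restored.path == path, restored.handle)  # demo True None
src.handle.close()
os.remove(path)
```

Setting handle to None in __setstate__ keeps attribute access predictable; code that needs the file can check for None and reopen it on demand.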
