Python: Checking if an object is atomically pickleable

风格不统一 提交于 2019-12-23 17:56:50

问题


What's an accurate way of checking whether an object can be atomically pickled? When I say "atomically pickled", I mean without considering other objects it may refer to. For example, this list:

l = [threading.Lock()]

is not a a pickleable object, because it refers to a Lock which is not pickleable. But atomically, this list itself is pickleable.

So how do you check whether an object is atomically pickleable? (I'm guessing the check should be done on the class, but I'm not sure.)

I want it to behave like this:

>>> is_atomically_pickleable(3)
True
>>> is_atomically_pickleable(3.1)
True
>>> is_atomically_pickleable([1, 2, 3])
True
>>> is_atomically_pickleable(threading.Lock())
False
>>> is_atomically_pickleable(open('whatever', 'r'))
False

Etc.


回答1:


Given that you're willing to break encapsulation, I think this is the best you can do:

from pickle import Pickler
import os

class AtomicPickler(Pickler):
  def __init__(self, protocol):
    # You may want to replace this with a fake file object that just
    # discards writes.
    blackhole = open(os.devnull, 'w')

    Pickler.__init__(self, blackhole, protocol)
    self.depth = 0

  def save(self, o):
    self.depth += 1
    if self.depth == 1:
      return Pickler.save(self, o)
    self.depth -= 1
    return

def is_atomically_pickleable(o, protocol=None):
  pickler = AtomicPickler(protocol)
  try:
    pickler.dump(o)
    return True
  except:
    # Hopefully this exception was actually caused by dump(), and not
    # something like a KeyboardInterrupt
    return False

In Python the only way you can tell if something will work is to try it. That's the nature of a language as dynamic as Python. The difficulty with your question is that you want to distinguish between failures at the "top level" and failures at deeper levels.

Pickler.save is essentially the control-center for Python's pickling logic, so the above creates a modified Pickler that ignores recursive calls to its save method. Any exception raised while in the top-level save is treated as a pickling failure. You may want to add qualifiers to the except statement. Unqualified excepts in Python are generally a bad idea as exceptions are used not just for program errors but also for things like KeyboardInterrupt and SystemExit.

This can give what are arguably false negatives for types with odd custom pickling logic. For example, if you create a custom list-like class that instead of causing Pickler.save to be recursively called it actually tried to pickle its elements on its own somehow, and then created an instance of this class that contained an element that its custom logic could not pickle, is_atomically_pickleable would return False for this instance even though removing the offending element would result in an object that was pickleable.

Also, note the protocol argument to is_atomically_pickleable. Theoretically an object could behave differently when pickled with different protocols (though that would be pretty weird) so you should make this match the protocol argument you give to dump.




回答2:


Given the dynamic nature of Python, I don't think there's really a well-defined way to do what you're asking aside from heuristics or a whitelist.

If I say:

x = object()

is x "atomically pickleable"? What if I say:

x.foo = threading.Lock()

? is x "atomically pickleable" now?

What if I made a separate class that always had a lock attribute? What if I deleted that attribute from an instance?




回答3:


I think the persistent_id interface is a poor match for you are attempting to do. It is designed to be used when your object should refer to equivalent objects on the new program rather then copies of the old one. You are attempting to filter out every object that cannot be pickled which is different and why are you attempting to do this.

I think this is a sign of problem in your code. That fact that you want to pickle objects which refer to gui widgets, files, and locks suggests that you are doing something strange. The kind of objects you typically persist shouldn't be related to or hold references to that sort of object.

Having said that, I think your best option is the following:

class MyPickler(Pickler):
    def save(self, obj):
        try:
             Pickler.save(self, obj)
        except PicklingEror:
             Pickle.save( self, FilteredObject(obj) )

This should work for the python implementation, I make no guarantees as to what will happen in the C implementation. Every object which gets saved will be passed to the save method. This method will raise the PicklingError when it cannot pickle the object. At this point, you can step in and recall the function asking it to pickle your own object which should pickle just fine.

EDIT

From my understanding, you have essentially a user-created dictionary of objects. Some objects are picklable and some aren't. I'd do this:

class saveable_dict(dict):
    def __getstate__(self):
        data = {}
        for key, value in self.items():
             try:
                  encoded = cPickle.dumps(value)
             except PicklingError:
                  encoded = cPickle.dumps( Unpickable() )
        return data

    def __setstate__(self, state):
       for key, value in state:
           self[key] = cPickle.loads(value)

Then use that dictionary when you want to hold that collection of objects. The user should be able to get any picklable objects back, but everything else will come back as the Unpicklable() object. The difference between this and the previous approach is in objects which are themselves pickable but have references to unpicklable objects. But those objects are probably going to come back broken regardless.

This approach also has the benefit that it remains entirely within the defined API and thus should work in either cPickle or pickle.




回答4:


I ended up coding my own solution to this.

Here's the code. Here are the tests. It's part of GarlicSim, so you can use it by installing garlicsim and doing from garlicsim.general_misc import pickle_tools.

If you want to use it on Python 3 code, use the Python 3 fork of garlicsim.

Here is an excerpt from the module (may be outdated):

import re
import cPickle as pickle_module
import pickle # Importing just to get dispatch table, not pickling with it.
import copy_reg
import types

from garlicsim.general_misc import address_tools
from garlicsim.general_misc import misc_tools


def is_atomically_pickleable(thing):
    '''
    Return whether `thing` is an atomically pickleable object.

    "Atomically-pickleable" means that it's pickleable without considering any
    other object that it contains or refers to. For example, a `list` is
    atomically pickleable, even if it contains an unpickleable object, like a
    `threading.Lock()`.

    However, the `threading.Lock()` itself is not atomically pickleable.
    '''
    my_type = misc_tools.get_actual_type(thing)
    return _is_type_atomically_pickleable(my_type, thing)


def _is_type_atomically_pickleable(type_, thing=None):
    '''Return whether `type_` is an atomically pickleable type.'''
    try:
        return _is_type_atomically_pickleable.cache[type_]
    except KeyError:
        pass

    if thing is not None:
        assert isinstance(thing, type_)

    # Sub-function in order to do caching without crowding the main algorithm:
    def get_result():

        # We allow a flag for types to painlessly declare whether they're
        # atomically pickleable:
        if hasattr(type_, '_is_atomically_pickleable'):
            return type_._is_atomically_pickleable

        # Weird special case: `threading.Lock` objects don't have `__class__`.
        # We assume that objects that don't have `__class__` can't be pickled.
        # (With the exception of old-style classes themselves.)
        if not hasattr(thing, '__class__') and \
           (not isinstance(thing, types.ClassType)):
            return False

        if not issubclass(type_, object):
            return True

        def assert_legit_pickling_exception(exception):
            '''Assert that `exception` reports a problem in pickling.'''
            message = exception.args[0]
            segments = [
                "can't pickle",
                'should only be shared between processes through inheritance',
                'cannot be passed between processes or pickled'
            ]
            assert any((segment in message) for segment in segments)
            # todo: turn to warning

        if type_ in pickle.Pickler.dispatch:
            return True

        reduce_function = copy_reg.dispatch_table.get(type_)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True

        reduce_function = getattr(type_, '__reduce_ex__', None)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing, 0)
                # (The `0` is the protocol argument.)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True

        reduce_function = getattr(type_, '__reduce__', None)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True

        return False

    result = get_result()
    _is_type_atomically_pickleable.cache[type_] = result
    return result

_is_type_atomically_pickleable.cache = {}



回答5:


dill has the pickles method for such a check.

>>> import threading
>>> l = [threading.Lock()]
>>> 
>>> import dill
>>> dill.pickles(l)
True
>>> 
>>> dill.pickles(threading.Lock())
True
>>> f = open('whatever', 'w') 
>>> f.close()
>>> dill.pickles(open('whatever', 'r'))
True

Well, dill atomically pickles all of your examples, so let's try something else:

>>> l = [iter([1,2,3]), xrange(5)]
>>> dill.pickles(l)
False

Ok, this fails. Now, let's investigate:

>>> dill.detect.trace(True)
>>> dill.pickles(l)
T4: <type 'listiterator'>
False
>>> map(dill.pickles, l)
T4: <type 'listiterator'>
Si: xrange(5)
F2: <function _eval_repr at 0x106991cf8>
[False, True]

Ok. we can see the iter fails, but the xrange does pickle. So, let's replace the iter.

>>> l[0] = xrange(1,4)
>>> dill.pickles(l)
Si: xrange(1, 4)
F2: <function _eval_repr at 0x106991cf8>
Si: xrange(5)
True

Now our object atomically pickles.



来源:https://stackoverflow.com/questions/4199947/python-checking-if-an-object-is-atomically-pickleable

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!