Is the pickling process deterministic?

一个人想着一个人 提交于 2021-02-07 12:26:42

问题


Does Pickle always produce the same output for a certain input value? I suppose there could be a gotcha when pickling dictionaries that have the same contents but different insert/delete histories. My goal is to create a "signature" of function arguments, using Pickle and SHA1, for a memoize implementation.


回答1:


I suppose there could be a gotcha when pickling dictionaries that have the same contents but different insert/delete histories.

Right:

>>> pickle.dumps({1: 0, 9: 0}) == pickle.dumps({9: 0, 1: 0})
False

See also: pickle.dumps not suitable for hashing

My goal is to create a "signature" of function arguments, using Pickle and SHA1, for a memoize implementation.

There's a number of fundamental problems with this. It's impossible to come up with an object-to-string transformation that maps equality correctly—think of the problem of object identity:

>>> a = object()
>>> b = object()
>>> a == b
False
>>> pickle.dumps(b) == pickle.dumps(a)
True

Depending on your exact requirements, you may be able to transform object hierarchies into ones that you could then hash:

def hashablize(obj):
    """Convert a container hierarchy into one that can be hashed.
    
    Don't use this with recursive structures!
    Also, this won't be useful if you pass dictionaries with
    keys that don't have a total order.
    Actually, maybe you're best off not using this function at all."""
    try:
        hash(obj)
    except TypeError:
        if isinstance(obj, dict):
            return tuple((k, hashablize(v)) for (k, v) in sorted(obj.iteritems()))
        elif hasattr(obj, '__iter__'):
            return tuple(hashablize(o) for o in obj)
        else:
            raise TypeError("Can't hashablize object of type %r" % type(obj))
    else:
        return obj



回答2:


What do you mean by same output ? You should normally always get the same output for a roundtrip (pickling -> unpickling), but I don't think the serialized format itself is guaranteed to be the same in every condition. Certainly, it may change between platforms and all that.

Within one run of your program, using pickling for memoization should be fine - I have used this scheme several times without trouble, but that was for quite simple problems. One problem is that this does not cover every useful case (function come to mind: you cannot pickle them, so if your function takes a callable argument, that won't work).



来源:https://stackoverflow.com/questions/985294/is-the-pickling-process-deterministic

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!