Pickle serialization order mystery

Submitted by 对着背影说爱祢 on 2019-12-24 09:56:13

Problem


Update 6/8/17

Three years on, my PR, a temporary fix that enforces the output order, is still pending. Stream-Framework might reconsider its design of using content as the key for notifications. GitHub issue #153 references this.

Question

See the following sample:

import pickle
x = {'order_number': 'X', 'deal_url': 'J'}

pickle.dumps(x)
pickle.dumps(pickle.loads(pickle.dumps(x)))
pickle.dumps(pickle.loads(pickle.dumps(pickle.loads(pickle.dumps(x)))))

Results:

(dp0\nS'deal_url'\np1\nS'J'\np2\nsS'order_number'\np3\nS'X'\np4\ns.
(dp0\nS'order_number'\np1\nS'X'\np2\nsS'deal_url'\np3\nS'J'\np4\ns.
(dp0\nS'deal_url'\np1\nS'J'\np2\nsS'order_number'\np3\nS'X'\np4\ns.

Clearly, the serialized output changes on every dump. When I remove a character from either key, this doesn't happen. I discovered this because Stream-Framework uses the pickled output as the key for storing notifications in its k/v store. I will open a pull request once we better understand what is going on here. I have found two workarounds that prevent it:

A - Sort the items and convert them back to a dictionary (yes, this somehow provides the intended side effect)

import operator
sorted_x = dict(sorted(x.iteritems(), key=operator.itemgetter(1)))

B - Remove the underscores from the keys (though I'm not sure this always works)

So what causes this mysterious ordering behavior when pickle serializes a dictionary?

Proof that sorting the dict first makes dumps produce the same result every time:

import operator
x = dict(sorted(x.iteritems(), key=operator.itemgetter(1)))

pickle.dumps(x)
"(dp0\nS'order_number'\np1\nS'X'\np2\nsS'deal_url'\np3\nS'J'\np4\ns."

x = pickle.loads(pickle.dumps(x))
x = dict(sorted(x.iteritems(), key=operator.itemgetter(1)))

pickle.dumps(x)
"(dp0\nS'order_number'\np1\nS'X'\np2\nsS'deal_url'\np3\nS'J'\np4\ns."

Answer 1:


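Dictionaries are unordered data structures. This means that the iteration order is arbitrary, and pickle stores the items in whatever order it encounters them. You can use collections.OrderedDict if you need a dictionary that keeps its items in a fixed order (it preserves insertion order; it does not sort).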
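
As a rough sketch of the collections.OrderedDict suggestion, the snippet below sorts the items by key before building the OrderedDict, so the order is fixed explicitly instead of being left to the dict. It is written for Python 3 (items() rather than the question's iteritems()), and the names first, second, third and round_trip_keys are only illustrative.

```python
import pickle
from collections import OrderedDict

x = {'order_number': 'X', 'deal_url': 'J'}

# Sort by key so the order is chosen explicitly, then freeze it in an OrderedDict.
ordered = OrderedDict(sorted(x.items()))

# Repeat the dump/load round trips from the question.
first = pickle.dumps(ordered)
second = pickle.dumps(pickle.loads(first))
third = pickle.dumps(pickle.loads(second))

# OrderedDict remembers insertion order, and unpickling restores it.
round_trip_keys = list(pickle.loads(third).keys())
print(first == second == third, round_trip_keys)
```

Here the three dumps should be byte-for-byte identical, because every round trip re-inserts the keys in the same stored order.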

Any order you think you see when you're playing around in the interpreter is just the interpreter playing nice with you.
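
One plausible way to picture the flip-flopping seen in the question is a hash collision: if both keys want the same slot in the underlying table, whichever key is inserted first gets it, and unpickling inserts the keys in the dumped order rather than the original one. The toy model below is only an illustration under that assumption. It is not CPython's real dict implementation, the probe rule is invented, and the hash values are made up so that the two keys collide.

```python
def iteration_order(keys, hashes, size=8):
    """Insert keys into a toy open-addressing table and return them in slot order."""
    table = [None] * size
    for key in keys:
        slot = hashes[key] % size
        while table[slot] is not None:
            # Hypothetical probe rule: step down one slot on a collision.
            slot = (slot - 1) % size
        table[slot] = key
    return [key for key in table if key is not None]

# Made-up hash values chosen so that both keys map to slot 3.
colliding = {'order_number': 11, 'deal_url': 3}

# Each pass stands in for one dumps/loads round trip: the keys are
# re-inserted in the order the previous table iterated them.
order = ['order_number', 'deal_url']
history = []
for _ in range(3):
    order = iteration_order(order, colliding)
    history.append(order)

# Without a collision, the slot order does not depend on insertion order.
distinct = {'a': 1, 'b': 5}
stable = [iteration_order(['a', 'b'], distinct),
          iteration_order(['b', 'a'], distinct)]
```

In this model the colliding keys swap places on every pass, while the non-colliding pair stays put, which would also be consistent with the observation that changing a key makes the effect disappear.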

From the documentation of dict:

It is best to think of a dictionary as an unordered set of key: value pairs, with the requirement that the keys are unique (within one dictionary)

Remember that the functions dict.keys(), dict.values() and dict.items() also return their respective values in arbitrary order.
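
If the pickled bytes are going to serve as a storage key, as in the Stream-Framework case described in the question, one option is to stop relying on dict order altogether and pickle a sorted tuple of the items. The sketch below assumes Python 3, keys that can be compared with each other, and flat dictionaries (nested dicts would need the same treatment); stable_key is a hypothetical helper name, not part of pickle or Stream-Framework.

```python
import pickle

def stable_key(mapping):
    """Pickle the items as a key-sorted tuple so equal dicts give equal bytes."""
    return pickle.dumps(tuple(sorted(mapping.items())))

# Two equal dicts built in a different insertion order.
a = {'order_number': 'X', 'deal_url': 'J'}
b = {'deal_url': 'J', 'order_number': 'X'}

same_key = stable_key(a) == stable_key(b)

# A dict that has been through a dump/load round trip maps to the same key too.
after_round_trip = stable_key(pickle.loads(pickle.dumps(a)))
```

The trade-off is that the stored value unpickles to a tuple of pairs rather than a dict, so a reader would call dict() on it to get the mapping back.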



Source: https://stackoverflow.com/questions/23069908/pickle-serialization-order-mystery
