Question
I need to do some maintenance on a system that basically looks like:
(Complicated legacy Python program) -> binary pickle file -> (Another complicated legacy Python program)
This requires figuring out exactly what is in the intermediate pickle file. I suspect the file format is much simpler than the code that generates and consumes it, and it would help if I could verify that by eyeballing the file itself instead of having to figure out exactly what all the code does.
Is there a way to take the binary pickle file and convert it, not to live objects in memory (which is what every page I could find with a Google search reckons 'unpickle' means) but to some readable text format? JSON, XML, whatever, I'm not fussy about the exact format, anything would do as long as it is a complete and readable representation of the contents that I can load up in a text editor and look at.
Answer 1:
If the application is old enough it might use pickle protocol 0, which is human-readable.
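A quick way to check which case applies, as a rough sketch: pickles written with protocol 2 or later start with a PROTO opcode (byte 0x80) followed by the protocol number, while protocol 0 and 1 pickles have no such header. The helper name guess_pickle_protocol is made up for this illustration.

```python
import pickle


def guess_pickle_protocol(raw: bytes):
    """Return the protocol number from the PROTO header, or None if absent.

    None means protocol 0 or 1, which carry no header at all.
    """
    if len(raw) >= 2 and raw[0] == 0x80:
        return raw[1]
    return None


obj = {"name": "example", "values": [1, 2, 3]}
text_pickle = pickle.dumps(obj, protocol=0)    # ASCII opcodes, readable as-is
binary_pickle = pickle.dumps(obj, protocol=2)  # starts with b"\x80\x02"

print(guess_pickle_protocol(text_pickle), text_pickle)
print(guess_pickle_protocol(binary_pickle))
```

If the result is None and the file opens cleanly in a text editor, it is most likely protocol 0.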
You could try the pickletools module from the standard library; its command-line interface is available in Python 3.2+.
Running python3 -m pickletools <file> will "disassemble" the pickle file for you.
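The same disassembly is available from Python code through pickletools.dis, which only decodes opcodes and does not execute the pickle or import anything. A minimal sketch, with a made-up sample dictionary standing in for the real file contents:

```python
import io
import pickle
import pickletools


def disassemble(raw: bytes) -> str:
    """Return the opcode listing of a pickle without executing it."""
    out = io.StringIO()
    pickletools.dis(raw, out=out)
    return out.getvalue()


# Sample data only; for a real file, pass open(path, "rb").read() instead.
sample = pickle.dumps({"name": "example", "values": [1, 2, 3]}, protocol=0)
listing = disassemble(sample)
print(listing)
```

The listing shows one opcode per line, so string and number literals stored in the file are visible even when the classes they belong to cannot be imported.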
Alternatively, you could load the data with data = pickle.load(f), where f is the pickle file opened in binary mode, and then immediately dump it with print(json.dumps(data)). Note that this might fail, because pickle can represent more things than JSON can.
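One way to soften that failure, sketched here under the assumption that the file is trusted and its classes are importable: pass default=repr to json.dumps so that values JSON cannot encode (sets, bytes, arbitrary objects) are written as their repr string instead of raising. The function name pickle_bytes_to_json is hypothetical.

```python
import json
import pickle


def pickle_bytes_to_json(raw: bytes) -> str:
    # pickle.loads executes the pickle, so only use it on files you trust.
    data = pickle.loads(raw)
    # default=repr is called for any value the JSON encoder cannot handle.
    return json.dumps(data, indent=2, default=repr)


# Sample data only; a set and a bytes value are not valid JSON on their own.
sample = pickle.dumps({"name": "example", "tags": {"a"}, "blob": b"\x00"})
text = pickle_bytes_to_json(sample)
print(text)
```

This still raises TypeError for dictionaries whose keys are not strings or numbers (tuples, for instance), because default is only consulted for values, not keys.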
Answer 2:
Python's native types are readable enough as they are. The hard part is that unpickling will automatically try to import the modules that define the classes of any instances stored in your file.
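To see that import behaviour in isolation, here is a small sketch using a hand-written protocol 0 pickle that refers to a class in a module which does not exist (the names missing_legacy_module and Thing are invented for the demonstration):

```python
import pickle

# Protocol 0 GLOBAL opcode: "c" + module + newline + name + newline, then STOP (".").
raw = b"cmissing_legacy_module\nThing\n."

try:
    pickle.loads(raw)
    outcome = "loaded"
except ImportError as exc:
    # The unpickler tried to import the module named inside the pickle.
    outcome = f"{type(exc).__name__}: {exc}"

print(outcome)
```

The load fails before any data is returned, which is why simply calling pickle.load on a legacy file often does not work outside the original code base.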
Fortunately, Python is flexible enough that it is possible to temporarily hack the import machinery in order to fool the unpickler and give it fake classes to fill in with the pickled attributes.
Then it is a matter of converting the dictionaries of the instances unpickled this way back into something human-readable.
Fortunately, I maintain a pet project that performs this "temporary import system hacking", so I could borrow a couple of lines of code from there to do the same here.
In order to test this, I ended up creating a stand-alone script. As its comments spell out: don't try to incorporate this into a larger program. It will break the running Python program by creating fake modules. It should be enough for you to visualize what is pickled in the file, although it cannot cover every corner case you may have there. You will have to work from here, mostly on the "pythonize" function below:
import pickle, pprint, sys
from collections.abc import Sequence, Mapping, Set
from contextlib import contextmanager

def pythonize(obj):
    # Recursively convert unpickled objects into plain lists, dicts and strings
    if obj is None or isinstance(obj, (str, bytes, int, float)):
        return obj
    if isinstance(obj, (Sequence, Set)):
        container = []
        for element in obj:
            container.append(pythonize(element))
        return container
    elif isinstance(obj, Mapping):
        container = {}
    else:
        container = {"$CLS": obj.__class__.__qualname__}
        if not hasattr(obj, "__dict__"):
            return repr(obj)
        obj = obj.__dict__
    for key, value in obj.items():
        container[key] = pythonize(value)
    return container

class FakeModule:
    # Any attribute looked up on the fake module becomes an empty class
    def __getattr__(self, attr):
        cls = type(attr, (), {})
        setattr(self, attr, cls)
        return cls

def fake_importer(name, globals=None, locals=None, fromlist=(), level=0):
    # Replaces whatever was in sys.modules under this name with a fake module
    module = sys.modules[name] = FakeModule()
    return module

@contextmanager
def fake_import_system():
    # With code lifted from https://github.com/jsbueno/extradict - MapGetter functionality
    builtins = __builtins__ if isinstance(__builtins__, dict) else __builtins__.__dict__
    original_import = builtins["__import__"]
    builtins["__import__"] = fake_importer
    try:
        yield None
    finally:
        # Restore the real importer even if unpickling raises
        builtins["__import__"] = original_import

def unpickle_to_text(stream: bytes):
    # WARNING: this example will wreak havoc in loaded modules!
    # do not use as part of a complex system!!
    with fake_import_system():
        result = pickle.loads(stream)
    pythonized = pythonize(result)
    return pprint.pformat(pythonized)

if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        print(unpickle_to_text(f.read()))
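For comparison, a less invasive variation on the same idea (not the script's method, and only a sketch) is to subclass pickle.Unpickler and override find_class, so that stand-in classes are handed to the unpickler without touching sys.modules or __import__. The names StubUnpickler, make_stub, load_with_stubs and the PASSTHROUGH set are assumptions made for this example, and argparse.Namespace merely plays the role of a legacy class. This is not a security sandbox: builtins are passed through unchanged.

```python
import argparse
import io
import pickle
import pprint

# Modules whose real objects are still needed to rebuild basic containers.
PASSTHROUGH = {"builtins", "__builtin__", "copyreg", "copy_reg", "collections"}


def make_stub(module, name):
    """Create an empty stand-in class that tolerates constructor arguments."""
    def __init__(self, *args):
        if args:
            self._reduce_args = args  # keep constructor args visible
    return type(name, (), {"__module__": module, "__init__": __init__})


class StubUnpickler(pickle.Unpickler):
    def __init__(self, file):
        super().__init__(file)
        self._stubs = {}

    def find_class(self, module, name):
        if module in PASSTHROUGH:
            return super().find_class(module, name)
        # Reuse one stub per (module, name) so instances share a class.
        key = (module, name)
        if key not in self._stubs:
            self._stubs[key] = make_stub(module, name)
        return self._stubs[key]


def load_with_stubs(raw: bytes):
    return StubUnpickler(io.BytesIO(raw)).load()


# Demo: the unpickled object is a stub named "Namespace", not the real class.
raw = pickle.dumps(argparse.Namespace(threshold=3, labels=["a", "b"]))
obj = load_with_stubs(raw)
pprint.pprint({"class": f"{type(obj).__module__}.{type(obj).__qualname__}",
               "state": vars(obj)})
```

Objects with custom __reduce__ or __setstate__ logic may still need special handling, so the same caveat about corner cases applies here.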
Update: since this might be useful to more people, I just made a gist out of this code. Maybe it is even worth publishing on pip: https://gist.github.com/jsbueno/b72a20cba121926bec19163780390b92
Source: https://stackoverflow.com/questions/54047757/unpickle-binary-file-to-text