Unpickle binary file to text [duplicate]

Submitted by China☆狼群 on 2019-12-11 09:55:31

Question


I need to do some maintenance on a system that basically looks like:

(Complicated legacy Python program) -> binary pickle file -> (Another complicated legacy Python program)

This requires figuring out exactly what is in the intermediate pickle file. I suspect the file format is much simpler than the code that generates and consumes it, and it would help if I could verify that by eyeballing the file itself instead of having to figure out exactly what all the code does.

Is there a way to take the binary pickle file and convert it, not to live objects in memory (which is what every page I could find with a Google search reckons 'unpickle' means) but to some readable text format? JSON, XML, whatever, I'm not fussy about the exact format, anything would do as long as it is a complete and readable representation of the contents that I can load up in a text editor and look at.


Answer 1:


If the application is old enough it might use pickle protocol 0, which is human-readable.
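As a rough illustration of the difference, the sketch below pickles the same made-up dictionary with protocol 0 and with protocol 2. The helper name starts_with_proto_opcode is my own, not part of the pickle module; it only checks for the PROTO opcode byte that protocol 2 and later put at the start of the stream.

```python
import pickle

# Made-up sample data, only for illustration.
sample = {"name": "example", "values": [1, 2, 3]}

text_form = pickle.dumps(sample, protocol=0)
binary_form = pickle.dumps(sample, protocol=2)


def starts_with_proto_opcode(data: bytes) -> bool:
    """Return True if the stream begins with the PROTO opcode (0x80)."""
    return data[:1] == b"\x80"


# For ASCII-only data like this, protocol 0 output can be printed as text.
print(text_form.decode("ascii"))
print(starts_with_proto_opcode(text_form), starts_with_proto_opcode(binary_form))
```

Note that protocol 1 is already binary but has no PROTO opcode, so a missing 0x80 byte is only a hint, not proof that the file is protocol 0.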

You could try the pickletools module from the standard library. From Python 3.2 on it can be run from the command line: python3 -m pickletools <file> will "disassemble" the pickle file for you.
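The same disassembly is available from Python code through pickletools.dis, which only walks the opcodes and does not import anything or build objects. A minimal sketch, using a made-up payload in place of the real file contents:

```python
import io
import pickle
import pickletools

# Made-up payload; in practice this would be the bytes read from the pickle file.
payload = pickle.dumps({"id": 7, "tags": ["a", "b"]}, protocol=2)

# Collect the opcode listing in a string instead of printing to stdout.
buffer = io.StringIO()
pickletools.dis(payload, out=buffer)
listing = buffer.getvalue()

print(listing)
```

The listing shows one opcode per line with its argument, which is often enough to see which classes and attribute names the file refers to.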

Alternatively, you could try loading the data with data = pickle.load(f), where f is the file opened in binary mode, and then immediately dumping it with print(json.dumps(data)). Note that this might fail, because pickle can represent more things than JSON can.
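One way to soften the JSON limitation is the default parameter of json.dumps, which is called for values JSON cannot encode. The sketch below falls back to repr for those values. The function name pickle_file_to_json and the sample data are my own, and this still assumes the classes in the file can be imported normally.

```python
import datetime
import json
import os
import pickle
import tempfile


def pickle_file_to_json(path):
    """Load a pickle file and render it as indented JSON text."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    # default=repr handles values JSON cannot encode (dates, sets, ...).
    return json.dumps(data, indent=2, default=repr)


# Demo with a throwaway file holding made-up data.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.pkl")
    with open(path, "wb") as f:
        pickle.dump({"when": datetime.date(2019, 1, 5), "items": [1, 2]}, f)
    text = pickle_file_to_json(path)

print(text)
```

The default hook is not applied to dictionary keys, so a dict keyed by tuples, for example, will still raise TypeError.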




Answer 2:


Python's native types are readable enough when printed. The hard part is that unpickling automatically tries to import every module that defines a class whose instances are stored in the file.
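To see that import behaviour in isolation, here is a small sketch. It registers a made-up module named legacy_models_demo (a hypothetical name, assumed not to exist on your import path), pickles an instance of a class from it, removes the module, and then tries to unpickle.

```python
import pickle
import sys
import types

# Build a throwaway module with one class, standing in for the legacy code.
module = types.ModuleType("legacy_models_demo")
Record = type("Record", (), {"__module__": "legacy_models_demo"})
module.Record = Record
sys.modules["legacy_models_demo"] = module

record = Record()
record.name = "widget"
payload = pickle.dumps(record)

# Simulate inspecting the file on a machine without the legacy code.
del sys.modules["legacy_models_demo"]

error = None
try:
    pickle.loads(payload)
except ImportError as exc:
    error = exc

print(type(error).__name__, error)
```

The pickle stores only the module and class names plus the instance attributes, so loading fails as soon as the module cannot be found.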

Fortunately, Python is flexible enough that it is possible to temporarily hack the import machinery in order to fool the unpickler and give it fake classes to fill in with the pickled attributes.

Then it is a matter of converting the attribute dictionaries of the instances unpickled this way into a human-readable form.

Fortunately, I maintain a pet project that performs this "temporary import system hacking", so I could borrow a couple of lines of code from there to do the same here.

To test this, I ended up writing a stand-alone script. As its comments say, don't try to incorporate it into a larger program: it creates fake modules, which will break the running Python process. It should be enough to let you see what is pickled in the file, although it cannot cover every corner case you may run into. You will have to work from here, mostly on the "pythonize" function below:

import pickle, pprint, sys
from types import ModuleType
from collections.abc import Sequence, Mapping, Set
from contextlib import contextmanager


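def pythonize(obj):
    # Plain scalars are already readable; return them unchanged.
    if obj is None or isinstance(obj, (str, bytes, bool, int, float, complex)):
        return obj
    # Lists, tuples, sets and similar containers become plain lists.
    if isinstance(obj, (Sequence, Set)):
        return [pythonize(element) for element in obj]
    if isinstance(obj, Mapping):
        container = {}
    else:
        # Anything without instance attributes falls back to its repr.
        if not hasattr(obj, "__dict__"):
            return repr(obj)
        # Record the class name next to the instance attributes.
        container = {"$CLS": obj.__class__.__qualname__}
        obj = obj.__dict__
    for key, value in obj.items():
        container[key] = pythonize(value)
    return container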


class FakeModule:
    def __getattr__(self, attr):
        cls = type(attr, (), {})
        setattr(self, attr, cls)
        return cls


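def fake_importer(name, globals=None, locals=None, fromlist=(), level=0):
    # Defaults mirror builtins.__import__, so calls that pass only the name
    # (as the pure-Python pickle implementation does) also work.
    module = sys.modules[name] = FakeModule()
    return module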


@contextmanager
def fake_import_system():
    # With code lifted from https://github.com/jsbueno/extradict - MapGetter functionality
    builtins = __builtins__ if isinstance(__builtins__, dict) else __builtins__.__dict__
    original_import = builtins["__import__"]
    builtins["__import__"] = fake_importer
    try:
        yield None
    finally:
        # Restore the real importer even if unpickling raises.
        builtins["__import__"] = original_import


def unpickle_to_text(stream: bytes):
    # WARNING: this example will wreak havoc in loaded modules!
    # do not use as part of a complex system!!

    with fake_import_system():
        result = pickle.loads(stream)

    pythonized = pythonize(result)
    return pprint.pformat(pythonized)


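if __name__ == "__main__":
    # Read the pickle file named on the command line and print its contents.
    with open(sys.argv[1], "rb") as pickle_file:
        print(unpickle_to_text(pickle_file.read()))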

Update: as this might be useful to more people, I just made a gist out of this code. Maybe it is even worth publishing as a pip-installable package: https://gist.github.com/jsbueno/b72a20cba121926bec19163780390b92
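A different technique from the import hack in this answer, and not what the script or gist does, is to subclass pickle.Unpickler and override its documented find_class hook so that unknown classes are replaced with placeholders while sys.modules and the import system are left alone. The sketch below is only that, a sketch: Placeholder, PlaceholderUnpickler, pickle_to_text, the PASS_THROUGH_MODULES set and the legacy_models_demo module are all names I made up for illustration.

```python
import io
import pickle
import pprint
import sys
import types

# Assumption: modules considered harmless to import for real. Adjust as needed.
PASS_THROUGH_MODULES = {"builtins", "collections", "copyreg", "datetime"}


class Placeholder:
    """Hypothetical stand-in for a class whose module is not imported."""

    def __init__(self, *args, **kwargs):
        # Reached when the pickle rebuilds the object by calling the class.
        if args:
            self.__dict__["$args"] = args
        self.__dict__.update(kwargs)

    def __setstate__(self, state):
        # Keep the pickled state visible whatever its shape is.
        if isinstance(state, dict):
            self.__dict__.update(state)
        else:
            self.__dict__["$state"] = state

    def __repr__(self):
        cls = type(self)
        return f"<{cls.__module__}.{cls.__qualname__} {self.__dict__!r}>"


class PlaceholderUnpickler(pickle.Unpickler):
    def __init__(self, file):
        super().__init__(file)
        self._placeholders = {}

    def find_class(self, module, name):
        if module in PASS_THROUGH_MODULES:
            return super().find_class(module, name)
        # One placeholder class per (module, name) pair, created on demand.
        key = (module, name)
        if key not in self._placeholders:
            self._placeholders[key] = type(name, (Placeholder,), {"__module__": module})
        return self._placeholders[key]


def pickle_to_text(data: bytes) -> str:
    """Unpickle with placeholder classes and pretty-print the result."""
    return pprint.pformat(PlaceholderUnpickler(io.BytesIO(data)).load())


# Demo: pickle an instance from a made-up module, then drop the module.
demo_module = types.ModuleType("legacy_models_demo")
demo_module.Record = type("Record", (), {"__module__": "legacy_models_demo"})
sys.modules["legacy_models_demo"] = demo_module
record = demo_module.Record()
record.name = "widget"
record.sizes = [1, 2]
payload = pickle.dumps({"rec": record})
del sys.modules["legacy_models_demo"]

text = pickle_to_text(payload)
print(text)
```

Because the pass-through set includes builtins, this is an inspection aid for files you already trust, not a sandbox for untrusted pickles.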



Source: https://stackoverflow.com/questions/54047757/unpickle-binary-file-to-text
