Python dataclass from a nested dict

孤街浪徒 2020-12-22 23:38

The standard library in Python 3.7 can recursively convert a dataclass into a dict (example from the docs):

from dataclasses import dataclass, asdict
from typing im         
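
The listing above is cut off in this copy. The answers below refer to two example dataclasses, Point and C. The following sketch reconstructs them from the outputs shown in those answers (the field types are an assumption) and shows why asdict() alone cannot be reversed by simply calling the constructor:

```python
from dataclasses import dataclass, asdict
from typing import List


# Assumed definitions, inferred from the outputs printed in the answers.
@dataclass
class Point:
    x: int
    y: int


@dataclass
class C:
    mylist: List[Point]


c = C([Point(0, 0), Point(10, 4)])

# asdict() recurses into nested dataclasses and containers.
d = asdict(c)
print(d)  # {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}

# Passing the dict back to the constructor does not rebuild the nested
# Point objects: annotations are not enforced, so the items stay dicts.
naive = C(**d)
print(type(naive.mylist[0]))  # <class 'dict'>
```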


        
10 answers
  • 2020-12-22 23:59

    Validobj does just that. Compared to other libraries, it provides a simpler interface (just one function at the moment) and emphasizes informative error messages. For example, given a schema like

    import dataclasses
    from typing import Optional, List
    
    
    @dataclasses.dataclass
    class User:
        name: str
        phone: Optional[str] = None
        tasks: List[str] = dataclasses.field(default_factory=list)
    

    One gets an error like

    >>> import validobj
    >>> validobj.parse_input({
    ...      'phone': '555-1337-000', 'address': 'Somewhereville', 'nme': 'Zahari'}, User
    ... )
    Traceback (most recent call last):
    ...
    WrongKeysError: Cannot process value into 'User' because fields do not match.
    The following required keys are missing: {'name'}. The following keys are unknown: {'nme', 'address'}.
    Alternatives to invalid value 'nme' include:
      - name
    
    All valid options are:
      - name
      - phone
      - tasks
    

    for a typo on a given field.
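
    To make the idea behind such a message concrete, here is a minimal standard-library sketch of a key check with spelling suggestions. It is not how validobj is implemented; check_keys is a hypothetical helper written for illustration, and difflib is only one possible way to find close matches.

```python
import dataclasses
import difflib
from typing import List, Optional


@dataclasses.dataclass
class User:
    name: str
    phone: Optional[str] = None
    tasks: List[str] = dataclasses.field(default_factory=list)


def check_keys(data, cls):
    """Hypothetical helper (not part of validobj): compare dict keys to fields."""
    flds = dataclasses.fields(cls)
    valid = [f.name for f in flds]
    # A field is required when it has neither a default nor a default_factory.
    required = {
        f.name
        for f in flds
        if f.default is dataclasses.MISSING
        and f.default_factory is dataclasses.MISSING
    }
    missing = required - set(data)
    unknown = set(data) - set(valid)
    # Suggest close spellings among the valid names for each unknown key.
    suggestions = {k: difflib.get_close_matches(k, valid) for k in unknown}
    return missing, unknown, suggestions


missing, unknown, suggestions = check_keys(
    {'phone': '555-1337-000', 'address': 'Somewhereville', 'nme': 'Zahari'}, User
)
print(missing)             # {'name'}
print(suggestions['nme'])  # ['name']
```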

  • 2020-12-23 00:03

    Below is the CPython implementation of asdict – or specifically, the internal recursive helper function _asdict_inner that it uses:

    # Source: https://github.com/python/cpython/blob/master/Lib/dataclasses.py
    
    def _asdict_inner(obj, dict_factory):
        if _is_dataclass_instance(obj):
            result = []
            for f in fields(obj):
                value = _asdict_inner(getattr(obj, f.name), dict_factory)
                result.append((f.name, value))
            return dict_factory(result)
        elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
            # [large block of author comments]
            return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
        elif isinstance(obj, (list, tuple)):
            # [ditto]
            return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
        elif isinstance(obj, dict):
            return type(obj)((_asdict_inner(k, dict_factory),
                              _asdict_inner(v, dict_factory))
                             for k, v in obj.items())
        else:
            return copy.deepcopy(obj)
    

    asdict simply calls the above with some assertions, and dict_factory=dict by default.
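
    As a small illustration of the dict_factory parameter, the sketch below (using the assumed Point and C example classes) passes a recording factory to asdict. The factory receives only a list of (name, value) pairs for each dataclass instance, not the dataclass type itself, which is why type tagging needs a modified helper rather than just a custom factory.

```python
from collections import OrderedDict
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Point:
    x: int
    y: int


@dataclass
class C:
    mylist: List[Point]


c = C([Point(0, 0), Point(10, 4)])

# Any callable accepting a list of (key, value) pairs can serve as the factory.
od = asdict(c, dict_factory=OrderedDict)
print(type(od).__name__)  # OrderedDict

seen = []


def recording_factory(pairs):
    # Only field names and already-converted values arrive here.
    seen.append(list(pairs))
    return dict(pairs)


plain = asdict(c, dict_factory=recording_factory)
print(len(seen))  # 3: one call per dataclass instance (two Points and one C)
```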

    How can this be adapted to create an output dictionary with the required type-tagging, as mentioned in the comments?


    1. Adding type information

    My attempt involved creating a custom return wrapper inheriting from dict:

    class TypeDict(dict):
        def __init__(self, t, *args, **kwargs):
            super(TypeDict, self).__init__(*args, **kwargs)
    
            if not isinstance(t, type):
                raise TypeError("t must be a type")
    
            self._type = t
    
        @property
        def type(self):
            return self._type
    

    Looking at the original code, only the first clause needs to be modified to use this wrapper, as the other clauses only handle containers of dataclasses:

    # only use dict for now; easy to add back later
    def _todict_inner(obj):
        if is_dataclass_instance(obj):
            result = []
            for f in fields(obj):
                value = _todict_inner(getattr(obj, f.name))
                result.append((f.name, value))
            return TypeDict(type(obj), result)
    
        elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
            return type(obj)(*[_todict_inner(v) for v in obj])
        elif isinstance(obj, (list, tuple)):
            return type(obj)(_todict_inner(v) for v in obj)
        elif isinstance(obj, dict):
            return type(obj)((_todict_inner(k), _todict_inner(v))
                             for k, v in obj.items())
        else:
            return copy.deepcopy(obj)
    

    Imports:

    from dataclasses import dataclass, fields, is_dataclass
    
    # thanks to Patrick Haugh
    from typing import *
    
    # deepcopy 
    import copy
    

    Functions used:

    # copy of the internal function _is_dataclass_instance
    def is_dataclass_instance(obj):
        return is_dataclass(obj) and not isinstance(obj, type)
    
    # the adapted version of asdict
    def todict(obj):
        if not is_dataclass_instance(obj):
             raise TypeError("todict() should be called on dataclass instances")
        return _todict_inner(obj)
    

    Tests with the example dataclasses:

    c = C([Point(0, 0), Point(10, 4)])
    
    print(c)
    cd = todict(c)
    
    print(cd)
    # {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
    
    print(cd.type)
    # <class '__main__.C'>
    

    Results are as expected.


    2. Converting back to a dataclass

    The recursive routine used by asdict can be re-used for the reverse process, with some relatively minor changes:

    def _fromdict_inner(obj):
        # reconstruct the dataclass using the type tag
        if is_dataclass_dict(obj):
            result = {}
            for name, data in obj.items():
                result[name] = _fromdict_inner(data)
            return obj.type(**result)
    
        # exactly the same as before (without the tuple clause)
        elif isinstance(obj, (list, tuple)):
            return type(obj)(_fromdict_inner(v) for v in obj)
        elif isinstance(obj, dict):
            return type(obj)((_fromdict_inner(k), _fromdict_inner(v))
                             for k, v in obj.items())
        else:
            return copy.deepcopy(obj)
    

    Functions used:

    def is_dataclass_dict(obj):
        return isinstance(obj, TypeDict)
    
    def fromdict(obj):
        if not is_dataclass_dict(obj):
            raise TypeError("fromdict() should be called on TypeDict instances")
        return _fromdict_inner(obj)
    

    Test:

    c = C([Point(0, 0), Point(10, 4)])
    cd = todict(c)
    cf = fromdict(cd)
    
    print(c)
    # C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
    
    print(cf)
    # C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
    

    Again as expected.
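
    One caveat worth keeping in mind: the type tag lives on the Python object, not in the mapping contents, so it does not survive serialization. A short sketch (TypeDict is repeated in compact form so the snippet runs on its own):

```python
import json
from dataclasses import dataclass


class TypeDict(dict):
    """Same idea as the TypeDict wrapper above, repeated so this runs alone."""

    def __init__(self, t, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._type = t

    @property
    def type(self):
        return self._type


@dataclass
class Point:
    x: int
    y: int


tagged = TypeDict(Point, {'x': 0, 'y': 0})
print(tagged.type is Point)  # True

# JSON only sees the mapping contents, so the tag is gone after a round trip.
restored = json.loads(json.dumps(tagged))
print(type(restored))  # <class 'dict'>
```

    If the data has to leave the process, the tag must be stored inside the dictionary itself.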

  • 2020-12-23 00:03

    If your goal is to produce JSON from and to existing, predefined dataclasses, then just write custom encoder and decoder hooks. Do not use dataclasses.asdict() here; instead, record in JSON a (safe) reference to the original dataclass.

    jsonpickle is not safe because it stores references to arbitrary Python objects and passes in data to their constructors. With such references I can get jsonpickle to reference internal Python data structures and create and execute functions, classes and modules at will. But that doesn't mean you can't handle such references safely. Just verify that you only import (not call) and then verify that the object is an actual dataclass type, before you use it.

    The framework can be made generic enough but still limited only to JSON-serialisable types plus dataclass-based instances:

    import dataclasses
    import importlib
    import sys
    
    def dataclass_object_dump(ob):
        datacls = type(ob)
        if not dataclasses.is_dataclass(datacls):
            raise TypeError(f"Expected dataclass instance, got '{datacls!r}' object")
        mod = sys.modules.get(datacls.__module__)
        if mod is None or not hasattr(mod, datacls.__qualname__):
            raise ValueError(f"Can't resolve '{datacls!r}' reference")
        ref = f"{datacls.__module__}.{datacls.__qualname__}"
        fields = (f.name for f in dataclasses.fields(ob))
        return {**{f: getattr(ob, f) for f in fields}, '__dataclass__': ref}
    
    def dataclass_object_load(d):
        ref = d.pop('__dataclass__', None)
        if ref is None:
            return d
        try:
            modname, hasdot, qualname = ref.rpartition('.')
            module = importlib.import_module(modname)
            datacls = getattr(module, qualname)
            if not dataclasses.is_dataclass(datacls) or not isinstance(datacls, type):
                raise ValueError
            return datacls(**d)
        except (ModuleNotFoundError, ValueError, AttributeError, TypeError):
            raise ValueError(f"Invalid dataclass reference {ref!r}") from None
    

    This uses JSON-RPC-style class hints to name the dataclass, and on loading this is verified to still be a data class with the same fields. No type checking is done on the values of the fields (as that's a whole different kettle of fish).

    Use these as the default and object_hook arguments to json.dump[s]() and json.load[s]():

    >>> print(json.dumps(c, default=dataclass_object_dump, indent=4))
    {
        "mylist": [
            {
                "x": 0,
                "y": 0,
                "__dataclass__": "__main__.Point"
            },
            {
                "x": 10,
                "y": 4,
                "__dataclass__": "__main__.Point"
            }
        ],
        "__dataclass__": "__main__.C"
    }
    >>> json.loads(json.dumps(c, default=dataclass_object_dump), object_hook=dataclass_object_load)
    C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
    >>> json.loads(json.dumps(c, default=dataclass_object_dump), object_hook=dataclass_object_load) == c
    True
    

    or create instances of the JSONEncoder and JSONDecoder classes with those same hooks.

    Instead of using fully qualifying module and class names, you could also use a separate registry to map permissible type names; check against the registry on encoding, and again on decoding to ensure you don't forget to register dataclasses as you develop.
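
    A minimal sketch of the registry variant, also showing JSONEncoder and JSONDecoder instances built with the hooks. The names REGISTRY, register, dump_hook and load_hook are hypothetical, chosen for this illustration only:

```python
import dataclasses
import json

# Hypothetical registry mapping permitted type names to dataclass types.
REGISTRY = {}


def register(cls):
    """Class decorator: allow this dataclass to appear in JSON."""
    if not (isinstance(cls, type) and dataclasses.is_dataclass(cls)):
        raise TypeError(f"{cls!r} is not a dataclass type")
    REGISTRY[cls.__name__] = cls
    return cls


def dump_hook(ob):
    cls = type(ob)
    # Refuse anything that was not registered explicitly.
    if REGISTRY.get(cls.__name__) is not cls:
        raise TypeError(f"{cls!r} is not a registered dataclass")
    data = {f.name: getattr(ob, f.name) for f in dataclasses.fields(ob)}
    data['__dataclass__'] = cls.__name__
    return data


def load_hook(d):
    name = d.pop('__dataclass__', None)
    if name is None:
        return d
    if name not in REGISTRY:
        raise ValueError(f"Unknown dataclass name {name!r}")
    return REGISTRY[name](**d)


@register
@dataclasses.dataclass
class Point:
    x: int
    y: int


@register
@dataclasses.dataclass
class C:
    mylist: list


encoder = json.JSONEncoder(default=dump_hook)
decoder = json.JSONDecoder(object_hook=load_hook)

c = C([Point(0, 0), Point(10, 4)])
text = encoder.encode(c)
restored = decoder.decode(text)
print(restored == c)  # True
```

    Because only names present in the registry are accepted, no module is ever imported based on input data.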

  • 2020-12-23 00:07

    I would like to suggest using the Composite Pattern to solve this. The main advantage is that you can keep adding classes to this pattern and have them behave the same way.

    from dataclasses import dataclass
    from typing import List
    
    
    @dataclass
    class CompositeDict:
        def as_dict(self):
            retval = dict()
            for key, value in self.__dict__.items():
                if key in self.__dataclass_fields__.keys():
                    if type(value) is list:
                        retval[key] = [item.as_dict() for item in value]
                    else:
                        retval[key] = value
            return retval
    
    @dataclass
    class Point(CompositeDict):
        x: int
        y: int
    
    
    @dataclass
    class C(CompositeDict):
        mylist: List[Point]
    
    
    c = C([Point(0, 0), Point(10, 4)])
    tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
    assert c.as_dict() == tmp
    

    As a side note, you could employ a factory pattern within the CompositeDict class that would handle other cases like nested dicts, tuples and such, which would save much boilerplate.
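
    One way such handling could look is sketched below. The _convert helper and the Shape class are hypothetical; the sketch dispatches on the value type so that nested dataclasses, tuples, dicts and lists of plain values are all converted, whereas the version above calls as_dict() on every list item and leaves other containers untouched.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Tuple


def _convert(value):
    """Hypothetical helper: recursively convert a field value."""
    if isinstance(value, CompositeDict):
        return value.as_dict()
    if isinstance(value, (list, tuple)):
        # Rebuild plain lists and tuples; namedtuples are not handled here.
        return type(value)(_convert(v) for v in value)
    if isinstance(value, dict):
        return {k: _convert(v) for k, v in value.items()}
    return value


class CompositeDict:
    def as_dict(self):
        # fields() works because every concrete subclass is a dataclass.
        return {f.name: _convert(getattr(self, f.name)) for f in fields(self)}


@dataclass
class Point(CompositeDict):
    x: int
    y: int


@dataclass
class Shape(CompositeDict):
    origin: Point
    corners: Tuple[Point, Point]
    named: Dict[str, Point]
    weights: List[int]


shape = Shape(Point(0, 0), (Point(1, 1), Point(2, 2)), {'a': Point(3, 3)}, [1, 2])
result = shape.as_dict()
print(result['corners'])  # ({'x': 1, 'y': 1}, {'x': 2, 'y': 2})
```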

  • 2020-12-23 00:15

    I'm the author of dacite - the tool that simplifies creation of data classes from dictionaries.

    The library has only one function, from_dict. Here is a quick usage example:

    from dataclasses import dataclass
    from dacite import from_dict
    
    @dataclass
    class User:
        name: str
        age: int
        is_active: bool
    
    data = {
        'name': 'john',
        'age': 30,
        'is_active': True,
    }
    
    user = from_dict(data_class=User, data=data)
    
    assert user == User(name='john', age=30, is_active=True)
    

    Moreover, dacite supports the following features:

    • nested structures
    • (basic) type checking
    • optional fields (i.e. typing.Optional)
    • unions
    • collections
    • value casting and transformation
    • remapping of field names

    ... and it's well tested - 100% code coverage!

    To install dacite, simply use pip (or pipenv):

    $ pip install dacite
    
  • 2020-12-23 00:15

    All it takes is a five-liner:

    import dataclasses
    
    def dataclass_from_dict(klass, d):
        try:
            fieldtypes = {f.name: f.type for f in dataclasses.fields(klass)}
            return klass(**{f: dataclass_from_dict(fieldtypes[f], d[f]) for f in d})
        except Exception:
            return d  # Not a dataclass: return the value unchanged
    

    Sample usage:

    from dataclasses import dataclass, asdict
    
    @dataclass
    class Point:
        x: float
        y: float
    
    @dataclass
    class Line:
        a: Point
        b: Point
    
    line = Line(Point(1,2), Point(3,4))
    assert line == dataclass_from_dict(Line, asdict(line))
    

    Full code, including to/from json, here at gist: https://gist.github.com/gatopeich/1efd3e1e4269e1e98fae9983bb914f22
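
    Note that the five-liner does not look inside containers, so a field annotated as List[Point] would come back as a list of plain dicts. Below is a hedged sketch of one way to extend the same idea to lists; from_nested_dict is a hypothetical name, and the sketch assumes Python 3.8+ (for typing.get_origin and typing.get_args) and annotations that are real types rather than strings.

```python
import dataclasses
from typing import List, get_args, get_origin


def from_nested_dict(tp, value):
    """Hypothetical helper: rebuild dataclasses, including inside List[...]."""
    if isinstance(tp, type) and dataclasses.is_dataclass(tp) and isinstance(value, dict):
        types = {f.name: f.type for f in dataclasses.fields(tp)}
        # Unknown keys raise KeyError here instead of being silently ignored.
        return tp(**{k: from_nested_dict(types[k], v) for k, v in value.items()})
    args = get_args(tp)
    if get_origin(tp) is list and args and isinstance(value, list):
        # List[X]: convert every item using the declared item type X.
        return [from_nested_dict(args[0], v) for v in value]
    # Anything else (int, str, unparametrized containers) is returned as-is.
    return value


@dataclasses.dataclass
class Point:
    x: int
    y: int


@dataclasses.dataclass
class C:
    mylist: List[Point]


c = C([Point(0, 0), Point(10, 4)])
rebuilt = from_nested_dict(C, dataclasses.asdict(c))
print(rebuilt == c)  # True
```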
