How to “perfectly” override a dict?

情话喂你 2020-11-22 08:14

How can I make as "perfect" a subclass of dict as possible? The end goal is to have a simple dict in which the keys are lowercase.

It would seem

5 answers
  •  孤独总比滥情好
    2020-11-22 08:49

    How can I make as "perfect" a subclass of dict as possible?

    The end goal is to have a simple dict in which the keys are lowercase.

    • If I override __getitem__/__setitem__, then get/set don't work. How do I make them work? Surely I don't need to implement them individually?

    • Am I preventing pickling from working, and do I need to implement __setstate__ etc?

    • Do I need repr, update and __init__?

    • Should I just use mutablemapping (it seems one shouldn't use UserDict or DictMixin)? If so, how? The docs aren't exactly enlightening.

    The accepted answer would be my first approach, but since it has some issues, and since no one has addressed the alternative, actually subclassing a dict, I'm going to do that here.

    What's wrong with the accepted answer?

    This seems like a rather simple request to me:

    How can I make as "perfect" a subclass of dict as possible? The end goal is to have a simple dict in which the keys are lowercase.

    The accepted answer doesn't actually subclass dict, and a test for this fails:

    >>> isinstance(MyTransformedDict([('Test', 'test')]), dict)
    False
    

    Ideally, any type-checking code would test for the interface we expect, or for an abstract base class. But if our data objects are passed into functions that test for dict, and we can't "fix" those functions, this code will fail.
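
To make the failure concrete, here is a minimal sketch. The accepted answer's code is not reproduced on this page, so `TransformedDictSketch` is a hypothetical stand-in that only assumes what is visible above: a MutableMapping subclass that keeps its data in a `store` attribute. The `needs_real_dict` function is likewise invented to play the role of third-party code we can't change.

```python
from collections.abc import Mapping, MutableMapping


class TransformedDictSketch(MutableMapping):
    """Hypothetical stand-in for a MutableMapping-based lowercase dict."""

    def __init__(self, *args, **kwargs):
        self.store = {}
        self.update(dict(*args, **kwargs))  # update() routes through __setitem__

    @staticmethod
    def _lower(key):
        # Only strings are lowercased; other hashable keys pass through.
        return key.lower() if isinstance(key, str) else key

    def __getitem__(self, key):
        return self.store[self._lower(key)]

    def __setitem__(self, key, value):
        self.store[self._lower(key)] = value

    def __delitem__(self, key):
        del self.store[self._lower(key)]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


def needs_real_dict(obj):
    """Hypothetical legacy function that insists on an actual dict."""
    if not isinstance(obj, dict):
        raise TypeError('expected a dict')
    return len(obj)


s = TransformedDictSketch([('Test', 'test')])
print(isinstance(s, dict))     # False
print(isinstance(s, Mapping))  # True
print(s.get('TEST'))           # test
```

Calling `needs_real_dict(s)` raises TypeError, even though `s` behaves like a mapping in every other respect.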

    Other quibbles one might make:

    • The accepted answer is also missing the classmethod: fromkeys.
    • The accepted answer also has a redundant __dict__ - therefore taking up more space in memory:

      >>> s = MyTransformedDict([('test', 'test')])
      >>> s.foo = 'bar'
      >>> s.__dict__
      {'foo': 'bar', 'store': {'test': 'test'}}
      

    Actually subclassing dict

    We can reuse the dict methods through inheritance. All we need to do is create an interface layer that ensures keys are passed into the dict in lowercase form if they are strings.

    If I override __getitem__/__setitem__, then get/set don't work. How do I make them work? Surely I don't need to implement them individually?

    Well, implementing them each individually is the downside to this approach and the upside to using MutableMapping (see the accepted answer), but it's really not that much more work.
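
The reason each method needs its own override can be seen in a small experiment. In CPython, the built-in dict methods do not route through an overridden `__getitem__` or `__setitem__`. The class name `NaiveLowerDict` is made up for this illustration, and for brevity it assumes every key is a string.

```python
class NaiveLowerDict(dict):
    """Overrides only the two item methods; assumes string keys."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())


d = NaiveLowerDict()
d['Foo'] = 1
print(d['FOO'])      # 1, both overrides are used here
print(d.get('FOO'))  # None, dict.get does not call __getitem__

d.update({'Bar': 2})  # dict.update does not call __setitem__
print(sorted(d))      # ['Bar', 'foo']

e = NaiveLowerDict({'Baz': 3})  # neither does dict.__init__
print(list(e))                  # ['Baz']
```

So `get`, `update`, `__init__` and the other key-taking methods each have to be wrapped explicitly.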

    First, let's factor out the difference between Python 2 and 3, create a singleton (_RaiseKeyError) to make sure we know if we actually get an argument to dict.pop, and create a function to ensure our string keys are lowercase:

    from itertools import chain
    try:              # Python 2
        str_base = basestring
        items = 'iteritems'
    except NameError: # Python 3
        str_base = str, bytes, bytearray
        items = 'items'
    
    _RaiseKeyError = object() # singleton for no-default behavior
    
    def ensure_lower(maybe_str):
        """dict keys can be any hashable object - only call lower if str"""
        return maybe_str.lower() if isinstance(maybe_str, str_base) else maybe_str
    

    Now we implement - I'm using super with the full arguments so that this code works for Python 2 and 3:

    class LowerDict(dict):  # dicts take a mapping or iterable as their optional first argument
        __slots__ = () # no __dict__ - that would be redundant
        @staticmethod # because this doesn't make sense as a global function.
        def _process_args(mapping=(), **kwargs):
            if hasattr(mapping, items):
                mapping = getattr(mapping, items)()
            return ((ensure_lower(k), v) for k, v in chain(mapping, getattr(kwargs, items)()))
        def __init__(self, mapping=(), **kwargs):
            super(LowerDict, self).__init__(self._process_args(mapping, **kwargs))
        def __getitem__(self, k):
            return super(LowerDict, self).__getitem__(ensure_lower(k))
        def __setitem__(self, k, v):
            return super(LowerDict, self).__setitem__(ensure_lower(k), v)
        def __delitem__(self, k):
            return super(LowerDict, self).__delitem__(ensure_lower(k))
        def get(self, k, default=None):
            return super(LowerDict, self).get(ensure_lower(k), default)
        def setdefault(self, k, default=None):
            return super(LowerDict, self).setdefault(ensure_lower(k), default)
        def pop(self, k, v=_RaiseKeyError):
            if v is _RaiseKeyError:
                return super(LowerDict, self).pop(ensure_lower(k))
            return super(LowerDict, self).pop(ensure_lower(k), v)
        def update(self, mapping=(), **kwargs):
            super(LowerDict, self).update(self._process_args(mapping, **kwargs))
        def __contains__(self, k):
            return super(LowerDict, self).__contains__(ensure_lower(k))
        def copy(self): # don't delegate w/ super - dict.copy() -> dict :(
            return type(self)(self)
        @classmethod
        def fromkeys(cls, keys, v=None):
            return super(LowerDict, cls).fromkeys((ensure_lower(k) for k in keys), v)
        def __repr__(self):
            return '{0}({1})'.format(type(self).__name__, super(LowerDict, self).__repr__())
    

    We use an almost boilerplate approach for any method or special method that references a key, but otherwise, by inheritance, we get the methods len, clear, items, keys, popitem, and values for free. While this required some careful thought to get right, it is trivial to see that this works.
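
For anyone who only needs Python 3, the same idea can be condensed a little. This is a sketch rather than a replacement for the class above: it assumes Python 3.7+ (zero-argument super, insertion-ordered dicts), lowercases only str and bytes keys, and renames the sentinel to `_MISSING`, which is my own name for it.

```python
from itertools import chain

_MISSING = object()  # sentinel: "no default was passed to pop"


def ensure_lower(key):
    """Lowercase str/bytes keys; leave every other hashable key alone."""
    return key.lower() if isinstance(key, (str, bytes)) else key


class LowerDict(dict):
    __slots__ = ()  # no per-instance __dict__

    @staticmethod
    def _process_args(mapping=(), **kwargs):
        if hasattr(mapping, 'items'):
            mapping = mapping.items()
        return ((ensure_lower(k), v) for k, v in chain(mapping, kwargs.items()))

    def __init__(self, mapping=(), **kwargs):
        super().__init__(self._process_args(mapping, **kwargs))

    def __getitem__(self, k):
        return super().__getitem__(ensure_lower(k))

    def __setitem__(self, k, v):
        return super().__setitem__(ensure_lower(k), v)

    def __delitem__(self, k):
        return super().__delitem__(ensure_lower(k))

    def __contains__(self, k):
        return super().__contains__(ensure_lower(k))

    def get(self, k, default=None):
        return super().get(ensure_lower(k), default)

    def setdefault(self, k, default=None):
        return super().setdefault(ensure_lower(k), default)

    def pop(self, k, v=_MISSING):
        if v is _MISSING:
            return super().pop(ensure_lower(k))
        return super().pop(ensure_lower(k), v)

    def update(self, mapping=(), **kwargs):
        super().update(self._process_args(mapping, **kwargs))

    def copy(self):  # dict.copy() would return a plain dict
        return type(self)(self)

    @classmethod
    def fromkeys(cls, keys, v=None):
        return super().fromkeys((ensure_lower(k) for k in keys), v)

    def __repr__(self):
        return '{0}({1})'.format(type(self).__name__, super().__repr__())


ld = LowerDict({'Foo': 1}, BAR=2)
ld.update({'BAZ': 3}, Qux=4)
print(ld)                                 # LowerDict({'foo': 1, 'bar': 2, 'baz': 3, 'qux': 4})
print('FOO' in ld)                        # True
print(ld.pop('missing', 'fallback'))      # fallback
print(LowerDict.fromkeys(['A', 'b'], 0))  # LowerDict({'a': 0, 'b': 0})
print(type(ld.copy()).__name__)           # LowerDict
print(hasattr(ld, '__dict__'))            # False
```

The last line confirms that `__slots__ = ()` keeps instances free of the extra `__dict__`.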

    (Note that has_key was deprecated in Python 2 and removed in Python 3.)

    Here's some usage:

    >>> ld = LowerDict(dict(foo='bar'))
    >>> ld['FOO']
    'bar'
    >>> ld['foo']
    'bar'
    >>> ld.pop('FoO')
    'bar'
    >>> ld.setdefault('Foo')
    >>> ld
    LowerDict({'foo': None})
    >>> ld.get('Bar')
    >>> ld.setdefault('Bar')
    >>> ld
    LowerDict({'foo': None, 'bar': None})
    >>> ld.popitem()
    ('bar', None)
    

    Am I preventing pickling from working, and do I need to implement __setstate__ etc?

    Pickling

    And the dict subclass pickles just fine:

    >>> import pickle
    >>> pickle.dumps(ld)
    b'\x80\x03c__main__\nLowerDict\nq\x00)\x81q\x01X\x03\x00\x00\x00fooq\x02Ns.'
    >>> pickle.loads(pickle.dumps(ld))
    LowerDict({'foo': None})
    >>> type(pickle.loads(pickle.dumps(ld)))
    <class '__main__.LowerDict'>
    

    __repr__

    Do I need repr, update and __init__?

    We defined update and __init__, but you have a beautiful __repr__ by default:

    >>> ld # without __repr__ defined for the class, we get this
    {'foo': None}
    

    However, it's good to write a __repr__ to improve the debuggability of your code. The ideal test is eval(repr(obj)) == obj. If it's easy to do for your code, I strongly recommend it:

    >>> ld = LowerDict({})
    >>> eval(repr(ld)) == ld
    True
    >>> ld = LowerDict(dict(a=1, b=2, c=3))
    >>> eval(repr(ld)) == ld
    True
    

    You see, it's exactly what we need to recreate an equivalent object - this is something that might show up in our logs or in tracebacks:

    >>> ld
    LowerDict({'a': 1, 'c': 3, 'b': 2})
    

    Conclusion

    Should I just use mutablemapping (it seems one shouldn't use UserDict or DictMixin)? If so, how? The docs aren't exactly enlightening.

    Yeah, these are a few more lines of code, but they're intended to be comprehensive. My first inclination would be to use the accepted answer, and if there were issues with it, I'd then look at my answer - as it's a little more complicated, and there's no ABC to help me get my interface right.

    Premature optimization is going for greater complexity in search of performance. MutableMapping is simpler - so it gets an immediate edge, all else being equal. Nevertheless, to lay out all the differences, let's compare and contrast.

    I should add that there was a push to put a similar dictionary into the collections module, but it was rejected. You should probably just do this instead:

    my_dict[transform(key)]
    

    It should be far more easily debuggable.
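
As a rough illustration of that suggestion, here is what transforming the key at the call site might look like with a plain dict. The body of `transform` is my assumption (lowercase strings, pass everything else through), since the line above only names the function.

```python
def transform(key):
    """Hypothetical normalizer: lowercase string keys, leave others alone."""
    return key.lower() if isinstance(key, str) else key


my_dict = {}
my_dict[transform('Foo')] = 'bar'

print(my_dict[transform('FOO')])    # bar
print(transform('fOo') in my_dict)  # True
print(my_dict)                      # {'foo': 'bar'}
```

Nothing about the dict is special here, so every lookup that matters is visible where it happens.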

    Compare and contrast

    There are 6 interface functions implemented with the MutableMapping (which is missing fromkeys) and 11 with the dict subclass. I don't need to implement __iter__ or __len__, but instead I have to implement get, setdefault, pop, update, copy, __contains__, and fromkeys - but these are fairly trivial, since I can use inheritance for most of those implementations.

    The MutableMapping implements some things in Python that dict implements in C - so I would expect a dict subclass to be more performant in some cases.

    We get a free __eq__ in both approaches - both of which assume equality only if another dict is all lowercase - but again, I think the dict subclass will compare more quickly.

    Summary:

    • subclassing MutableMapping is simpler with fewer opportunities for bugs, but slower, takes more memory (see the redundant __dict__), and fails isinstance(x, dict)
    • subclassing dict is faster, uses less memory, and passes isinstance(x, dict), but it has greater complexity to implement.

    Which is more perfect? That depends on your definition of perfect.
