How to provide additional initialization for a subclass of namedtuple?

Question

Suppose I have a namedtuple like this:

EdgeBase = namedtuple("EdgeBase", "left, right")

I want to implement a custom hash-function for this, so I create the following subclass:

class Edge(EdgeBase):
    def __hash__(self):
        return hash(self.left) * hash(self.right)

Since the object is immutable, I want the hash-value to be calculated only once, so I do this:

class Edge(EdgeBase):
    def __init__(self, left, right):
        self._hash = hash(self.left) * hash(self.right)

    def __hash__(self):
        return self._hash

This appears to work, but I am not sure about subclassing and initialization in Python, especially with tuples. Are there any pitfalls to this solution? Is there a recommended way to do this? Thanks in advance.

Answer 1:

Edit (2017): it turns out namedtuple isn't a great idea here; attrs is the modern alternative.

class Edge(EdgeBase):
    def __new__(cls, left, right):
        self = super(Edge, cls).__new__(cls, left, right)
        self._hash = hash(self.left) * hash(self.right)
        return self

    def __hash__(self):
        return self._hash

__new__ is the method to override here because tuples are immutable. Immutable objects are created in __new__ and then returned to the caller, instead of being populated with data in __init__.

cls has to be passed twice to the super call on __new__ because __new__ is, for historical reasons, implicitly a staticmethod.
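As a minimal sketch of the same approach in Python 3 syntax (assuming CPython, where small ints hash to themselves), the zero-argument form of super() can be used, but cls still has to be handed to __new__ explicitly:

```python
from collections import namedtuple

EdgeBase = namedtuple("EdgeBase", "left, right")


class Edge(EdgeBase):
    def __new__(cls, left, right):
        # __new__ is implicitly a staticmethod, so cls is passed by hand.
        self = super().__new__(cls, left, right)
        # The tuple fields are already set here, so the hash can be cached.
        self._hash = hash(self.left) * hash(self.right)
        return self

    def __hash__(self):
        return self._hash


edge = Edge(2, 3)
print(edge)            # Edge(left=2, right=3)
print(hash(edge))      # 6, since hash(2) == 2 and hash(3) == 3 in CPython
print(edge == (2, 3))  # True: it still compares like a plain tuple
```

Because Edge does not declare __slots__, each instance carries a __dict__, which is where _hash ends up.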

Answer 2:

The code in the question could benefit from a super call in the __init__ in case it ever gets subclassed in a multiple inheritance situation, but otherwise is correct.

class Edge(EdgeBase):
    def __init__(self, left, right):
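        # tuple does not define __init__, so this call reaches object.__init__,
        # which raises TypeError on Python 3 if given extra arguments.
        super(Edge, self).__init__()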
        self._hash = hash(self.left) * hash(self.right)

    def __hash__(self):
        return self._hash

While tuples are read-only, only the tuple parts of their subclasses are read-only. Other attributes may be written as usual, which is what allows the assignment to _hash regardless of whether it is done in __init__ or __new__. You can make the subclass fully read-only by setting its __slots__ to (), which has the added benefit of saving memory, but then you would not be able to assign to _hash.
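The difference can be seen in a small sketch. The class name SlottedEdge and the variable names are made up for illustration; everything else is standard library:

```python
from collections import namedtuple

EdgeBase = namedtuple("EdgeBase", "left, right")


class Edge(EdgeBase):
    pass  # no __slots__: instances get a __dict__


class SlottedEdge(EdgeBase):
    __slots__ = ()  # no __dict__: nothing beyond the tuple fields


edge = Edge(1, 2)
edge._hash = 42  # allowed, stored in edge.__dict__

slotted = SlottedEdge(1, 2)
try:
    slotted._hash = 42
except AttributeError as exc:
    slot_error = exc  # there is no __dict__ to write the attribute to

try:
    edge.left = 5
except AttributeError as exc:
    field_error = exc  # the tuple fields are read-only in both classes
```

So with __slots__ = () a cached hash has nowhere to live, and __hash__ would have to recompute the value on every call.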

Answer 3:

In Python 3.7+, you can now use dataclasses to build hashable classes with ease.

Code

Assuming left and right are ints, we can use the default hashing via the unsafe_hash⁺ keyword:

import dataclasses as dc


@dc.dataclass(unsafe_hash=True)
class Edge:
    left: int
    right: int


hash(Edge(1, 2))
# 3713081631934410656 (the exact value depends on the Python version)

Now we can use these (mutable) hashable objects as elements of a set or as keys in a dict.

{Edge(1, 2), Edge(1, 2), Edge(2, 1), Edge(2, 3)}
# {Edge(left=1, right=2), Edge(left=2, right=1), Edge(left=2, right=3)}

Details

Alternatively, we can define __hash__ ourselves and cache the value in __post_init__:

@dc.dataclass
class Edge:
    left: int
    right: int

    def __post_init__(self):
        # Add custom hashing function here
        self._hash = hash((self.left, self.right))         # emulates default

    def __hash__(self):
        return self._hash


hash(Edge(1, 2))
# 3713081631934410656 (the exact value depends on the Python version)

Expanding on @ShadowRanger's comment, the OP's custom hash function is not reliable. Because multiplication is commutative, swapping the attribute values gives the same hash, e.g. hash(Edge(1, 2)) == hash(Edge(2, 1)), which is likely not what is intended.
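A short sketch makes the collisions concrete. The helper names product_hash and tuple_hash are made up for illustration, and the results noted in the comments assume CPython:

```python
def product_hash(left, right):
    # The hash function from the question.
    return hash(left) * hash(right)


def tuple_hash(left, right):
    # Delegates to the built-in tuple hash, which depends on element order.
    return hash((left, right))


print(product_hash(1, 2) == product_hash(2, 1))   # True: multiplication commutes
print(product_hash(0, 5) == product_hash(0, 99))  # True: hash(0) == 0 in CPython
print(tuple_hash(1, 2) == tuple_hash(2, 1))       # False in CPython
```

Colliding hashes do not break correctness, since dicts and sets still fall back to equality checks, but they do degrade lookup performance when many edges share a hash.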

⁺ Note: the name "unsafe" signals that the default hash will be used even though the object is mutable. This may be undesired, particularly in a dict, which expects immutable keys. Immutable hashing can be turned on with the appropriate keywords. See also more on hashing logic in dataclasses and a related issue.
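As a minimal sketch of the immutable variant, frozen=True combined with the default eq=True makes dataclasses generate __hash__ from the fields and reject attribute assignment. The variable names are only illustrative:

```python
import dataclasses as dc


@dc.dataclass(frozen=True)
class Edge:
    left: int
    right: int


edge = Edge(1, 2)
print(hash(edge) == hash(Edge(1, 2)))  # True: equal instances hash equally

try:
    edge.left = 5
except dc.FrozenInstanceError as exc:
    frozen_error = exc  # assignment is rejected, so the hash cannot go stale
```

Note that a frozen dataclass also rejects a plain self._hash = ... assignment in __post_init__, so this variant relies on the generated __hash__ rather than a cached value.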

Source: https://stackoverflow.com/questions/3624753/how-to-provide-additional-initialization-for-a-subclass-of-namedtuple

Tags

python

inheritance

tuples