Question
Suppose I have a namedtuple like this:
from collections import namedtuple

EdgeBase = namedtuple("EdgeBase", "left, right")
I want to implement a custom hash-function for this, so I create the following subclass:
class Edge(EdgeBase):
    def __hash__(self):
        return hash(self.left) * hash(self.right)
Since the object is immutable, I want the hash-value to be calculated only once, so I do this:
class Edge(EdgeBase):
    def __init__(self, left, right):
        self._hash = hash(self.left) * hash(self.right)

    def __hash__(self):
        return self._hash
This appears to be working, but I am really not sure about subclassing and initialization in Python, especially with tuples. Are there any pitfalls to this solution? Is there a recommended way to do this? Thanks in advance.
Answer 1:
Edit for 2017: it turns out namedtuple isn't a great idea; attrs is the modern alternative.
class Edge(EdgeBase):
    def __new__(cls, left, right):
        self = super(Edge, cls).__new__(cls, left, right)
        self._hash = hash(self.left) * hash(self.right)
        return self

    def __hash__(self):
        return self._hash
__new__ is what you want to call here because tuples are immutable. Immutable objects are created in __new__ and then returned to the user, instead of being populated with data in __init__.
cls has to be passed twice to the super call on __new__ because __new__ is, for historical/odd reasons, implicitly a staticmethod.
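As a rough illustration of why __new__ is the right hook, here is a minimal sketch in which the values stored in the tuple are themselves adjusted, something __init__ cannot do because the tuple already exists by then. The class name UndirectedEdge and the sorting of endpoints are hypothetical and not part of the question; the tuple-based hash is also a substitution for the question's product.

```python
from collections import namedtuple

EdgeBase = namedtuple("EdgeBase", "left, right")


class UndirectedEdge(EdgeBase):
    """Hypothetical variant that stores its endpoints in sorted order."""

    def __new__(cls, left, right):
        # The tuple's contents are decided here; __init__ would be too late.
        left, right = sorted((left, right))
        self = super().__new__(cls, left, right)
        # Extra attributes live in the instance __dict__, not in the tuple.
        self._hash = hash((left, right))
        return self

    def __hash__(self):
        return self._hash


edge = UndirectedEdge(2, 1)
print(edge)  # UndirectedEdge(left=1, right=2)
```

Because the endpoints are normalized before the tuple is built, UndirectedEdge(2, 1) and UndirectedEdge(1, 2) compare equal and share a cached hash.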
Answer 2:
The code in the question could benefit from a super call in the __init__ in case it ever gets subclassed in a multiple inheritance situation, but otherwise is correct.
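class Edge(EdgeBase):
    def __init__(self, left, right):
        # On Python 3, tuple inherits object.__init__, which raises a
        # TypeError when given extra arguments, so none are forwarded here.
        super(Edge, self).__init__()
        self._hash = hash(self.left) * hash(self.right)

    def __hash__(self):
        return self._hash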
While tuples are read-only, only the tuple parts of their subclasses are read-only; other attributes may be written as usual, which is what allows the assignment to _hash regardless of whether it is done in __init__ or __new__. You can make the subclass fully read-only by setting its __slots__ to (), which has the added benefit of saving memory, but then you wouldn't be able to assign to _hash.
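The effect of __slots__ described above can be checked with a small sketch. The class name SlottedEdge is hypothetical; the hash is the one from the question, recomputed on every call because there is nowhere to cache it.

```python
from collections import namedtuple

EdgeBase = namedtuple("EdgeBase", "left, right")


class SlottedEdge(EdgeBase):
    # An empty __slots__ means instances get no per-instance __dict__.
    __slots__ = ()

    def __hash__(self):
        # No attribute storage is available, so nothing is cached.
        return hash(self.left) * hash(self.right)


edge = SlottedEdge(1, 2)

slot_error = None
try:
    edge._hash = 42  # rejected: there is no __dict__ to hold it
except AttributeError as exc:
    slot_error = exc

has_dict = hasattr(edge, "__dict__")
print(type(slot_error).__name__, has_dict)  # AttributeError False
```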
Answer 3:
In Python 3.7+, you can now use dataclasses to build hashable classes with ease.
Code
Assuming int types of left and right, we use the default hashing via the unsafe_hash keyword (see the note marked + below):
import dataclasses as dc

@dc.dataclass(unsafe_hash=True)
class Edge:
    left: int
    right: int

hash(Edge(1, 2))
# 3713081631934410656
Now we can use these (mutable) hashable objects as elements in a set (or as keys in a dict).
{Edge(1, 2), Edge(1, 2), Edge(2, 1), Edge(2, 3)}
# {Edge(left=1, right=2), Edge(left=2, right=1), Edge(left=2, right=3)}
Details
We can alternatively override the __hash__ function:
@dc.dataclass
class Edge:
    left: int
    right: int

    def __post_init__(self):
        # Add custom hashing function here
        self._hash = hash((self.left, self.right))  # emulates default

    def __hash__(self):
        return self._hash

hash(Edge(1, 2))
# 3713081631934410656
Expanding on @ShadowRanger's comment, the OP's custom hash function is not reliable. In particular, it is symmetric in the attribute values, so swapping them gives the same hash, e.g. hash(Edge(1, 2)) == hash(Edge(2, 1)), which is likely not what is intended.
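The collision is easy to reproduce. The sketch below compares the question's product hash with hashing a tuple of the two values; the helper names product_hash and tuple_hash are made up for this illustration.

```python
def product_hash(left, right):
    # The hash from the question: multiplication ignores argument order.
    return hash(left) * hash(right)


def tuple_hash(left, right):
    # Hashing a tuple takes the position of each value into account.
    return hash((left, right))


print(product_hash(1, 2) == product_hash(2, 1))   # True
print(product_hash(0, 5) == product_hash(0, 99))  # True, since hash(0) == 0
print(tuple_hash(1, 2) == tuple_hash(2, 1))       # False
```

Colliding hashes do not break a set or dict, since equality is still checked, but every edge with a zero-hash endpoint, and every reversed pair, lands in the same bucket, which slows lookups.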
+Note: the name "unsafe" reflects that the default hash is used even though the object is mutable. This may be undesired, particularly in a dict, which expects immutable keys. Immutable hashing can be turned on with the appropriate keywords. See also more on hashing logic in dataclasses and a related issue.
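One way to get the immutable hashing mentioned in the note is the frozen keyword, sketched here under the assumption that both fields are ints. The class name FrozenEdge is hypothetical. With frozen=True and the default eq=True, dataclasses generates __hash__ from the fields and blocks attribute assignment.

```python
import dataclasses as dc


@dc.dataclass(frozen=True)
class FrozenEdge:
    left: int
    right: int


edge = FrozenEdge(1, 2)

mutation_error = None
try:
    edge.left = 5  # frozen instances reject assignment
except dc.FrozenInstanceError as exc:
    mutation_error = exc

# Equal instances hash alike, so they work as dict keys.
weights = {FrozenEdge(1, 2): 0.5}
print(weights[edge])  # 0.5
```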
Source: https://stackoverflow.com/questions/3624753/how-to-provide-additional-initialization-for-a-subclass-of-namedtuple