Hashing a Tree Structure


渐次进展 2020-11-28 04:00

I've just come across a scenario in my project where I need to compare different tree objects for equality with already known instances, and have considered that some so


时光说笑 (original poster)

2020-11-28 04:33
I have to say that your requirements go somewhat against the entire concept of hashcodes.

The computational complexity of a hash function should be very limited.

Its computational complexity should not depend linearly on the size of the container (the tree); otherwise it totally breaks hashcode-based algorithms.

Treating position as a major property of a node's hash function also goes somewhat against the concept of a tree, but it is achievable if you relax the requirement that the hash HAS to depend on the position.

The overall principle I would suggest is replacing MUST requirements with SHOULD requirements. That way you can come up with an appropriate and efficient algorithm.

For example, consider building a limited sequence of integer hashcode tokens, and add whatever you want to this sequence in order of preference.

The order of the elements in this sequence is important: it affects the computed value.

For example, for each node whose hash you want to compute:
1. Add the hashcode of the underlying object.
2. Add the hashcodes of the underlying objects of the nearest siblings, if available. I think even the single left sibling would be enough.
3. Add the hashcode of the underlying object of the parent and of its nearest siblings, the same way as for the node itself in step 2.
4. Repeat this with the grandparents, up to a limited depth.
```
//--------5------- ancestor depth 2 and its left sibling;
//-------/|------- ;
//------4-3------- ancestor depth 1 and its left sibling;
//-------/|------- ;
//------2-1------- this;
```
  The fact that you are adding the hashcode of a direct sibling's underlying object gives the hash function a positional property.
  
  If this is not enough, add the children. You should not add every child, just enough to give a decent hashcode.
5. Add the first child, then its first child, then that node's first child, and so on. Limit the depth to some constant, and do not compute anything recursively: use just the hashcode of each node's underlying object.
```
//----- this;
//-----/--;
//----6---;
//---/--;
//--7---;
```
This way the complexity is at most linear in the depth of the underlying tree, not in the total number of elements.
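The steps above could be sketched roughly like this. It is only an illustration in Python, not a reference implementation: the `Node` class, `left_sibling`, `hash_tokens`, and the `max_up` / `max_down` limits are all hypothetical names made up for the example. Note that finding the left sibling by scanning the parent's children costs time proportional to the number of siblings, so a real tree would probably store each node's index instead.

```python
from typing import Any, List, Optional


class Node:
    """Hypothetical tree node wrapping an 'underlying object' (value)."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def left_sibling(node: Node) -> Optional[Node]:
    """Return the sibling immediately to the left of node, or None."""
    if node.parent is None:
        return None
    siblings = node.parent.children
    index = next(i for i, s in enumerate(siblings) if s is node)
    return siblings[index - 1] if index > 0 else None


def hash_tokens(node: Node, max_up: int = 2, max_down: int = 2) -> List[int]:
    """Collect a short, ordered list of hashcode tokens around node."""
    tokens: List[int] = []

    # Steps 1-4: the node and its left sibling, then each ancestor and
    # that ancestor's left sibling, up to max_up levels above the node.
    current: Optional[Node] = node
    for _ in range(max_up + 1):
        if current is None:
            break
        tokens.append(hash(current.value))
        sibling = left_sibling(current)
        if sibling is not None:
            tokens.append(hash(sibling.value))
        current = current.parent

    # Step 5: follow the first-child chain downwards, at most max_down
    # levels, hashing only each node's own value (no recursion).
    current = node
    for _ in range(max_down):
        if not current.children:
            break
        current = current.children[0]
        tokens.append(hash(current.value))

    return tokens


# The tree from the two diagrams, using small ints as the underlying
# objects (in CPython, hash() of a small non-negative int is the int itself).
grandparent = Node(5)
parent_left = grandparent.add(Node(4))
parent = grandparent.add(Node(3))
this_left = parent.add(Node(2))
this = parent.add(Node(1))
child = this.add(Node(6))
grandchild = child.add(Node(7))

print(hash_tokens(this))  # [1, 2, 3, 4, 5, 6, 7]
```

Because only a fixed neighbourhood is visited, adding or removing nodes elsewhere in the tree leaves the token list of `this` unchanged.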

Now you have a sequence of integers; combine them with a known algorithm, like Ely suggests above.

1,2,...7
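Ely's suggestion is not reproduced in this answer, so as a stand-in here is a minimal sketch of one commonly used way to combine an ordered list of integers: a multiply-and-add scheme with a small seed and an odd multiplier. The function name `combine_tokens`, the seed 17, the multiplier 31, and the 32-bit mask are all assumptions chosen for illustration; any well-known order-sensitive combiner would do.

```python
from typing import Iterable

MASK_32 = 0xFFFFFFFF  # keep the result within 32 bits, like a typical int hashcode


def combine_tokens(tokens: Iterable[int], seed: int = 17, multiplier: int = 31) -> int:
    """Fold an ordered sequence of integer tokens into one 32-bit hash."""
    result = seed
    for token in tokens:
        # Multiplying before adding makes the result depend on token order.
        result = (result * multiplier + (token & MASK_32)) & MASK_32
    return result


print(combine_tokens([1, 2]))  # 16370
print(combine_tokens([2, 1]))  # 16400
```

The two calls show that swapping the order of the tokens changes the result, which is what gives the final hash its positional property.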

This way you will have a lightweight hash function with a positional property. It does not depend on the total size of the tree, or even on the tree depth, and it does not require recomputing the hash of the entire tree when you change the tree structure.

I bet these 7 numbers would give a near-perfect hash distribution.