How to represent a Kademlia routing table as a data structure

Submitted by 南楼画角 on 2019-12-29 01:25:33

Question


The Kademlia paper talks about the organization of buckets, splitting, merging, and finding the correct bucket to insert into in abstract, concise, and confusing terms.

§2.2 talks about a fixed set of 160 buckets, with each bucket covering a fixed subset of the keyspace. But later sections involve additional splitting and buckets covering different parts of the keyspace. That doesn't fit well into a fixed list.

What is the correct way to organize buckets?

Meta: Since this confusion is reflected in many questions, and partial information has been scattered over many answers, this Q&A is intended to provide an easily linked clarification.


Answer 1:


The confusion stems from different versions of the paper.

Flat layout

This is from the pre-print version and is mostly used to outline the basic properties of Kademlia in a theoretical manner. It is still reflected in §2.2 and §3 of the full version.

Many real-world implementations use this approach, but they don't implement bucket splitting, merging, or node multihoming.

It involves putting each contact into the ith bucket, where i is the number of prefix bits the contact shares with the node's own ID. This means the layout uses distances relative to the node's own ID.
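As a minimal sketch of the flat layout, the bucket index can be computed from the XOR of the two IDs. The function names and the `bits` parameter are made up for illustration and do not come from the paper, and the indexing follows the convention used in this answer (bucket i holds contacts sharing i prefix bits).

```python
ID_BITS = 160  # Kademlia IDs are 160 bits wide


def shared_prefix_bits(a: int, b: int, bits: int = ID_BITS) -> int:
    """Return how many leading bits the two IDs have in common."""
    # The highest set bit of the XOR marks the first position where they differ.
    return bits - (a ^ b).bit_length()


def flat_bucket_index(own_id: int, contact_id: int, bits: int = ID_BITS) -> int:
    """Index of the bucket that holds contact_id in a flat, fixed-size table."""
    if own_id == contact_id:
        raise ValueError("a node does not store itself in its routing table")
    return shared_prefix_bits(own_id, contact_id, bits)
```

With 8-bit IDs for readability, `flat_bucket_index(0b10110000, 0b10111111, bits=8)` lands in bucket 4, because the first four bits match and the fifth differs.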

Tree-based layout

This is described in §2.4.

To implement refinements such as handling highly unbalanced trees (described towards the end of §2.4) or deeper non-local splitting (described in §4.2), one needs to associate each bucket with the keyspace range it covers. This can be expressed similarly to CIDR ranges, i.e. as a start ID plus the number of shared prefix bits, which masks off the tail of the ID.

Splitting a bucket is performed by increasing the number of prefix bits by one and setting the added bit to 0 and 1 respectively for the two new buckets.
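The CIDR-like range and the split operation could be sketched as follows. The `Prefix` class, its field names, and the `bits` field are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Tuple

ID_BITS = 160


@dataclass(frozen=True)
class Prefix:
    start: int           # lowest ID covered; all bits after `depth` are zero
    depth: int           # number of leading bits that are fixed
    bits: int = ID_BITS  # total ID width (assumption: configurable for testing)

    def covers(self, node_id: int) -> bool:
        """True if node_id shares the first `depth` bits with this prefix."""
        shift = self.bits - self.depth
        return (node_id >> shift) == (self.start >> shift)

    def split(self) -> Tuple["Prefix", "Prefix"]:
        """Extend the prefix by one bit, set to 0 and 1 respectively."""
        if self.depth >= self.bits:
            raise ValueError("a prefix covering a single ID cannot be split")
        added_bit = 1 << (self.bits - self.depth - 1)
        low = Prefix(self.start, self.depth + 1, self.bits)
        high = Prefix(self.start | added_bit, self.depth + 1, self.bits)
        return low, high
```

A routing table would start with a single `Prefix(0, 0)` covering the whole keyspace and split from there. Which buckets are allowed to split is a separate policy decision and is not shown here.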

Unlike the flat layout this structure does not involve distances relative to the node's own ID, although some decisions are based on whether the node's own ID would fall into a bucket.

Since the number of buckets in such a routing table varies over time, it has to be represented in a resizable data structure; this is mentioned in §2.4. Access by a fixed index is no longer possible, because the exact bucket that covers a specific node ID is not known until the prefix ranges have been examined. Some kind of O(log n) search is therefore needed if one wants to avoid scanning the whole bucket list each time.
Sorting the buckets by the lowest ID that each bucket covers is a natural way to achieve this. B-trees or sorted arrays combined with binary search can then be used for the lookup.
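One possible shape for the sorted-array variant is sketched below, using binary search from Python's `bisect` module. The `RoutingTable` class and its attribute names are assumptions made for this example, and it omits bucket capacity, eviction, and the rules deciding when a split is permitted.

```python
import bisect

ID_BITS = 160


class RoutingTable:
    """Buckets kept sorted by the lowest ID they cover (illustrative layout)."""

    def __init__(self, bits: int = ID_BITS):
        self.bits = bits
        self.starts = [0]     # lowest ID of each bucket, ascending
        self.depths = [0]     # prefix length of each bucket
        self.contacts = [[]]  # contact IDs held by each bucket

    def bucket_index(self, node_id: int) -> int:
        """Find the covering bucket with an O(log n) binary search."""
        # The buckets tile the keyspace without gaps, so the covering bucket
        # is the rightmost one whose start is <= node_id.
        return bisect.bisect_right(self.starts, node_id) - 1

    def split(self, index: int) -> None:
        """Replace one bucket with two that each cover half of its range."""
        depth = self.depths[index] + 1
        if depth > self.bits:
            raise ValueError("bucket already covers a single ID")
        high_start = self.starts[index] | (1 << (self.bits - depth))
        old = self.contacts[index]
        # The lower half keeps the old slot; the upper half goes right after it,
        # which preserves the sort order of `starts`.
        self.depths[index] = depth
        self.contacts[index] = [c for c in old if c < high_start]
        self.starts.insert(index + 1, high_start)
        self.depths.insert(index + 1, depth)
        self.contacts.insert(index + 1, [c for c in old if c >= high_start])
```

Parallel lists keep the `bisect` call simple. A B-tree or a sorted map keyed by the start ID would serve the same purpose if insertions in the middle of a large list become a concern.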

Regardless of which approach you take, populating a response to find_node requests with the correct set of contacts matching the request's target is not trivial: any single bucket may be insufficient to fill the response, so multiple buckets need to be traversed. It may be simpler to scan the whole routing table for the best available candidates for the reply.
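The "scan everything" fallback can be sketched in a few lines. The function name and the representation of buckets as plain iterables of integer IDs are assumptions for illustration; `K = 20` is the example value of k given in the paper.

```python
import heapq
from typing import Iterable, List

K = 20  # example value of k from the paper; deployments may choose differently


def closest_contacts(
    buckets: Iterable[Iterable[int]], target: int, k: int = K
) -> List[int]:
    """Return up to k contact IDs closest to target by XOR distance."""
    all_contacts = (c for bucket in buckets for c in bucket)
    # nsmallest keeps only k candidates at a time, so the scan is a single pass.
    return heapq.nsmallest(k, all_contacts, key=lambda c: c ^ target)
```

This ignores bucket boundaries entirely, which makes it independent of whether the flat or the tree-based layout is used, at the cost of touching every contact on each request.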




Answer 2:


After some online research and re-reading the paper a few times I think I got it. In the final version of the paper somewhere in section 2 (System description) it says:

The remainder of this section fills in the details and makes the lookup algorithm more concrete. We first define a precise notion of ID closeness, allowing us to speak of storing and looking up <key,value> pairs on the k closest nodes to the key. We then give a lookup protocol that works even in cases where no node shares a unique prefix with a key or some of the subtrees associated with a given node are empty.


Defining "a precise notion of ID closeness" is done in subsection 2.1. This "allows" subsections 2.2 and 2.3 to speak of "storing and looking up <key,value> pairs on the k closest nodes to the key" and to explain how the lookup procedure works. Section 2.4 then addresses lookups in cases where no node shares a unique prefix with a key (i.e. unbalanced trees), and that is where the actual "lookup protocol" is completed.

So the array structure is used at the beginning as a placeholder structure to help us understand the lookup procedure. Once we have an idea of how the lookup works, we are introduced to the more advanced tree structure.
That is probably what the authors had in mind.



Source: https://stackoverflow.com/questions/51161731/how-to-represent-a-kademlia-routing-table-as-data-structure
