Sequentially Constructing Full B-Trees

不问归期 submitted on 2020-01-24 13:06:15

Question


If I have a sorted data set that I want to store on disk in a way that is optimal both for sequential reads and for random lookups, it seems that a B-Tree (or one of its variants) is a good choice, presuming the data set does not all fit in RAM.

The question is: can a full B-Tree be constructed from a sorted set of data without doing any page splits, so that the sorted data can be written to disk sequentially?


Answer 1:


Constructing a "B+ tree" to those specifications is simple.

  1. Choose your branching factor k.
  2. Write the sorted data to a file. This is the leaf level.
  3. To construct the next higher level, scan the current level and write out every kth item.
  4. Stop when the current level has k items or fewer.

Example with k = 2:

0 1|2 3|4 5|6 7|8 9
0   2  |4   6  |8
0       4      |8
0               8
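The steps above can be sketched in Python roughly as follows. This is a minimal sketch, assuming in-memory lists stand in for the per-level files; `build_levels` is a hypothetical name, not something from the answer. Run on the keys 0 to 9 with k = 2, it yields the same four levels as the example.

```python
def build_levels(sorted_keys, k):
    """Build the levels of a static B+ tree, leaf level first.

    Hypothetical helper: in-memory lists stand in for the per-level files.
    """
    if k < 2:
        raise ValueError("branching factor k must be at least 2")
    levels = [list(sorted_keys)]  # step 2: the sorted data is the leaf level
    # Steps 3-4: keep every k-th item of the current level until it has k items or fewer.
    while len(levels[-1]) > k:
        levels.append(levels[-1][::k])
    return levels


if __name__ == "__main__":
    for level in build_levels(range(10), 2):
        print(level)
```

Each level is produced by one sequential scan of the level below it, so every level can be written to disk front to back without any page splits.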

Now let's look for 5. Use binary search to find the last number less than or equal to 5 in the top level (or the first number, if every number is greater than 5); here that is 0. Look at the interval in the next lower level corresponding to 0:

0       4

Now 4:

        4   6

Now 4 again:

        4 5

Found it. In general, the jth item at one level (counting from 0) corresponds to items jk through (j+1)k-1 at the next lower level. For sequential reads, you can also scan the leaf level linearly.
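The lookup walk can be sketched in Python as below. This is a minimal sketch, assuming distinct keys and the in-memory level lists from the construction steps; `build_levels` and `lookup` are hypothetical names. It uses the standard `bisect.bisect_right`, restricted with its `lo`/`hi` arguments to the window jk through (j+1)k-1, so each level only searches one page.

```python
from bisect import bisect_right


def build_levels(sorted_keys, k):
    """Leaf level first; each upper level keeps every k-th item of the one below."""
    levels = [list(sorted_keys)]
    while len(levels[-1]) > k:
        levels.append(levels[-1][::k])
    return levels


def lookup(levels, k, x):
    """Return the leaf index of x, or None if x is absent (keys assumed distinct)."""
    # Top level: last item <= x, or index 0 if x is smaller than every key.
    j = max(bisect_right(levels[-1], x) - 1, 0)
    # Item j covers items j*k .. (j+1)*k - 1 on the next lower level,
    # so each step binary-searches only that window (one page).
    for level in reversed(levels[:-1]):
        lo = j * k
        hi = min(lo + k, len(level))
        j = max(bisect_right(level, x, lo, hi) - 1, lo)
    leaf = levels[0]
    return j if leaf and leaf[j] == x else None


if __name__ == "__main__":
    levels = build_levels(range(10), 2)
    print(lookup(levels, 2, 5))
```

Searching for 5 visits the same items as the walk above (0, then 4 on the next two levels, then 5 in the leaf). Because item j of a level equals item jk of the level below, the last key less than or equal to x always lies inside the window, which is why one page per level is enough.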




Answer 2:


We can make a B-tree in one pass, but it may not be the optimal storage method. Depending on how often you make sequential queries vs random access ones, it may be better to store it in sequence and use binary search to service a random access query.
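The plain sorted-file alternative can be sketched in Python as below. This is a minimal sketch, assuming fixed-width records of one little-endian unsigned 64-bit key each, with `io.BytesIO` standing in for the on-disk file; `RECORD`, `write_sorted`, and `file_search` are hypothetical names. It counts record reads, since each one may cost a seek.

```python
import io
import struct

# Assumed record layout: one little-endian unsigned 64-bit key per record.
RECORD = struct.Struct("<Q")


def write_sorted(keys):
    """Write sorted keys as fixed-width records (BytesIO stands in for a file)."""
    f = io.BytesIO()
    for key in keys:
        f.write(RECORD.pack(key))
    return f


def file_search(f, x):
    """Binary search over sorted fixed-width records.

    Returns (record index or None, number of records read); each read is a
    potential disk seek.
    """
    n = f.seek(0, io.SEEK_END) // RECORD.size
    lo, hi, reads = 0, n, 0
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * RECORD.size)
        (key,) = RECORD.unpack(f.read(RECORD.size))
        reads += 1
        if key < x:
            lo = mid + 1
        elif key > x:
            hi = mid
        else:
            return mid, reads
    return None, reads


if __name__ == "__main__":
    f = write_sorted(range(0, 2000, 2))
    print(file_search(f, 1000))
```

A lookup here reads about log2(n) records, each possibly in a different disk block, whereas a B-tree reads one node per level, about log_m(n) nodes; sequential scans, on the other hand, are equally cheap in both layouts, which is why the trade-off depends on the query mix.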

That said, assume that each node in your B-tree holds up to (m - 1) keys (m > 2; the binary case is a bit different). We want all the leaves on the same level and every internal node to hold at least (m - 1) / 2 keys. A full B-tree of height k has 1 + m + ... + m^(k - 1) = (m^k - 1) / (m - 1) nodes of (m - 1) keys each, so it holds m^k - 1 keys. Assume that we have n keys to store in total, and let k be the smallest integer such that m^k - 1 >= n. Now, if 2 m^(k - 1) - 1 < n, we can completely fill the inner nodes, which take m^(k - 1) - 1 keys, and distribute the remaining n + 1 - m^(k - 1) keys evenly over the m^(k - 1) leaf nodes, each leaf getting either the floor or the ceiling of (n + 1 - m^(k - 1)) / m^(k - 1) keys. If we cannot do that, then we know that we have enough keys to fill all of the nodes at depth k - 1 at least halfway and to store one key in each of the leaves.
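The first case of that calculation can be sketched in Python as below. This is a minimal sketch, assuming k is taken as the smallest integer with m^k - 1 >= n; `tree_height` and `leaf_loads` are hypothetical names, and the second (half-full) case is not planned here, only reported as `None`.

```python
def tree_height(n, m):
    """Smallest k with m**k - 1 >= n: a full tree of height k holds m**k - 1 keys."""
    k = 1
    while m ** k - 1 < n:
        k += 1
    return k


def leaf_loads(n, m):
    """Plan the first case: full inner levels, remaining keys spread over the leaves.

    Returns (k, per-leaf key counts), or (k, None) when 2*m**(k-1) - 1 < n fails.
    """
    if m <= 2 or n < 1:
        raise ValueError("need m > 2 and at least one key")
    k = tree_height(n, m)
    leaves = m ** (k - 1)        # full inner levels have m**(k-1) leaf children
    inner_keys = leaves - 1      # and hold m**(k-1) - 1 keys themselves
    if not 2 * leaves - 1 < n:
        return k, None
    rest = n - inner_keys        # n + 1 - m**(k-1) keys left for the leaves
    q, r = divmod(rest, leaves)  # r leaves get the ceiling, the others the floor
    loads = [q + 1] * r + [q] * (leaves - r)
    assert max(loads) <= m - 1   # holds because n <= m**k - 1
    return k, loads


if __name__ == "__main__":
    print(leaf_loads(40, 4))
```

For n = 40 and m = 4, the height is 3, the inner levels take 15 keys, and the remaining 25 keys go to 16 leaves as nine leaves of 2 and seven leaves of 1.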

Once we have decided the shape of our tree, we need only do an in-order traversal of it, dropping the sorted keys into position one after another as we go.
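The in-order fill can be sketched in Python as below. This is a minimal sketch, assuming the shape from the first case (k levels, every inner node full with m - 1 keys and m children, leaf i holding loads[i] keys) and nodes kept in memory as dicts rather than written at precomputed disk offsets; `fill_in_order` and `in_order` are hypothetical names. The example uses n = 20 and m = 3, for which the formula gives k = 3 and (20 + 1 - 9) / 9 keys per leaf, i.e. three leaves of 2 and six leaves of 1.

```python
def fill_in_order(keys, m, k, loads):
    """Lay sorted keys into a fixed B-tree shape by an in-order traversal.

    Hypothetical shape: k levels, full inner nodes (m - 1 keys, m children),
    leaf i holding loads[i] keys. Nodes are dicts with "keys" and "children".
    """
    keys = list(keys)
    if len(loads) != m ** (k - 1) or len(keys) != m ** (k - 1) - 1 + sum(loads):
        raise ValueError("keys and loads do not match the shape")
    key_iter, load_iter = iter(keys), iter(loads)

    def build(depth):
        if depth == k - 1:  # leaf: take the next loads[i] keys in order
            return {"keys": [next(key_iter) for _ in range(next(load_iter))], "children": []}
        node = {"keys": [], "children": []}
        for i in range(m):
            node["children"].append(build(depth + 1))  # left subtree first
            if i < m - 1:
                node["keys"].append(next(key_iter))    # then the separator key
        return node

    return build(0)


def in_order(node):
    """Yield the keys of the tree in in-order (sorted) order."""
    if not node["children"]:
        yield from node["keys"]
        return
    for i, child in enumerate(node["children"]):
        yield from in_order(child)
        if i < len(node["keys"]):
            yield node["keys"][i]


if __name__ == "__main__":
    root = fill_in_order(range(20), 3, 3, [2, 2, 2, 1, 1, 1, 1, 1, 1])
    print(root["keys"])  # [8, 14]
```

Since the keys are consumed strictly in sorted order, each node's contents are known the moment the traversal finishes it, so with precomputed node offsets every key can be written directly into its final position.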




Answer 3:


By optimal I mean that an in-order traversal of the data always seeks forward through the file (or mmap'ed region), and that a random lookup takes a minimal number of seeks.



Source: https://stackoverflow.com/questions/3401009/sequentially-constructing-full-b-trees
