Algorithm to find k-th key in a B-tree?

[愿得一人] 2020-12-18 15:22

I'm trying to understand how I should think about getting the k-th key/element in a B-tree. Even if it's steps instead of code, it will still help a lot. Thanks

Ed

3 answers
  • 2020-12-18 15:48

    Ok so, after a few sleepless hours I managed to do it, and for anyone who will wonder how, here it goes in pseudocode (k=0 for first element):

    get_k-th(current, k):
    
    // current holds number_of_keys keys; child[i] is the subtree directly left of key[i]
    if (is_leaf_node(current))
        return k < current.number_of_keys ? current.key[k] : null
    
    for i = 0 to current.number_of_keys    // inclusive: an internal node has number_of_keys + 1 children
        int size = size_of_B-tree(current.child[i])
        if (k < size)
            return get_k-th(current.child[i], k)
        else if (k == size && i < current.number_of_keys)
            return current.key[i]
        k = k - size - 1    // skip child[i] and key[i]
    
    return null
    

    I know this might look kinda weird, but it's what I came up with and thankfully it works. There might be a way to make this code clearer, and probably more efficient, but I hope it's good enough to help anyone else who might get stuck on the same obstacle as I did.
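
For anyone who wants to try the idea directly, here is a minimal Python sketch of the same approach. The `Node` class, `subtree_size`, and the sample tree are assumptions made up for illustration, not part of any real B-tree library. Because `subtree_size` walks a whole subtree every time it is called, this version can take O(n) time overall.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical minimal B-tree node: a leaf has no children,
    # an internal node has len(keys) + 1 children.
    keys: List[int]
    children: List["Node"] = field(default_factory=list)


def subtree_size(node: Node) -> int:
    """Count every key at or below node (visits the whole subtree)."""
    return len(node.keys) + sum(subtree_size(c) for c in node.children)


def get_kth(node: Node, k: int) -> Optional[int]:
    """Return the k-th smallest key (k = 0 is the first), or None if out of range."""
    if k < 0:
        return None
    if not node.children:  # leaf: its keys are already in order
        return node.keys[k] if k < len(node.keys) else None
    for i, child in enumerate(node.children):
        size = subtree_size(child)
        if k < size:  # target lies inside child i
            return get_kth(child, k)
        if k == size and i < len(node.keys):  # target is the separator key
            return node.keys[i]
        k -= size + 1  # skip child i and key i
    return None


# Sample data: keys 1..11 spread over a two-level tree.
root = Node(
    keys=[4, 8],
    children=[Node([1, 2, 3]), Node([5, 6, 7]), Node([9, 10, 11])],
)
```

Calling `get_kth(root, k)` for `k` from 0 to 10 walks through the keys in sorted order, and any larger `k` gives `None`.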

  • 2020-12-18 15:54

    There's no efficient way to do it using a standard B-tree. Broadly speaking, I see 2 options:

    • Convert the B-tree to an order statistic tree to allow for this operation in O(log n).

      That is, for each node, keep a variable representing the size (number of elements) of the subtree rooted at that node (that node, all its children, all its children's children, etc.).

      Whenever you do an insertion or deletion, you update this variable appropriately. You will only need to update nodes already being visited, so it won't change the complexity of those operations.

      Getting the k-th element would involve adding up the sizes of the children until we get to k, picking the appropriate child to visit and decreasing k appropriately. Pseudo-code:

      select(root, k) // initial call for root
      
      // returns the k'th element of the elements in node
      function select(node, k)
         for i = 0 to node.elementCount // inclusive: a node has elementCount + 1 children
            size = 0
            if node.child[i] != null
               size = node.sizeOfChild[i]
            if k < size // element is in the child subtree
               return select(node.child[i], k)
            else if k == size && i != node.elementCount // element is here
               // (on the last iteration k == size only when k == elements in tree, i.e. k is not valid)
               return node.element[i]
            else // k > size, element is to the right
               k -= size + 1 // child[i] subtree + node.element[i]
         return null // k >= elements in tree
      

      Consider child[i] to be directly to the left of element[i].

      The pseudo-code for the binary search tree (not B-tree) provided on Wikipedia may explain the basic concept here better than the above.

      Note that the size of a node's subtree should be stored in its parent (note that I didn't use node.child[i].size above). Storing it in the node itself would be much less efficient: reading a node is considered a non-trivial or expensive operation in B-tree use cases (nodes must often be read from disk), so you want to minimise the number of nodes read, even if that makes each node slightly bigger.

    • Do an in-order traversal until you've seen k elements - this will take O(n).

      Pseudo-code:

      select(root, &k) // initial call for root, passing the address of k
      
      // returns the k'th element of the elements in node
      function select(node, *k) // pass k by pointer, so every recursive call updates the same counter
         if node == null
            return null
         for i = 0 to node.elementCount
            element = select(node.child[i], k) // check if it's in the child's subtree
            if element != null // element was found
               return element
            if i != node.elementCount // exclude last iteration
               if *k == 0 // element is here
                  return node.element[i]
               (*k)-- // only decrease k for node.element[i] (i.e. by 1),
                      // k is decreased for node.child[i] in the recursive call
         return null
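
The two options above can be compared side by side with a small Python sketch. Everything here is an assumption for illustration: the `Node` layout, the `make_node` helper that fills in `child_sizes`, and the sample tree. A real order statistic B-tree would also have to update `child_sizes` during insertion, deletion, splits, and merges, which is left out.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Iterator, List, Optional


@dataclass
class Node:
    # Hypothetical layout: child_sizes[i] caches the number of keys in
    # children[i], so the parent never opens a child just to measure it.
    keys: List[int]
    children: List["Node"] = field(default_factory=list)
    child_sizes: List[int] = field(default_factory=list)


def make_node(keys: List[int], children: Optional[List[Node]] = None) -> Node:
    """Build a node and fill in the cached sizes of its children."""
    children = children or []
    sizes = [len(c.keys) + sum(c.child_sizes) for c in children]
    return Node(keys, children, sizes)


def select(node: Node, k: int) -> Optional[int]:
    """Option 1: follow the cached sizes, visiting one node per level."""
    if k < 0:
        return None
    if not node.children:  # leaf
        return node.keys[k] if k < len(node.keys) else None
    for i, size in enumerate(node.child_sizes):
        if k < size:  # element is in the child subtree
            return select(node.children[i], k)
        if k == size and i < len(node.keys):  # element is here
            return node.keys[i]
        k -= size + 1  # skip children[i] and keys[i]
    return None


def in_order(node: Node) -> Iterator[int]:
    """Yield every key in sorted order."""
    if not node.children:
        yield from node.keys
        return
    for i, child in enumerate(node.children):
        yield from in_order(child)
        if i < len(node.keys):
            yield node.keys[i]


def select_by_traversal(node: Node, k: int) -> Optional[int]:
    """Option 2: walk in order and stop at the k-th key."""
    if k < 0:
        return None
    return next(islice(in_order(node), k, None), None)


# Sample data: keys 1..14 in a three-level tree.
left = make_node([3, 6], [make_node([1, 2]), make_node([4, 5]), make_node([7, 8])])
right = make_node([12], [make_node([10, 11]), make_node([13, 14])])
root = make_node([9], [left, right])
```

Both functions return the same key for every valid `k`; the difference is that `select` only descends one path, while `select_by_traversal` touches every key up to position `k`.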
      
  • 2020-12-18 16:07

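    You can keep a second, balanced binary search tree (a splay tree, for example) alongside the B-tree to record which elements are currently in it. If that tree tracks subtree sizes, every operation can finish in O(log n) (amortized, in the case of a splay tree), at the cost of doubling the space. std::set is the easiest thing to reach for, but note that it has no k-th element operation: stepping an iterator to position k takes O(k).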
