In what order should you insert a set of known keys into a B-Tree to get minimal height?

-上瘾入骨i 2020-12-23 17:31

Given a fixed number of keys or values (stored either in an array or in some other data structure) and the order of the B-tree, can we determine the sequence of inserting keys that would gene

5 answers
  •  孤独总比滥情好
    2020-12-23 18:15

    So is there a particular way to determine sequence of insertion which would reduce space consumption?

    Edit note: since the question was quite interesting, I have tried to improve my answer with a bit of Haskell.

    Let k be the Knuth order of the B-Tree and let list be a list of keys.

    The minimization of space consumption has a trivial solution:

    -- not written in point-free style, to keep it readable for Haskell newcomers
    trivial k list = concat $ reverse $ chunksOf (k-1) $ sort list
    

    Such an algorithm quickly produces a B-Tree with minimal space consumption, but one that is inefficient to search, because it is unbalanced on the left.
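
    As a quick illustration of the order that trivial produces, here is a minimal self-contained sketch. The local chunksOf is an assumption: it stands in for Data.List.Split.chunksOf so that the snippet depends only on base. sampleKeys is a hypothetical name for the keys used in the example further down.

```haskell
import Data.List (sort)

-- Local stand-in for Data.List.Split.chunksOf (an assumption, so that this
-- sketch needs only base). It expects a positive chunk size, i.e. k >= 2.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = chunk : chunksOf n rest
  where (chunk, rest) = splitAt n xs

-- Same definition as above: sort the keys, cut them into groups of k - 1,
-- then emit the groups from the largest keys down to the smallest.
trivial :: Ord a => Int -> [a] -> [a]
trivial k list = concat $ reverse $ chunksOf (k - 1) $ sort list

-- Hypothetical sample input (the keys of the example further down).
sampleKeys :: [Int]
sampleKeys = [21, 18, 16, 9, 12, 7, 6, 5, 1, 2]

-- trivial 3 sampleKeys == [18,21,12,16,7,9,5,6,1,2]
```

    With k = 3 every group holds k - 1 = 2 keys, so the keys arrive as full node-sized groups in decreasing order.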

    Many non-trivial solutions exist that are more expensive to produce but give better lookup performance (lower height/depth). As you know, it's all about trade-offs!

    A simple algorithm that minimizes both the B-Tree depth and the space consumption (but does not give the best lookup performance!) is the following:

    -- Sort the list in increasing order and call sortByBTreeSpaceConsumption 
    -- with the result
    smart k list = sortByBTreeSpaceConsumption k $ sort list
    
    -- Sort list so that inserting in a B-Tree with Knuth order = k 
    -- will produce a B-Tree with minimal space consumption and minimal depth 
    -- (but not best performance)
    sortByBTreeSpaceConsumption :: Ord a => Int -> [a] -> [a]
    sortByBTreeSpaceConsumption _ [] = []
    sortByBTreeSpaceConsumption k list
        | k - 1 >= numOfItems = list  -- this will be a leaf
        | otherwise = heads ++ tails ++ sortByBTreeSpaceConsumption k remainder
        where requiredLayers = minNumberOfLayersToArrange k list
              numOfItems = length list
              capacityOfInnerLayers = capacityOfBTree k $ requiredLayers - 1
              blockSize = capacityOfInnerLayers + 1 
              blocks = chunksOf blockSize balanced
              heads = map last blocks
              tails = concat $ map (sortByBTreeSpaceConsumption k . init) blocks
              balanced = take (numOfItems - (mod numOfItems blockSize)) list
              remainder = drop (numOfItems - (mod numOfItems blockSize)) list
    
    -- Capacity of a layer n in a B-Tree with Knuth order = k
    layerCapacity k 0 = k - 1
    layerCapacity k n = k * layerCapacity k (n - 1)
    
    -- Infinite list of capacities of layers in a B-Tree with Knuth order = k
    capacitiesOfLayers k = map (layerCapacity k) [0..]
    
    -- Capacity of a B-Tree with Knuth order = k and l layers
    capacityOfBTree k l = sum $ take l $ capacitiesOfLayers k
    
    -- Infinite list of capacities of B-Trees with Knuth order = k 
    -- as the number of layers increases
    capacitiesOfBTree k = map (capacityOfBTree k) [1..]
    
    -- compute the minimum number of layers in a B-Tree of Knuth order k 
    -- required to store the items in list
    minNumberOfLayersToArrange k list = 1 + f k
        where numOfItems = length list
              f = length . takeWhile (< numOfItems) . capacitiesOfBTree
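
    The capacity helpers above have a closed form: layer n holds (k - 1) * k^n keys, so the first l layers hold (k - 1) * (1 + k + ... + k^(l-1)) = k^l - 1 keys. The sketch below checks this against the layer-by-layer sum. capacityClosedForm and minLayers are hypothetical names introduced only for illustration, and k >= 2 is assumed.

```haskell
-- Capacity of layer n, as defined in the answer: (k - 1) * k^n keys.
layerCapacity :: Int -> Int -> Int
layerCapacity k 0 = k - 1
layerCapacity k n = k * layerCapacity k (n - 1)

-- Capacity of a B-Tree with l layers, summed layer by layer.
capacityOfBTree :: Int -> Int -> Int
capacityOfBTree k l = sum $ map (layerCapacity k) [0 .. l - 1]

-- Closed form of the same sum (a geometric series): k^l - 1.
capacityClosedForm :: Int -> Int -> Int
capacityClosedForm k l = k ^ l - 1

-- Hypothetical helper: the smallest number of layers (at least 1) whose
-- capacity is at least n keys. Assumes k >= 2, otherwise it never terminates.
minLayers :: Int -> Int -> Int
minLayers k n = head [l | l <- [1 ..], capacityClosedForm k l >= n]

-- Compare both capacity definitions on a small grid of orders and depths.
closedFormAgrees :: Bool
closedFormAgrees =
  and [capacityOfBTree k l == capacityClosedForm k l | k <- [2 .. 6], l <- [0 .. 6]]
```

    For k = 3 the capacities are 2, 8, 26, ..., so 10 keys need 3 layers and 8 keys need 2.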
    

    With this smart function, given list = [21, 18, 16, 9, 12, 7, 6, 5, 1, 2] and a B-Tree with Knuth order = 3, we should obtain [18, 5, 9, 1, 2, 6, 7, 12, 16, 21], with a resulting B-Tree like

                  [18, 21]
                 /
          [5 , 9]
         /   |   \
     [1,2] [6,7] [12, 16]
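
    To see where the order [18, 5, 9, 1, 2, 6, 7, 12, 16, 21] comes from, the sketch below isolates a single step of the recursion in sortByBTreeSpaceConsumption. splitLevel and minLayers are hypothetical names used only for illustration; chunksOf is a local stand-in for Data.List.Split.chunksOf; the block size uses the fact that a tree of l layers holds k^l - 1 keys. It assumes k >= 2 and an already sorted input, and it leaves out the leaf case handled by the first guard of the original function.

```haskell
-- Local stand-in for Data.List.Split.chunksOf (an assumption, base only).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = chunk : chunksOf n rest
  where (chunk, rest) = splitAt n xs

-- Smallest number of layers whose capacity k^l - 1 is at least n keys.
minLayers :: Int -> Int -> Int
minLayers k n = head [l | l <- [1 ..], k ^ l - 1 >= n]

-- One step of the recursion on a sorted list. The result is
-- (keys emitted first, blocks to recurse into, leftover keys to recurse into).
splitLevel :: Int -> [a] -> ([a], [[a]], [a])
splitLevel k list = (map last blocks, map init blocks, remainder)
  where
    numOfItems = length list
    -- capacityOfInnerLayers + 1 in the original code
    blockSize = k ^ (minLayers k numOfItems - 1)
    (balanced, remainder) =
      splitAt (numOfItems - numOfItems `mod` blockSize) list
    blocks = chunksOf blockSize balanced

-- splitLevel 3 [1,2,5,6,7,9,12,16,18,21] == ([18],[[1,2,5,6,7,9,12,16]],[21])
-- splitLevel 3 [1,2,5,6,7,9,12,16]       == ([5,9],[[1,2],[6,7]],[12,16])
```

    The first step emits 18 and keeps 21 for last; the second step, applied to the eight remaining keys, emits 5 and 9, followed by the blocks [1,2] and [6,7] and the leftover [12,16].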
    

    Obviously this is suboptimal from a performance point of view, but should be acceptable, since obtaining a better one (like the following) would be far more expensive (computationally and economically):

              [7 , 16]
             /   |   \
         [5,6] [9,12] [18, 21]
        /
    [1,2]
    

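    If you want to run it, put the previous code in a Main.hs file, prepend the following imports and main, and compile it with ghc. Data.List.Split (used for chunksOf) comes from the split package, which must be installed: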

    import Data.List (sort)
    import Data.List.Split
    import System.Environment (getArgs)
    
    -- Usage: first argument is the Knuth order, the remaining ones are the keys
    main :: IO ()
    main = do
        args <- getArgs
        let knuthOrder = read $ head args
        let keys = (map read $ tail args) :: [Int]
        putStr "smart: "
        print $ smart knuthOrder keys
        putStr "trivial: "
        print $ trivial knuthOrder keys
    
