Why use binary search if there's ternary search?

别跟我提以往 2020-12-02 12:43

I recently heard about ternary search in which we divide an array into 3 parts and compare. Here there will be two comparisons but it reduces the array to n/3. Why don't p

15 answers
  • 2020-12-02 12:44

    Almost all textbooks and websites on binary search trees do not really talk about binary trees! They show you ternary search trees! True binary trees store data in their leaves, not in internal nodes (which hold only keys for navigation). Some call these leaf trees and distinguish them from the node trees shown in textbooks:

    J. Nievergelt, C.-K. Wong: Upper Bounds for the Total Path Length of Binary Trees, Journal ACM 20 (1973) 1–6.

    The following about this is from Peter Brass's book on data structures.

    2.1 Two Models of Search Trees

    In the outline just given, we suppressed an important point that at first seems trivial, but indeed it leads to two different models of search trees, either of which can be combined with much of the following material, but one of which is strongly preferable.

    If we compare in each node the query key with the key contained in the node and follow the left branch if the query key is smaller and the right branch if the query key is larger, then what happens if they are equal? The two models of search trees are as follows:

    1. Take left branch if query key is smaller than node key; otherwise take the right branch, until you reach a leaf of the tree. The keys in the interior node of the tree are only for comparison; all the objects are in the leaves.

    2. Take left branch if query key is smaller than node key; take the right branch if the query key is larger than the node key; and take the object contained in the node if they are equal.

    This minor point has a number of consequences:

    • In model 1, the underlying tree is a binary tree, whereas in model 2, each tree node is really a ternary node with a special middle neighbor.

    • In model 1, each interior node has a left and a right subtree (each possibly a leaf node of the tree), whereas in model 2, we have to allow incomplete nodes, where left or right subtree might be missing, and only the comparison object and key are guaranteed to exist.

    So the structure of a search tree of model 1 is more regular than that of a tree of model 2; this is, at least for the implementation, a clear advantage.

    • In model 1, traversing an interior node requires only one comparison, whereas in model 2, we need two comparisons to check the three possibilities.

    Indeed, trees of the same height in models 1 and 2 contain at most approximately the same number of objects, but one needs twice as many comparisons in model 2 to reach the deepest objects of the tree. Of course, in model 2, there are also some objects that are reached much earlier; the object in the root is found with only two comparisons, but almost all objects are on or near the deepest level.

    Theorem. A tree of height h and model 1 contains at most 2^h objects. A tree of height h and model 2 contains at most 2^(h+1) − 1 objects.

    This is easily seen because the tree of height h has as left and right subtrees a tree of height at most h − 1 each, and in model 2 one additional object between them.

    • In model 1, keys in interior nodes serve only for comparisons and may reappear in the leaves for the identification of the objects. In model 2, each key appears only once, together with its object.

    It is even possible in model 1 that there are keys used for comparison that do not belong to any object, for example, if the object has been deleted. By conceptually separating these functions of comparison and identification, this is not surprising, and in later structures we might even need to define artificial tests not corresponding to any object, just to get a good division of the search space. All keys used for comparison are necessarily distinct because in a model 1 tree, each interior node has nonempty left and right subtrees. So each key occurs at most twice, once as comparison key and once as identification key in the leaf.

    Model 2 became the preferred textbook version because in most textbooks the distinction between object and its key is not made: the key is the object. Then it becomes unnatural to duplicate the key in the tree structure. But in all real applications, the distinction between key and object is quite important. One almost never wishes to keep track of just a set of numbers; the numbers are normally associated with some further information, which is often much larger than the key itself.
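
    The two lookup procedures described in the quote can be sketched in Python as follows. This is only an illustration: the class names, the `find_model1`/`find_model2` helpers and the two hand-built example trees are made up here and do not come from the book. Comparisons are counted as they happen in this code, so in model 2 a step to the left costs one comparison while a step to the right (or a match) costs two.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class LeafTreeNode:
    """Model 1 (hypothetical name): interior nodes hold only a comparison key."""
    key: Any
    left: Optional["LeafTreeNode"] = None
    right: Optional["LeafTreeNode"] = None
    obj: Any = None  # set only on leaves

    def is_leaf(self):
        return self.left is None and self.right is None


@dataclass
class NodeTreeNode:
    """Model 2 (hypothetical name): every node holds a key and its object."""
    key: Any
    obj: Any
    left: Optional["NodeTreeNode"] = None
    right: Optional["NodeTreeNode"] = None


def find_model1(root, query):
    """Return (object or None, key comparisons): one comparison per interior node."""
    comparisons = 0
    node = root
    while not node.is_leaf():
        comparisons += 1
        node = node.left if query < node.key else node.right
    comparisons += 1  # single equality test at the leaf
    return (node.obj if node.key == query else None), comparisons


def find_model2(root, query):
    """Return (object or None, key comparisons): up to two comparisons per node."""
    comparisons = 0
    node = root
    while node is not None:
        comparisons += 1
        if query < node.key:
            node = node.left
            continue
        comparisons += 1
        if query > node.key:
            node = node.right
        else:
            return node.obj, comparisons
    return None, comparisons


# Two example trees of height 2: model 1 holds 4 objects, model 2 holds 7.
def leaf(k):
    return LeafTreeNode(key=k, obj=f"obj{k}")


def node(k, l=None, r=None):
    return NodeTreeNode(k, f"obj{k}", l, r)


m1 = LeafTreeNode(3, LeafTreeNode(2, leaf(1), leaf(2)), LeafTreeNode(4, leaf(3), leaf(4)))
m2 = node(4, node(2, node(1), node(3)), node(6, node(5), node(7)))

print(find_model1(m1, 4), find_model2(m2, 4), find_model2(m2, 7))
```

    In this toy example every lookup in the model 1 tree costs 3 comparisons, while the model 2 tree ranges from 2 (the root) to 6 (the deepest right node), which matches the "twice as many comparisons" remark for the deepest objects.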

  • 2020-12-02 12:44

    Although you get the same big-O complexity, O(log n), in both searches, the difference is in the constants. You have to do more comparisons at each level of a ternary search. A k-ary search needs up to k − 1 comparisons on each of its log_k(n) levels, so the worst-case cost is proportional to (k − 1)/ln(k). For integer k ≥ 2 that factor grows with k (about 1.44 for k = 2 and 1.82 for k = 3), so k = 2 provides the optimal result.
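
    One way to sanity-check the constant is to tabulate it. The sketch below rests on an assumption of its own: each level of a k-ary search costs k − 1 comparisons in the worst case, over log_k(n) levels, so the total is ((k − 1)/ln k) · ln n. The function name `worst_case_factor` is made up for this illustration.

```python
import math


def worst_case_factor(k):
    """Worst-case comparisons per ln(n), assuming k - 1 comparisons per level."""
    return (k - 1) / math.log(k)


# Tabulate the factor for a few small branching factors.
factors = {k: round(worst_case_factor(k), 3) for k in range(2, 6)}
print(factors)
```

    Under that assumption the factor is smallest at k = 2 and increases for every larger k.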

  • 2020-12-02 12:46

    Searching 1 billion (a US billion - 1,000,000,000) sorted items would take about 30 compares with binary search (log2 of 10^9 is about 29.9) and about 19 steps with a ternary search (log3 of 10^9 is about 18.9) - not a huge advantage. And note that each 'ternary compare' might involve 2 actual comparisons.

  • 2020-12-02 12:48

    What makes you think Ternary search should be faster?

    Average number of comparisons:

    in ternary search = ((1/3)*1 + (2/3)*2) * ln(n)/ln(3) ~ 1.517*ln(n)
    in binary search  =                   1 * ln(n)/ln(2) ~ 1.443*ln(n).
    

    Worst number of comparisons:

    in ternary search = 2 * ln(n)/ln(3) ~ 1.820*ln(n)
    in binary search  = 1 * ln(n)/ln(2) ~ 1.443*ln(n).
    

    So it looks like ternary search is worse.
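
    The estimate can also be checked empirically. The sketch below is not a reference implementation: it uses "lower bound" variants (find the first index whose element is >= x) so that binary search really spends one comparison per level, and the function names and the 10,000-element test array are arbitrary choices made for this example.

```python
import bisect


def binary_lower_bound(a, x):
    """Return (first index i with a[i] >= x, number of element comparisons)."""
    lo, hi, comparisons = 0, len(a), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo, comparisons


def ternary_lower_bound(a, x):
    """Same result, but each step probes two points and keeps about a third."""
    lo, hi, comparisons = 0, len(a), 0
    while lo < hi:
        third = (hi - lo) // 3
        m1, m2 = lo + third, hi - 1 - third
        comparisons += 1
        if not a[m1] < x:      # answer is in the left third: one comparison
            hi = m1
            continue
        comparisons += 1       # otherwise a second comparison is needed
        if a[m2] < x:
            lo = m2 + 1        # right third
        else:
            lo, hi = m1 + 1, m2  # middle third
    return lo, comparisons


def average_comparisons(search, a):
    """Average comparison count when every element of a is looked up once."""
    return sum(search(a, x)[1] for x in a) / len(a)


data = list(range(10_000))
# Both variants should agree with the standard library on the index found.
assert all(
    binary_lower_bound(data, x)[0] == ternary_lower_bound(data, x)[0] == bisect.bisect_left(data, x)
    for x in (-1, 0, 1234, 9999, 10_000)
)
avg_binary = average_comparisons(binary_lower_bound, data)
avg_ternary = average_comparisons(ternary_lower_bound, data)
print(f"binary: {avg_binary:.2f}  ternary: {avg_ternary:.2f}")
```

    On this input the ternary variant ends up with the higher average, in line with the constants above.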

  • 2020-12-02 12:52

    Wow. The top voted answers miss the boat on this one, I think.

    Your CPU doesn't support ternary logic as a single operation; it breaks ternary logic into several steps of binary logic. The most efficient code for the CPU is binary logic. If chips that supported ternary logic as a single operation were common, you'd be right.

    B-Trees can have multiple branches at each node; an order-3 B-tree is ternary logic. Each step down the tree will take two comparisons instead of one, and this will probably cause it to be slower in CPU time.

    B-Trees, however, are pretty common. If you assume that every node in the tree will be stored somewhere separately on disk, you're going to spend most of your time reading from disk... and the CPU won't be a bottleneck, but the disk will be. So you take a B-tree with 100,000 children per node, or whatever else will barely fit into one block of memory. B-trees with that kind of branching factor would rarely be more than three nodes high, and you'd only have three disk reads - three stops at a bottleneck - to search an enormous, enormous dataset.

    Reviewing:

    • Ternary trees aren't supported by hardware, so they run less quickly.
    • B-trees with orders much, much, much higher than 3 are common for disk-optimization of large datasets; once you've gone past 2, go higher than 3.
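
    The disk-read argument can be made concrete with a rough back-of-the-envelope sketch. It assumes an idealized tree in which every node is full and every level visited costs exactly one disk read; real B-trees are not completely full and usually cache the top levels, so treat the numbers as an illustration only. The function name `levels_needed` and the 10^12 key count are made up for this example.

```python
def levels_needed(num_keys, branching_factor):
    """Smallest number of levels h such that branching_factor ** h >= num_keys."""
    levels, capacity = 0, 1
    while capacity < num_keys:
        capacity *= branching_factor
        levels += 1
    return levels


# Compare a binary tree, a ternary tree and a very wide B-tree-like node.
for b in (2, 3, 100_000):
    print(f"branching factor {b}: {levels_needed(10**12, b)} levels")
```

    With these inputs the loop gives 40 levels for a branching factor of 2, 26 for 3 and 3 for 100,000, so the wide tree trades a few extra in-memory comparisons per node for far fewer trips to disk.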
  • 2020-12-02 12:52

    Also, note that this sequence generalizes to linear search if we go on

    Binary search
    Ternary search
    ...
    ...
    n-ary search ≡ linear search
    

    So, in an n-ary search, we will have only one "COMPARE" step, which might take up to n actual comparisons.
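
    A generic k-ary search makes this progression easy to see. The sketch below is one possible way to write it, not a standard routine: `k_ary_lower_bound` is a made-up name, it returns the first index whose element is >= x, and it scans up to k − 1 evenly spaced probes from left to right in each round. With k = 2 it behaves like ordinary binary search, and with k equal to the array length a single round degenerates into a linear scan.

```python
def k_ary_lower_bound(a, x, k):
    """Return (first index i with a[i] >= x, number of element comparisons)."""
    lo, hi, comparisons = 0, len(a), 0
    while lo < hi:
        size = hi - lo
        # Up to k - 1 distinct probe positions spread evenly over [lo, hi).
        probes = sorted({lo + (i * size) // k for i in range(1, k)})
        new_lo, new_hi = lo, hi
        for p in probes:
            comparisons += 1
            if a[p] < x:
                new_lo = p + 1   # answer lies to the right of this probe
            else:
                new_hi = p       # answer lies at or before this probe
                break
        lo, hi = new_lo, new_hi
    return lo, comparisons


data = list(range(1000))
# Worst-case comparison count over all successful lookups, per branching factor.
worst = {
    k: max(k_ary_lower_bound(data, x, k)[1] for x in data)
    for k in (2, 3, 10, len(data))
}
print(worst)
```

    For this 1000-element array the worst case is 10 comparisons with k = 2 and 999 with k = 1000, which is just a scan of the whole array.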
