Perform Joins in O(n) time?

Question

Is there a way to join two tables in linear time? I heard this can be done with an auxiliary data structure (a hash table), but I'm not sure how. I always assumed a join involves a cross product and is therefore O(n^2).

Answer 1:

Algorithm:

Loop through table A and insert each row into a hash table keyed on the join column.
Loop through table B and look up each row's key in the hash table (an expected O(1) check); if it is found, add the combined row to the join result.
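A minimal Python sketch of this build/probe approach. It assumes each table is a list of dicts and that the join is an inner equi-join on one column. `hash_join`, `key_a`, `key_b` and the sample data are hypothetical, for illustration only. For an inner join, a row of B is emitted only when its key is found in the hash table.

```python
from collections import defaultdict

def hash_join(table_a, table_b, key_a, key_b):
    """Inner equi-join of two lists of dicts using a hash table.

    Build phase: index every row of table_a by its join key.
    Probe phase: look up each row of table_b in that index.
    """
    # Build: key -> list of rows from table_a (a list handles duplicate keys).
    index = defaultdict(list)
    for row_a in table_a:
        index[row_a[key_a]].append(row_a)

    # Probe: each lookup is expected O(1), so this loop costs O(len(table_b))
    # plus the number of output rows produced.
    result = []
    for row_b in table_b:
        for row_a in index.get(row_b[key_b], []):
            result.append({**row_a, **row_b})
    return result

customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"cust_id": 1, "item": "pen"}, {"cust_id": 1, "item": "ink"},
          {"cust_id": 3, "item": "cup"}]
print(hash_join(customers, orders, "id", "cust_id"))
# -> [{'id': 1, 'name': 'Ann', 'cust_id': 1, 'item': 'pen'}, {'id': 1, 'name': 'Ann', 'cust_id': 1, 'item': 'ink'}]
```

Each table is scanned once, so the total work is O(n + m) plus the size of the output. That output term only grows large when many rows share a key.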

Answer 2:

If there are indexes on the columns used in the join, the join is linear, because the indexes allow an in-order traversal of both tables that can be merged in a single pass. (That's not counting the amortized cost of maintaining the indexes, of course.)
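A minimal sketch of that in-order merge, often called a merge join. It assumes both inputs are lists of `(key, value)` pairs that are already sorted by key, which stands in for the in-order index traversal. The name `merge_join` is hypothetical.

```python
def merge_join(sorted_a, sorted_b):
    """Inner equi-join of two lists of (key, value) pairs, each already
    sorted by key. Runs in O(n + m) plus the size of the output."""
    result = []
    i, j = 0, 0
    while i < len(sorted_a) and j < len(sorted_b):
        key_a, key_b = sorted_a[i][0], sorted_b[j][0]
        if key_a < key_b:
            i += 1          # advance the side with the smaller key
        elif key_a > key_b:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit every pairing.
            i_end = i
            while i_end < len(sorted_a) and sorted_a[i_end][0] == key_a:
                i_end += 1
            j_end = j
            while j_end < len(sorted_b) and sorted_b[j_end][0] == key_b:
                j_end += 1
            for _, val_a in sorted_a[i:i_end]:
                for _, val_b in sorted_b[j:j_end]:
                    result.append((key_a, val_a, val_b))
            i, j = i_end, j_end
    return result

a = [(1, "Ann"), (2, "Bob"), (4, "Cy")]
b = [(1, "pen"), (1, "ink"), (3, "cup"), (4, "mug")]
print(merge_join(a, b))
# -> [(1, 'Ann', 'pen'), (1, 'Ann', 'ink'), (4, 'Cy', 'mug')]
```

Each pointer only moves forward, so both inputs are read once. Sorting unsorted inputs first would add O(n log n).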

A hash join is roughly linear, though the hashing itself isn't free, and the cost goes up when the join keys are long.

Answer 3:

It depends on the type of join. A cross join is always going to be O(n^2), since it has to produce O(n^2) records. An equi-join can be done with better complexity (O(n log n), or perhaps even amortized O(n)), provided the right data structures are used.
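To see why a cross join cannot beat quadratic time, here is a small sketch using `itertools.product`. The output itself has one row per pair, so just producing it takes n * m steps. The name `cross_join` is hypothetical.

```python
from itertools import product

def cross_join(table_a, table_b):
    """Every row of table_a paired with every row of table_b: n * m results."""
    return list(product(table_a, table_b))

a = list(range(100))
b = list(range(200))
print(len(cross_join(a, b)))  # -> 20000
```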

Answer 4:

You can join two tables in close to O(n) by using a hash table to look up records in one table based on the id of the other table.

Well, actually the operation will be close to O(n+m), where n and m are the number of items in the two tables. You would first loop through the records in one table to build a hash table from the key in that table, then you would loop through the other table to look up a match in the hash table for each of the records.

Looking up an item in a hash table is not an O(1) operation, but it's close. With more data you will have a few more hash collisions, so some of the lookups need to do more than one comparison.

Answer 5:

Major DB vendors deprecated hash indexes a long time ago, so joining two tables in O(max(n,m)) time doesn't really matter in practice. With standard B-tree indexes, join complexity is O(min(n,m) * log(max(n,m))).
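A rough sketch of where O(min(n,m) * log(max(n,m))) comes from, namely an index nested-loop join. This code assumes a sorted key list searched with `bisect_left` as a stand-in for a B-tree index on the larger table. A real B-tree is page-based with a high fanout, but a lookup is still logarithmic. `build_index` and `index_nested_loop_join` are hypothetical names. The index is built once up front, as a pre-existing database index would be, so its cost is not counted in the join.

```python
from bisect import bisect_left

def build_index(rows):
    """Hypothetical B-tree stand-in: rows sorted by key plus a parallel key list."""
    ordered = sorted(rows, key=lambda row: row[0])
    return [row[0] for row in ordered], ordered

def index_nested_loop_join(small, index):
    """Inner equi-join: scan the smaller table once and probe the index for
    each row. Each probe is O(log m) plus the matches it returns."""
    keys, ordered = index
    result = []
    for key, val_small in small:
        pos = bisect_left(keys, key)  # binary search, like descending a B-tree
        # Walk forward over every index entry with an equal key.
        while pos < len(keys) and keys[pos] == key:
            result.append((key, val_small, ordered[pos][1]))
            pos += 1
    return result

idx = build_index([(4, "mug"), (1, "pen"), (3, "cup"), (1, "ink")])
print(index_nested_loop_join([(1, "Ann"), (4, "Cy"), (9, "Dee")], idx))
# -> [(1, 'Ann', 'pen'), (1, 'Ann', 'ink'), (4, 'Cy', 'mug')]
```

Iterating the smaller table keeps the number of probes at min(n, m). Each probe searches the larger table, which gives the log(max(n, m)) factor.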

Source: https://stackoverflow.com/questions/5557964/perform-joins-in-on-time

Tags

algorithm

database-design

data-structures

database