Question
Is there a way to join two tables in linear time? I have heard this can be done with an additional data structure (a hash table), but I'm not sure how. I had always assumed that a join involves a cross product and is therefore O(n^2).
Answer 1:
Algorithm:
Loop through table A and add each item's join key to a hash table.
Loop through table B and check whether each item's key is in the hash table (an expected O(1) check); if it is, add the matching item to the join result.
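A minimal sketch of the two loops, assuming each table is just a list of hashable join keys. The function name `hash_join_items` is made up for illustration. Note that an item of B goes into the result only when it is found in the hash table; adding the items that are not found would produce a union rather than a join.

```python
def hash_join_items(table_a, table_b):
    """Return the items of table_b whose key also appears in table_a."""
    # Build phase: hash every item of A. Expected O(n) for n items.
    seen = set(table_a)

    # Probe phase: one expected O(1) membership check per item of B.
    joined = []
    for item in table_b:
        if item in seen:
            joined.append(item)
    return joined


print(hash_join_items([1, 2, 3, 4], [3, 4, 5]))  # prints [3, 4]
```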
Answer 2:
If indexes are available on the columns used in the join, the join is linear, because the indexes allow an in-order traversal of both tables. (That's not counting the amortized cost of maintaining the indexes, of course.)
A hash join will be sort-of linear, though the hashing itself isn't free, and the cost goes up when the keys involved are long.
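The in-order traversal described here is usually called a merge join. Below is a rough sketch, assuming both inputs are lists of `(key, value)` pairs already sorted by key, with the sorted lists standing in for index scans. The name `merge_join` and the tuple layout are illustrative choices, not anything a particular database exposes.

```python
def merge_join(left, right):
    """Join two lists of (key, value) pairs that are already sorted by key."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1          # left key too small, advance left cursor
        elif lkey > rkey:
            j += 1          # right key too small, advance right cursor
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lkey:
                for k in range(j, j_end):
                    result.append((lkey, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return result


left = [(1, "a"), (2, "b"), (4, "d")]
right = [(2, "x"), (3, "z"), (4, "w")]
print(merge_join(left, right))  # prints [(2, 'b', 'x'), (4, 'd', 'w')]
```

When keys are unique, each cursor only moves forward, so the work is O(n+m). With duplicate keys the output itself can be larger than that.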
Answer 3:
It depends on the type of join. A cross join is always going to be O(n^2), since it has to produce O(n^2) records. An equi-join can be done with better complexity (O(n log(n)), or perhaps even amortized O(n)), provided the right data structures are employed.
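To illustrate why a cross join cannot beat quadratic time, here is a small sketch using `itertools.product`. The helper name `cross_join` is hypothetical. The result alone holds n*m rows, so merely writing it out takes that long.

```python
from itertools import product


def cross_join(table_a, table_b):
    """Pair every row of table_a with every row of table_b."""
    return list(product(table_a, table_b))


a = list(range(100))
b = list(range(100))
print(len(cross_join(a, b)))  # prints 10000
```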
Answer 4:
You can join two tables in close to O(n) by using a hash table to look up records in one table based on the id of the other table.
Well, actually the operation will be close to O(n+m), where n and m are the number of items in the two tables. You would first loop through the records in one table to build a hash table from the key in that table, then you would loop through the other table to look up a match in the hash table for each of the records.
Looking up an item in a hash table is not a guaranteed O(1) operation, but it's close on average. With more data you will have a few more hash collisions, so some of the lookups need to do more than one comparison.
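The build-then-probe procedure from this answer might look like the sketch below, assuming rows are dictionaries. The function name `hash_join`, its parameters, and the sample `customers`/`orders` data are all invented for illustration. A list per key lets one key match several rows.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dict rows in expected O(n + m) plus output size."""
    # Build phase: one pass over the first table, grouping rows by key.
    index = defaultdict(list)
    for row in build_rows:
        index[row[build_key]].append(row)

    # Probe phase: one pass over the second table, one lookup per row.
    joined = []
    for row in probe_rows:
        for match in index.get(row[probe_key], ()):
            joined.append({**match, **row})
    return joined


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 3},
]
for row in hash_join(customers, orders, "id", "customer_id"):
    print(row)
```

Building the hash table on the smaller of the two tables keeps memory use down; the total work stays proportional to n+m either way.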
Answer 5:
Major DB vendors deprecated hash indexes a long time ago. Therefore, joining two tables in O(max(n,m)) time is something that really doesn't matter in practice. With standard B-tree indexes, the join complexity is O(min(n,m)*log(max(n,m))).
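The O(min(n,m)*log(max(n,m))) bound corresponds to looping over the smaller table and doing one logarithmic index lookup per row in the larger one. The sketch below approximates that, with a sorted Python list and `bisect` standing in for a B-tree index. This is an assumption made for illustration, not how any particular database implements it, and the name `index_nested_loop_join` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def index_nested_loop_join(small_keys, large_sorted_keys):
    """For each key of the smaller input, binary-search the sorted larger input.

    Returns (key, position) pairs, where position indexes large_sorted_keys.
    """
    result = []
    for key in small_keys:
        # Two O(log m) searches bound the run of equal keys.
        lo = bisect_left(large_sorted_keys, key)
        hi = bisect_right(large_sorted_keys, key)
        result.extend((key, pos) for pos in range(lo, hi))
    return result


print(index_nested_loop_join([3, 7, 9], [1, 3, 3, 5, 7]))
# prints [(3, 1), (3, 2), (7, 4)]
```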
Source: https://stackoverflow.com/questions/5557964/perform-joins-in-on-time