Complexity in using Binary search and Trie

Question

Given a large list of alphabetically sorted words in a file, I need to write a program that, given a word x, determines whether x is in the list. Preprocessing is ok since I will be calling this function many times over different inputs.
Priorities: 1. speed, 2. memory.

I already know of two approaches (n is the number of words, m is the average length of the words):

1. A trie: time is O(log(n)); space is O(log(nm)) in the best case and O(nm) in the worst case.
2. Load the complete list into memory, then binary search: time is O(log(n)), space is O(n*m).

I'm not sure about the complexity of the trie; please correct me if it is wrong. Also, are there other good approaches?
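The second approach in the list above (load everything, then binary search) can be sketched with Python's standard `bisect` module. This is a minimal sketch, assuming the file holds one word per line; `load_words` and `contains_sorted` are hypothetical names. Because a file's "alphabetical" order (for example case-insensitive or locale-aware) may differ from Python's code-point string ordering, the loader re-sorts defensively, a one-time O(n log n) preprocessing cost.

```python
import bisect


def load_words(path):
    # Hypothetical file format: one word per line, blank lines ignored.
    with open(path, encoding="utf-8") as f:
        words = [line.strip() for line in f if line.strip()]
    # Re-sort defensively: the file's "alphabetical" order may not match
    # Python's code-point ordering, which bisect relies on.
    words.sort()
    return words


def contains_sorted(words, x):
    # bisect_left makes O(log n) comparisons; each comparison may scan
    # up to m characters of the two strings being compared.
    i = bisect.bisect_left(words, x)
    return i < len(words) and words[i] == x
```

For example, `contains_sorted(["apple", "banana", "cherry"], "banana")` returns `True`.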

Answer 1:

It is O(m) time for the trie, and up to O(m log(n)) for the binary search, since each of the O(log(n)) comparisons may examine up to m characters. The space is asymptotically O(nm) for any reasonable method, which you can probably reduce in some cases using compression. In theory the trie is somewhat better on memory, but in practice the devil is in the implementation details: the memory needed to store pointers, and potentially poor cache access patterns.
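A minimal trie sketch illustrates the O(m) lookup: a query follows one child link per character, independent of n. The class names are hypothetical, and storing each node's children in a Python `dict` is only one possible layout; that per-node dictionary is an example of the pointer overhead mentioned above.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if a word ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word):
        # Preprocessing: O(m) per word, O(n*m) for the whole list.
        node = self.root
        for ch in word:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = node.children[ch] = TrieNode()
            node = nxt
        node.is_word = True

    def __contains__(self, word):
        # One dictionary lookup per character: O(m) for a word of length m.
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word
```

Usage: `"cart" in Trie(["car", "cart", "cat"])` is `True`, while `"ca"` is not found because no word ends at that node.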

There are other options for implementing a set structure: a hash set and a tree set are easy choices in most languages. I'd go for the hash set, as it is efficient and simple.
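In Python the built-in `frozenset` already is a hash set, so the suggestion reduces to very little code. A minimal sketch with hypothetical helper names: building the set is one-time preprocessing, and each query hashes the word (O(m) in its length) and then probes the table in O(1) expected time.

```python
def build_word_set(words):
    # One-time preprocessing: hashes every word, O(n*m) overall.
    return frozenset(words)


def contains_hashed(word_set, x):
    # Expected O(m): hashing x is linear in its length, the table probe is O(1) on average.
    return x in word_set
```

The sorted-list requirement disappears here, which is the main design difference from the binary-search approach.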

Answer 2:

I think a HashMap is perfectly fine for your case, since both put and get run in O(1) expected time. It works fine even if your list is not sorted.

Answer 3:

> Preprocessing is ok since I will be calling this function many times over different inputs.

As food for thought, have you considered creating a set from the input data and then searching it by hash? Building the set takes extra time up front, but if the number of inputs is limited and you will return to them repeatedly, a set may be a good idea, with O(1) "contains" for a good hash function.

Answer 4:

I'd recommend a hash map. Both VC++ and GCC provide one for C++ as an extension (since C++11 the standard library also includes std::unordered_map and std::unordered_set).

Answer 5:

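Use a Bloom filter. It is space-efficient even for very large data sets and works as a fast rejection technique: a negative answer is definitive, while a positive answer may be a false positive and has to be confirmed against the actual list.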
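A minimal Bloom filter sketch in Python, assuming double hashing over a single SHA-256 digest (one common choice, not the only one) and a target false-positive rate `fp_rate`; all names are hypothetical. Because a hit may be a false positive, `contains` confirms it against an exact structure. An in-memory set stands in for that here for simplicity, although keeping the full set in memory gives up the space advantage; in practice the confirmation could, for example, binary-search the sorted file on disk.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, n_items, fp_rate=0.01):
        # Standard sizing: num_bits = -n * ln(p) / ln(2)^2, num_hashes = (num_bits / n) * ln(2).
        n = max(1, n_items)
        self.num_bits = max(1, math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, word):
        # Double hashing: derive all bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, word):
        for p in self._positions(word):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, word):
        # False means definitely absent; True only means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(word))


def contains(bloom, exact_words, x):
    if not bloom.might_contain(x):
        return False  # fast rejection: no false negatives
    return x in exact_words  # confirm to rule out a false positive
```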

Source: https://stackoverflow.com/questions/2718816/complexity-in-using-binary-search-and-trie

Tags

algorithm

data-structures

complexity-theory