Complexity in using Binary search and Trie

匆匆过客 提交于 2019-12-19 04:45:23

问题


given a large list of alphabetically sorted words in a file,I need to write a program that, given a word x, determines if x is in the list. Preprocessing is ok since I will be calling this function many times over different inputs.
priorties: 1. speed. 2. memory

I already know I can use (n is number of words, m is average length of the words) 1. a trie, time is O(log(n)), space(best case) is O(log(nm)), space(worst case) is O(nm).
2. load the complete list into memory, then binary search, time is O(log(n)), space is O(n*m)

I'm not sure about the complexity on tri, please correct me if they are wrong. Also are there other good approaches?


回答1:


It is O(m) time for the trie, and up to O(mlog(n)) for the binary search. The space is asymptotically O(nm) for any reasonable method, which you can probably reduce in some cases using compression. The trie structure is, in theory, somewhat better on memory, but in practice it has devils hiding in the implementation details: memory needed to store pointers and potentially bad cache access.

There are other options for implementing a set structure - hashset and treeset are easy choices in most languages. I'd go for the hash set as it is efficient and simple.




回答2:


I think HashMap is perfectly fine for your case, since the time complexity for both put and get operations is O(1). It works perfectly fine even if you dont have a sorted list.!!!




回答3:


Preprocessing is ok since I will be calling > this function many times over different inputs.

As a food for thought, do you consider creating a set from the input data and then searching using particular hash? It will take more time process for the first time to build a set but if number of inputs is limited and you may return to them then set might be good idea with O(1) for "contains" operation for a good hash function.




回答4:


I'd recommend a hashmap. You can find an extension to C++ for this in both VC and GCC.




回答5:


Use a bloom filter. It is space efficient even for very large data and it is a fast rejection technique.



来源:https://stackoverflow.com/questions/2718816/complexity-in-using-binary-search-and-trie

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!