Basically, I have about 1,000,000 strings, and for each request I have to check whether a string belongs to the list or not.
I'm worried about performance, so what's the best approach?
Before going further, please consider this: Why are you worried about performance? How often is this check called?
As for possible solutions:
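If the list is already sorted, you can use java.util.Collections.binarySearch, which offers the same performance characteristics as a java.util.TreeSet: O(log n) lookups, provided the list supports fast random access (e.g. an ArrayList rather than a LinkedList).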
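A minimal sketch of the sorted-list approach, assuming the strings are held in an ArrayList and sorted in natural (String.compareTo) order; the class and method names are made up for illustration. Collections.binarySearch returns a non-negative index only when the key is present, and a TreeSet gives the same O(log n) behaviour without a separate sort step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper class, for illustration only.
class SortedLookup {
    // Requires 'sorted' to be in ascending natural order; otherwise the result is undefined.
    static boolean containsSorted(List<String> sorted, String key) {
        // binarySearch returns the index if found, or (-(insertion point) - 1) otherwise.
        return Collections.binarySearch(sorted, key) >= 0;
    }

    public static void main(String[] args) {
        // ArrayList supports random access, so binarySearch does O(log n) comparisons.
        List<String> words = new ArrayList<>(List.of("pear", "apple", "fig", "kiwi"));
        Collections.sort(words); // [apple, fig, kiwi, pear]

        System.out.println(containsSorted(words, "fig"));    // true
        System.out.println(containsSorted(words, "banana")); // false

        // A TreeSet keeps its elements sorted and answers contains() in O(log n).
        TreeSet<String> tree = new TreeSet<>(words);
        System.out.println(tree.contains("kiwi"));           // true
    }
}
```

Note that each comparison during the search is itself a string comparison, so the total cost is O(log n) comparisons of up to O(m) each.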
If the list is not sorted, you can use a java.util.HashSet instead, which has an expected lookup time of O(1). Note that calculating the hash code for a string whose hash code hasn't been computed yet is an O(m) operation, with m = string.length(). Also keep in mind that hash tables only perform well while they stay below a given load factor, so they always keep spare buckets and use more memory than plain lists. The default load factor used by HashSet is 0.75, meaning that internally a HashSet holding 1e6 objects needs a bucket array with at least 1.33e6 entries; since the OpenJDK implementation rounds the table size up to a power of two, it actually ends up with 2^21 ≈ 2.1e6 entries.
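A sketch of the HashSet approach that presizes the set so that a million strings fit under the default 0.75 load factor without intermediate rehashing; HashLookup and presizedSet are hypothetical names. On Java 19 and later, HashSet.newHashSet(int) performs the same sizing calculation for you.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, for illustration only.
class HashLookup {
    // Returns an empty HashSet whose initial capacity lets 'expected' elements
    // fit without rehashing at the default load factor of 0.75.
    static Set<String> presizedSet(int expected) {
        int initialCapacity = (int) Math.ceil(expected / 0.75);
        return new HashSet<>(initialCapacity);
    }

    public static void main(String[] args) {
        Set<String> dictionary = presizedSet(1_000_000);
        dictionary.addAll(List.of("apple", "fig", "kiwi", "pear"));

        // contains() hashes the key, then calls equals() on the entries in the matching bucket.
        System.out.println(dictionary.contains("fig"));    // true
        System.out.println(dictionary.contains("banana")); // false
    }
}
```

Presizing only avoids the repeated resizing during loading; it does not reduce the memory overhead described above.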
If a HashSet does not work for you (e.g. because there are lots of hash collisions, because memory is tight, or because there are lots of insertions), then consider using a Trie. Lookup in a Trie has a worst-case complexity of O(m), where m = string.length(). A Trie also has some extra benefits that might be useful to you: e.g., it can give you the closest fit for a search string. But keep in mind that the best code is no code, so only roll your own Trie implementation if the benefits outweigh the costs.
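To make the O(m) lookup concrete, here is a deliberately minimal trie sketch; the Trie class and its methods are hypothetical, not a library API. It stores one map of children per node, which is simple but memory-hungry, so a real implementation for a million strings would likely need a more compact node layout. The hasPrefix method is included as one simple example of an extra query a trie supports; a closest-fit search builds on the same traversal.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal illustrative trie (hypothetical class, not a library API).
class Trie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a stored string ends at this node
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.isWord = true;
    }

    // Follows at most word.length() edges, so the cost depends on m, not on the number of stored strings.
    boolean contains(String word) {
        Node node = find(word);
        return node != null && node.isWord;
    }

    // Returns true if any stored string starts with the given prefix.
    boolean hasPrefix(String prefix) {
        return find(prefix) != null;
    }

    // Walks the path for s; returns null as soon as a character has no matching child.
    private Node find(String s) {
        Node node = root;
        for (int i = 0; i < s.length() && node != null; i++) {
            node = node.children.get(s.charAt(i));
        }
        return node;
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        for (String w : new String[] {"apple", "apply", "fig"}) {
            trie.insert(w);
        }
        System.out.println(trie.contains("apple")); // true
        System.out.println(trie.contains("app"));   // false (only a prefix)
        System.out.println(trie.hasPrefix("app"));  // true
    }
}
```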
Consider using a database if you need more complex queries, e.g. matching a substring or a regular expression.