Fastest way to check if a List contains a unique String

天涯浪人 2020-12-13 11:43

Basically, I have about 1,000,000 strings, and for each request I have to check whether a given String belongs to that list or not.

I'm worried about performance, so what's the fastest way to do this?

  • 2020-12-13 12:37

    In general, a HashSet will give you better performance than an ArrayList. ArrayList.contains() has to walk through the elements and compare each one with equals(), whereas a HashSet typically compares against at most a few elements: the ones whose hash codes place them in the same bucket.
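    The difference described above can be seen in a minimal sketch like the following. The class name ContainsDemo, the helper buildKeys and the "key-N" sample strings are hypothetical stand-ins for the asker's real data; the point is only that ArrayList.contains() scans linearly while HashSet.contains() goes straight to one bucket.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ContainsDemo {
    // Hypothetical sample data: n distinct strings "key-0" .. "key-(n-1)".
    static List<String> buildKeys(int n) {
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add("key-" + i);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> list = buildKeys(1_000_000);
        // Build the set once, up front; its cost is amortized over all requests.
        Set<String> set = new HashSet<>(list);

        String query = "key-999999";
        // ArrayList.contains() calls equals() element by element until a match: O(n).
        boolean inList = list.contains(query);
        // HashSet.contains() hashes the query and only compares entries in that bucket: expected O(1).
        boolean inSet = set.contains(query);
        System.out.println(inList + " " + inSet); // true true
    }
}
```

    Both calls return the same answer; only the amount of work per request differs, which is what matters when the check runs on every request.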

    However, with 1M strings, even a HashSet may not perform optimally: frequent cache misses will slow down lookups in a set that large. If all strings are equally likely to be requested, this is unavoidable. But if some strings are requested much more often than others, you can put those common strings into a small HashSet and check it first, before checking the larger set. The small set should be sized to fit in the CPU cache (e.g. a few hundred KB at most). Hits in the small set will then be very fast, while lookups that fall through to the large set proceed at a speed limited by main-memory access.
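    The two-tier idea in the paragraph above could be sketched as follows, assuming you already know which strings are requested most often (how to pick them, e.g. from request logs, is outside this sketch). TieredLookup is a hypothetical class, not a library API, and whether the small set actually stays cache-resident depends on the JVM and hardware.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical two-tier membership test: probe a small "hot" set first, then the full set.
class TieredLookup {
    private final Set<String> hot;
    private final Set<String> all;

    TieredLookup(Collection<String> allStrings, Collection<String> frequent) {
        this.all = new HashSet<>(allStrings);
        this.hot = new HashSet<>();
        // Keep only frequent strings that are real members, so a hot hit is always correct.
        for (String s : frequent) {
            if (all.contains(s)) {
                hot.add(s);
            }
        }
    }

    boolean contains(String s) {
        // The hot set is meant to be small; most requests should be answered here.
        return hot.contains(s) || all.contains(s);
    }

    public static void main(String[] args) {
        TieredLookup lookup = new TieredLookup(
                List.of("alpha", "beta", "gamma", "delta"),
                List.of("alpha", "beta")); // hypothetical "frequently requested" strings
        System.out.println(lookup.contains("alpha")); // true  (hot set hit)
        System.out.println(lookup.contains("delta")); // true  (falls through to the full set)
        System.out.println(lookup.contains("omega")); // false (checked in both sets)
    }
}
```

    Note the trade-off: a string that is not present at all costs two lookups instead of one, so this only pays off when requests are heavily skewed toward the hot strings.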

  • 2020-12-13 12:41

    I'd use a Set; in most cases, HashSet is fine.
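    A small sketch of what "use a Set" can look like in practice: lookup code depends only on the Set interface, so HashSet can be chosen by default and another implementation swapped in where needed. The helper buildLookup and its flag are hypothetical; TreeSet is shown only as one example of a case where HashSet might not be the choice (it keeps elements sorted but has O(log n) contains()).

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetChoice {
    // Hypothetical helper: callers only see Set<String>, so the implementation can change freely.
    static Set<String> buildLookup(Collection<String> items, boolean needSortedOrder) {
        // HashSet: expected O(1) contains(), no ordering.
        // TreeSet: O(log n) contains(), iterates in sorted order.
        return needSortedOrder ? new TreeSet<>(items) : new HashSet<>(items);
    }

    public static void main(String[] args) {
        List<String> words = List.of("pear", "apple", "fig");
        Set<String> fast = buildLookup(words, false);
        Set<String> sorted = buildLookup(words, true);
        System.out.println(fast.contains("apple")); // true
        System.out.println(sorted);                 // [apple, fig, pear]
    }
}
```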

  • 2020-12-13 12:47

    Perhaps this isn't required in your case, but it's useful to know that there are space-efficient probabilistic data structures for membership tests, for example the Bloom filter. A Bloom filter can report false positives but never false negatives.
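    To make the Bloom filter suggestion concrete, here is a minimal sketch built on java.util.BitSet. StringBloomFilter is a hypothetical class, not a library API; it derives its k bit positions by double hashing (h1 + i*h2), using String.hashCode() as h1 and an FNV-1a hash as h2, which is an assumption chosen for simplicity. The sizes in main (about 10 bits per element, 7 hash functions) give a theoretical false-positive rate of roughly 1%.

```java
import java.util.BitSet;

// Hypothetical minimal Bloom filter for Strings: no false negatives, possible false positives.
class StringBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    StringBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Second hash: FNV-1a over the UTF-16 chars (assumed choice; any reasonably independent hash works).
    private static int secondHash(String s) {
        int h = 0x811C9DC5;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x01000193;
        }
        return h | 1; // make it nonzero so the probes do not all hit the same bit
    }

    // i-th bit position via double hashing; floorMod keeps the index non-negative.
    private int index(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String s) {
        int h1 = s.hashCode();
        int h2 = secondHash(s);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h1, h2, i));
        }
    }

    boolean mightContain(String s) {
        int h1 = s.hashCode();
        int h2 = secondHash(s);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h1, h2, i))) {
                return false; // definitely not present
            }
        }
        return true; // probably present
    }

    public static void main(String[] args) {
        StringBloomFilter filter = new StringBloomFilter(10_000_000, 7);
        for (int i = 0; i < 1_000_000; i++) {
            filter.add("key-" + i);
        }
        System.out.println(filter.mightContain("key-42")); // true (never a false negative)
        System.out.println(filter.mightContain("other"));  // usually false; a false positive is possible
    }
}
```

    A "false" answer can be trusted outright, so one common pattern is to use the filter as a cheap pre-check and consult an exact set only when it answers "true".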

  • Your best bet is to use a HashSet and check whether a string exists in the set with its contains() method. HashSets are built for fast lookups via the Object methods hashCode() and equals(). The Javadoc for HashSet states:

    "This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets."

    HashSet stores objects in hash buckets, which is to say that the value returned by an object's hashCode() method determines which bucket it is stored in. This way, the equality checks HashSet has to perform via equals() are limited to the other objects in the same bucket.
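    The bucket mechanism described above can be illustrated with a toy chained hash set. ToyHashSet is hypothetical and deliberately simplified: the real java.util.HashSet is backed by a HashMap and differs in details such as resizing and hash spreading. The equalsCalls counter exists only to show how few comparisons a lookup performs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy chained hash set for illustration only (not how java.util.HashSet is implemented in detail).
class ToyHashSet {
    private final List<List<String>> buckets = new ArrayList<>();
    int equalsCalls = 0; // number of equals() comparisons made by contains()

    ToyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // hashCode() alone decides which bucket a string belongs to.
    private List<String> bucketFor(String s) {
        return buckets.get(Math.floorMod(s.hashCode(), buckets.size()));
    }

    void add(String s) {
        List<String> bucket = bucketFor(s);
        if (!bucket.contains(s)) {
            bucket.add(s);
        }
    }

    boolean contains(String s) {
        // Only the strings in the same bucket are compared with equals().
        for (String candidate : bucketFor(s)) {
            equalsCalls++;
            if (candidate.equals(s)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyHashSet set = new ToyHashSet(1024);
        for (int i = 0; i < 1000; i++) {
            set.add("key-" + i);
        }
        System.out.println(set.contains("key-500")); // true
        // Typically a handful of comparisons, versus up to 1000 for a linear scan.
        System.out.println("equals() calls: " + set.equalsCalls);
    }
}
```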

    To use HashSet and HashMap effectively, your elements (or keys) must conform to the equals() and hashCode() contract described in the Javadoc for java.lang.Object. java.lang.String already implements both methods in accordance with that contract.
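    For your own element types, honoring the contract mentioned above means that objects which are equal() must return the same hashCode(). A minimal sketch with a hypothetical key class (UserKey and ContractDemo are illustrative names, not from the question):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class ContractDemo {
    public static void main(String[] args) {
        Set<UserKey> seen = new HashSet<>();
        seen.add(new UserKey("example", "alice"));
        // A different but equal instance is found because equals() and hashCode() agree.
        System.out.println(seen.contains(new UserKey("example", "alice"))); // true
    }
}

// Hypothetical key class following the equals/hashCode contract.
final class UserKey {
    private final String domain;
    private final String name;

    UserKey(String domain, String name) {
        this.domain = domain;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UserKey)) return false;
        UserKey other = (UserKey) o;
        return Objects.equals(domain, other.domain) && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields compared in equals(), so equal keys always hash alike.
        return Objects.hash(domain, name);
    }
}
```

    If hashCode() were left un-overridden, the two instances would most likely land in different buckets and contains() would return false. On Java 16 and later, declaring UserKey as a record generates both methods automatically.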
