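If you can tolerate an occasional false positive, you can use a Bloom filter to detect probable duplicates in the collection. For each value, check whether the filter reports it as already seen, and if not, add it. If false positives are unacceptable, collect the values the filter flags as already seen and run a second, exact detection pass on just those. That candidate list should be fairly small. However, confirming each candidate still means scanning the full input again, because the filter only says a value *might* have been seen before. It cannot tell you whether, or where, an earlier occurrence actually exists.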
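Here is a minimal Python sketch of the two-pass approach. It assumes the values are strings held in a list, so the input can be iterated twice. The `BloomFilter` class, its standard sizing formulas, and the SHA-256 double-hashing scheme are illustrative choices, not a specific library's API. `find_duplicates` is likewise a hypothetical helper name.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative sketch, not tuned for speed)."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        n = max(1, expected_items)
        self.size = max(1, int(-n * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step so positions vary
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not seen"; True means "probably seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def find_duplicates(values, false_positive_rate=0.01):
    """Return the exact set of values occurring more than once (hypothetical helper)."""
    # Pass 1: values the filter claims to have seen already are *probable* duplicates.
    # Every real duplicate lands here, because Bloom filters have no false negatives.
    bloom = BloomFilter(len(values), false_positive_rate)
    candidates = set()
    for v in values:
        if bloom.might_contain(v):
            candidates.add(v)
        else:
            bloom.add(v)

    # Pass 2: count only the candidates over the full input to discard false positives.
    counts = dict.fromkeys(candidates, 0)
    for v in values:
        if v in counts:
            counts[v] += 1
    return {v for v, c in counts.items() if c > 1}


print(sorted(find_duplicates(["apple", "pear", "fig", "apple", "kiwi", "pear"])))
# prints ['apple', 'pear']
```

Only the filter's bit array and the (hopefully small) candidate set are held in memory. The second pass costs another full read of the input, but the per-value work is a single set lookup. If you can live with false positives, return `candidates` directly and skip pass 2.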