I'm trying to choose a hash algorithm for comparing at most about 20 different pieces of text data.
Which hash is better for these requirements?
A very quick check would be to take the length of the text, XOR it with its first 4 bytes, and use that as the hash. If this is good enough, it is extremely fast because its cost is independent of the number of bytes in the file.
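A minimal sketch of that length-XOR-prefix idea, in C#. The class name QuickHash, the method name Compute, and the little-endian packing of the first bytes are assumptions made for illustration. Any two inputs with the same length and the same first four bytes collide, so a match would still need a full comparison.

```csharp
using System;
using System.Text;

public static class QuickHash
{
    // Hypothetical helper: XOR the byte length with the first 4 bytes
    // (packed little-endian). Inputs shorter than 4 bytes use what is there.
    public static uint Compute(byte[] data)
    {
        uint prefix = 0;
        int count = Math.Min(4, data.Length);
        for (int i = 0; i < count; i++)
        {
            prefix |= (uint)data[i] << (8 * i);
        }
        return (uint)data.Length ^ prefix;
    }
}
```

For example, QuickHash.Compute(Encoding.UTF8.GetBytes("hello")) and the same call on "hells" return the same value, because both texts are 5 bytes long and start with "hell".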
How long does the hash need to hold for? GetHashCode() is pretty accessible, gives a small response (4 bytes), which should be fine (re minimizing collisions) over 20 strings.
However, GetHashCode() should not be persisted to a database; it is fine for in-memory comparisons, though. Just be aware that the algorithm may change between framework versions (and did between 1.1 and 2.0).
The other advantage of this is that it is trivial to use - just use a Dictionary<string,Something>, which will deal with all the hashing etc for you.
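A rough sketch of the dictionary approach. The class name TextLookup, the method name IndexTexts, and the choice of int (position of first occurrence) as the value type are assumptions standing in for Something. The dictionary hashes the string keys internally and falls back to an equality check when two keys land in the same bucket, so collisions never produce a false match.

```csharp
using System;
using System.Collections.Generic;

public static class TextLookup
{
    // Hypothetical helper: maps each distinct text to the position
    // at which it first appears in the input sequence.
    public static Dictionary<string, int> IndexTexts(IEnumerable<string> texts)
    {
        var index = new Dictionary<string, int>(StringComparer.Ordinal);
        int position = 0;
        foreach (string text in texts)
        {
            // Keep the first occurrence; later duplicates are skipped.
            if (!index.ContainsKey(text))
            {
                index[text] = position;
            }
            position++;
        }
        return index;
    }
}
```

Checking whether a new text has been seen before is then a single index.TryGetValue(candidate, out int firstSeen) call.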