If I need to retrieve a large string from a DB, is it faster to search for it using the string itself, or would I gain by hashing the string and storing the hash in the DB as
I am confused and am probably misunderstanding your question.
If you already have the string (thus you can compute the hash), why do you need to retrieve it?
Do you use a large string as the key for something perhaps?
TIP: if you are going to store the hash in the database, an MD5 hash is always 16 bytes, so it can be saved in a uniqueidentifier column (and in a System.Guid in .NET).
This might offer some performance gain over saving hashes in a different way (I use this method to check for binary/ntext field changes but not for strings/nvarchars).
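To illustrate the 16-byte point, here is a minimal Python sketch that packs an MD5 digest into a UUID value. The function name `md5_as_uuid` is made up for this example, and UTF-8 encoding is an assumption. Note that .NET's System.Guid orders its first three fields differently when built from a byte array, so the textual form may not match across platforms even though the 16 bytes do.

```python
import hashlib
import uuid


def md5_as_uuid(text: str) -> uuid.UUID:
    """Hash `text` with MD5 and wrap the 16-byte digest in a UUID.

    Hypothetical helper for illustration; assumes UTF-8 encoding.
    """
    digest = hashlib.md5(text.encode("utf-8")).digest()  # always 16 bytes
    return uuid.UUID(bytes=digest)


key = md5_as_uuid("some very long string ...")
print(len(key.bytes), key)
```

The same string always yields the same UUID, so the value can serve as a fixed-width lookup key.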
In general: probably not, assuming the column is indexed. Database servers are designed to do such lookups quickly and efficiently. Some databases (e.g. Oracle) provide options to build indexes based on hashing.
However, in the end this can only be answered by performance testing with data and usage patterns that are representative of your requirements.
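As a rough idea of what such a test could look like, here is a small Python sketch using an in-memory SQLite database. Everything in it is an assumption made for illustration: the table and column names, the synthetic data (long strings that share a common prefix), and SQLite itself, which will not behave like your production server. Treat the numbers it prints as a starting point only.

```python
import hashlib
import sqlite3
import timeit


def build_db(n_rows=2000, length=2000):
    """Create a throwaway table with an indexed string and an indexed MD5."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, body_md5 BLOB)"
    )
    # Synthetic data: a long shared prefix, so strings differ only at the end.
    bodies = ["x" * length + f"{i:08d}" for i in range(n_rows)]
    rows = [(b, hashlib.md5(b.encode("utf-8")).digest()) for b in bodies]
    conn.executemany("INSERT INTO docs (body, body_md5) VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX idx_body ON docs (body)")
    conn.execute("CREATE INDEX idx_md5 ON docs (body_md5)")
    return conn, bodies


def find_by_string(conn, body):
    return conn.execute(
        "SELECT id FROM docs WHERE body = ?", (body,)
    ).fetchone()


def find_by_hash(conn, body):
    # The cost of hashing the search string is part of the lookup.
    digest = hashlib.md5(body.encode("utf-8")).digest()
    return conn.execute(
        "SELECT id FROM docs WHERE body_md5 = ?", (digest,)
    ).fetchone()


def time_lookups(conn, body, repeats=200):
    """Return total seconds spent on `repeats` lookups of each kind."""
    return {
        "string": timeit.timeit(lambda: find_by_string(conn, body), number=repeats),
        "hash": timeit.timeit(lambda: find_by_hash(conn, body), number=repeats),
    }


conn, bodies = build_db()
print(time_lookups(conn, bodies[len(bodies) // 2]))
```

Varying `n_rows` and `length`, and swapping in real data, should show where (if anywhere) the hash column starts to pay off.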
I'd be surprised if this offered a huge improvement, and I would recommend not using your own performance optimisations for a DB search.
If you use a database index, there is scope for performance to be tuned by a DBA using tried and trusted methods. Hard-coding your own index optimisation will prevent this and may stop you benefiting from any performance improvements to indexing in future versions of the DB.
If your strings are short (generally fewer than 100 characters), searching on the strings themselves will be faster.
If the strings are large, a hash search may, and most probably will, be faster.
HashBytes(MD4) seems to be the fastest on DML.
Though I've never done it, it sounds like this would work in principle. There's a chance you may get false positives (two different strings that produce the same hash), but that chance is probably quite slim.
I'd go with a fast algorithm such as MD5, as you don't want to spend longer hashing the string than it would have taken you to just search for it.
The final thing I can say is that you'll only know if it is better if you try it out and measure the performance.
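For the false-positive concern, one common safeguard is to use the hash only to narrow down candidates and then compare the full string. The Python sketch below shows that idea with SQLite. The table layout and function names are invented for the example, and the hash is deliberately cut down to a single byte so that collisions actually occur in a small demo; a real setup would keep the full 16-byte digest.

```python
import hashlib
import sqlite3


def weak_hash(text: str) -> bytes:
    # Deliberately truncated to 1 byte to force collisions in this demo.
    return hashlib.md5(text.encode("utf-8")).digest()[:1]


def make_table(strings):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, h BLOB)")
    conn.executemany(
        "INSERT INTO docs (body, h) VALUES (?, ?)",
        [(s, weak_hash(s)) for s in strings],
    )
    conn.execute("CREATE INDEX idx_h ON docs (h)")
    return conn


def hash_matches(conn, text):
    """All rows whose stored hash equals the hash of `text`."""
    return conn.execute(
        "SELECT id, body FROM docs WHERE h = ?", (weak_hash(text),)
    ).fetchall()


def lookup(conn, text):
    """Ids of rows that really contain `text`, after checking the full string."""
    return [row_id for row_id, body in hash_matches(conn, text) if body == text]


strings = [f"s{i}" for i in range(1000)]
conn = make_table(strings)
print(len(hash_matches(conn, "s42")), lookup(conn, "s42"))
```

With a full MD5 the candidate list will almost always hold a single row, so the extra string comparison costs very little while ruling out wrong matches.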