Fastest hash for non-cryptographic uses?

挽巷 2020-12-04 08:23

I'm essentially preparing phrases to be put into the database. They may be malformed, so I want to store a short hash of them instead (I will be simply comparing if they exi

13 answers
  •  生来不讨喜
    2020-12-04 08:40

    2019 update: This answer is still current. Murmur libraries are widely available across languages.

    The current recommendation is to use the Murmur Hash Family (see specifically the murmur2 or murmur3 variants).

    Murmur hashes were designed for fast hashing with few collisions (much faster than CRC, MDx and SHAx). They are well suited to detecting duplicates and to indexing hash tables.
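To make the idea concrete, here is a minimal pure-Python sketch of the 32-bit MurmurHash3 variant, used to derive a short key for a phrase. It is illustrative only: in practice you would use an existing murmur library for your language, which will be far faster than this loop. The helper name `phrase_key` and the choice of an 8-character hex key are assumptions for this example, not something from the question.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Sketch of MurmurHash3 (x86, 32-bit); returns an unsigned 32-bit int."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    nblocks = len(data) // 4

    # Body: mix each complete 4-byte little-endian block into the state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask  # rotate left by 15
        k = (k * c2) & mask
        h ^= k
        h = ((h << 13) | (h >> 19)) & mask  # rotate left by 13
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: mix the remaining 1 to 3 bytes, if any.
    tail = data[4 * nblocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def phrase_key(phrase: str) -> str:
    """Hypothetical helper: short fixed-width hex key for a phrase."""
    return format(murmur3_32(phrase.encode("utf-8")), "08x")
```

A 32-bit key is compact, but collisions become likely once you store tens of thousands of phrases, so treat a key match as "probably the same phrase" and compare the stored text (or use a wider variant) when it matters.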

    In fact, it's used by many modern databases (Redis, Elasticsearch, Cassandra) to compute all sorts of hashes for various purposes. It has contributed to many performance improvements over the past decade.

    It's also used in implementations of Bloom Filters. You should be aware that if you're searching for "fast hashes", you're probably facing a typical problem that is solved by Bloom filters. ;-)
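For readers who have not met Bloom filters, the following is a minimal sketch of one for "have I seen this phrase?" checks. It assumes Python's standard library only, so it derives its bit positions from `hashlib.blake2b` as a stand-in; a real implementation would typically plug in a murmur library instead. The class name, default sizes, and the double-hashing scheme are choices made for this example.

```python
import hashlib


class BloomFilter:
    """Sketch of a Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Stand-in hash (assumption): split one 16-byte digest into two
        # 64-bit values and combine them as h1 + i * h2 (double hashing).
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force odd, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

Usage is `seen = BloomFilter(); seen.add(phrase); phrase in seen`. A `False` answer is definitive, while a `True` answer means "possibly present", so it works best as a cheap pre-check in front of the database lookup.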

    Note: murmur is a general-purpose hash, meaning it is NOT cryptographic. It does not prevent someone from finding an input "text" that produces a given hash, so it is NOT appropriate for hashing passwords.

    Some more details: MurmurHash - what is it?
