Hash Code and Checksum - what's the difference?

天命终不由人 2020-12-04 07:29

My understanding is that a hash code and checksum are similar things - a numeric value, computed for a block of data, that is relatively unique.

i.e. The pr

  • 2020-12-04 07:55

    A checksum is simply a number generated from the data field by adding its bytes or words together (arithmetic addition, or XOR, which is addition without carries), hence "sum". A simple checksum can detect corruption of any single bit, and many multi-bit errors, within the data field it was generated from. In other words it checks for errors, that is all; it cannot correct them. A checksum is a kind of hash, because it maps the data to a value smaller than the original data. And yes, you will have collisions, because a simple sum is not sensitive to the order of the bytes in the data field: swapping two bytes leaves it unchanged.
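    A minimal sketch of a one-byte additive checksum, assuming the bytes are summed modulo 256 (the modulus and the helper names are illustrative choices, not from the answer). It shows both points made above: every single-bit flip is caught, but swapping two bytes is not.

```python
def additive_checksum(data: bytes) -> int:
    """Sum all bytes and keep the low 8 bits (a one-byte checksum)."""
    return sum(data) % 256


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with a single bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


message = b"HELLO"
original = additive_checksum(message)

# A single flipped bit changes one byte by +/-2**k (k < 8), so the sum mod 256 always changes.
single_bit_detected = all(
    additive_checksum(flip_bit(message, i)) != original
    for i in range(len(message) * 8)
)

# Reordering bytes leaves the sum unchanged, so this corruption goes unnoticed.
swapped = b"EHLLO"
swap_detected = additive_checksum(swapped) != original

print(single_bit_detected)  # True
print(swap_detected)        # False
```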

    A cyclic redundancy check (CRC) is more complex than a simple sum. It treats the data field as a polynomial over GF(2) and takes the remainder after dividing by a chosen generator polynomial. This makes it sensitive to bit position and lets it detect error patterns a plain sum misses, such as reordered bytes and every burst error up to the width of the CRC. CRCs are used mainly to detect errors; correcting them is not their usual job. The CRC has a fixed size (for example 32 bits for CRC-32), usually far smaller than the data, and it is appended to the data as redundant information, hence the word "redundancy" in the name. Despite the extra sophistication, a CRC is commonly called a checksum, and like any checksum it maps arbitrary data to a short value, so it is also a (non-cryptographic) hash.
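    To make the polynomial-division idea concrete, here is a minimal bit-at-a-time sketch of the widely used CRC-32 (the IEEE 802.3 polynomial, as used by zlib), checked against Python's `zlib.crc32`. Picking CRC-32 is an assumption; the answer does not name a specific polynomial. The result is always a 32-bit value, whatever the input length.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Reflected CRC-32 (IEEE 802.3 polynomial), processed one bit at a time."""
    crc = 0xFFFFFFFF                            # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320  # bit-reversed form of 0x04C11DB7
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF                     # standard final XOR


payload = b"The quick brown fox jumps over the lazy dog"
value = crc32_bitwise(payload)
assert value == zlib.crc32(payload)             # agrees with the stdlib implementation
print(f"{value:08x}")                           # always 8 hex digits (32 bits)

# Unlike a plain sum, the CRC depends on byte order.
print(crc32_bitwise(b"HELLO") != crc32_bitwise(b"EHLLO"))  # True
```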

  • 2020-12-04 07:57
    • hash code (SipHash) is usually used for hash tables, where the average access time is close to O(1)
    • checksum (MD5, SHA) is used to indicate data integrity

    The main difference is that a checksum should be practically unique, so different data is very unlikely to share one, while a hash code can be the same for different objects
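    A minimal sketch of the two uses listed in this answer, assuming Python's standard library: `hashlib.sha256` stands in for the MD5/SHA integrity check, and a `dict` stands in for the hash table. The file contents are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Digest used as an integrity check: compare it against a published value."""
    return hashlib.sha256(data).hexdigest()


original = b"release-1.0.tar.gz contents"
published_digest = sha256_hex(original)        # e.g. shipped alongside a download

received = b"release-1.0.tar.gz contents"
tampered = b"release-1.0.tar.gz c0ntents"

print(sha256_hex(received) == published_digest)  # True
print(sha256_hex(tampered) == published_digest)  # False

# Hash-table use: dict calls hash() on each key to choose a slot.
# For str and bytes, CPython's hash() is SipHash-based (PEP 456) and seeded per process,
# so its values change between runs; it is meant for lookups, not for stored checksums.
index = {"alice": 1, "bob": 2}
print(index["bob"])  # 2
```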

  • 2020-12-04 08:00

    These days they are interchangeable, but in days of yore a checksum was a very simple technique where you'd add all the data up (usually in bytes) and tack a byte on the end with that value in. Then you'd hopefully know if any of the original data had been corrupted. Similar to a check bit, but with bytes.
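    The scheme this answer describes (add the bytes up and tack the result on as a trailing byte) can be sketched as follows; the one-byte, modulo-256 sum and the function names are illustrative assumptions.

```python
def append_checksum(payload: bytes) -> bytes:
    """Append one byte holding the sum of the payload bytes, modulo 256."""
    return payload + bytes([sum(payload) % 256])


def verify(frame: bytes) -> bool:
    """Recompute the sum over everything but the last byte and compare."""
    payload, stored = frame[:-1], frame[-1]
    return sum(payload) % 256 == stored


frame = append_checksum(b"\x10\x20\x30")   # 0x10 + 0x20 + 0x30 = 0x60
print(frame.hex())                          # 10203060
print(verify(frame))                        # True

corrupted = b"\x10\x21\x30" + frame[-1:]   # one payload byte changed in transit
print(verify(corrupted))                    # False
```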

  • 2020-12-04 08:02

    Wikipedia puts it well:

    Checksum functions are related to hash functions, fingerprints, randomisation functions, and cryptographic hash functions. However, each of those concepts has different applications and therefore different design goals. Check digits and parity bits are special cases of checksums, appropriate for small blocks of data (such as Social Security numbers, bank account numbers, computer words, single bytes, etc.). Some error-correcting codes are based on special checksums that not only detect common errors but also allow the original data to be recovered in certain cases.
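    A minimal sketch of the two special cases the quote mentions, assuming an even-parity convention for the parity bit and using the Luhn algorithm as one well-known check-digit scheme (the quote names no specific scheme; Luhn is used for credit-card numbers, for example).

```python
def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total number of 1s (byte + parity) even."""
    return bin(byte).count("1") % 2


def luhn_valid(number: str) -> bool:
    """Luhn check-digit test: double every second digit, counting from the right."""
    total = 0
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:   # every second digit, starting next to the check digit
            digit *= 2
            if digit > 9:
                digit -= 9      # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(even_parity_bit(0b1011_0001))  # 0 (four 1s already)
print(even_parity_bit(0b0000_0111))  # 1 (three 1s)
print(luhn_valid("79927398713"))     # True
print(luhn_valid("79927398710"))     # False
```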

  • 2020-12-04 08:02

    Redis Cluster shards data by computing a hash slot for each key, which decides the node the key goes to. To illustrate the idea, take the modulo operation below:

    123 % 9 = 6
    122 % 9 = 5
    141 % 9 = 6
    

    The 6 comes up twice for different inputs. The purpose of a hash is simply to map an input value to an output value; uniqueness is not part of the deal. So two different inputs producing the same output is fine in the world of hashes.
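    The toy modulo mapping and the slot calculation Redis Cluster actually performs (per the Redis Cluster specification, `CRC16(key) mod 16384`, using the XMODEM variant of CRC16) can be sketched in Python. The names `bucket` and `hash_slot` are hypothetical, and `hash_slot` omits Redis's `{...}` hash-tag rule, so treat it as a simplification. Note that the mapping function Redis uses is itself a CRC, i.e. a checksum algorithm.

```python
def bucket(value: int, buckets: int = 9) -> int:
    """Toy mapping from the answer: value modulo the number of buckets."""
    return value % buckets


print([bucket(v) for v in (123, 122, 141)])  # [6, 5, 6]: 123 and 141 collide, which is fine


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: bytes) -> int:
    """Simplified Redis Cluster slot: CRC16(key) mod 16384 (ignores {hash tags})."""
    return crc16_xmodem(key) % 16384


print(hex(crc16_xmodem(b"123456789")))       # 0x31c3, the standard check value
print(0 <= hash_slot(b"user:1000") < 16384)  # True
```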

    A checksum, on the other hand, must produce a different output even if only one bit of the input changes, because its purpose is not to map but to detect data corruption. So two different inputs producing the same output is not acceptable for a checksum, at least for the kinds of corruption it is designed to catch.
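    To check the property this answer asks of a checksum, a short sketch can flip each bit of a message in turn and confirm that the checksum changes every time. CRC-32 via Python's `zlib.crc32` is an illustrative choice (the answer names no algorithm), and the message text is made up.

```python
import zlib


def flipped_variants(data: bytes):
    """Yield every copy of data that differs from it in exactly one bit."""
    for i in range(len(data) * 8):
        buf = bytearray(data)
        buf[i // 8] ^= 1 << (i % 8)
        yield bytes(buf)


message = b"transfer 100 to account 42"
reference = zlib.crc32(message)

# CRC-32 detects every single-bit error, so no variant keeps the reference value.
undetected = [v for v in flipped_variants(message) if zlib.crc32(v) == reference]
print(len(undetected))  # 0
```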

  • 2020-12-04 08:05

    I would say that a checksum is necessarily a hashcode. However, not all hashcodes make good checksums.

    A checksum has a special purpose: it verifies, or checks, the integrity of data (some can go beyond that by allowing for error correction). "Good" checksums are easy to compute and can detect many types of data corruption (e.g. one, two, or three erroneous bits).
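    As an illustration of why some checksums are "better" than others at catching multi-bit errors, this sketch compares a one-byte additive sum with CRC-32 (`zlib.crc32`) on one specific two-bit error. Both algorithms are illustrative choices; the answer does not name any.

```python
import zlib


def additive_checksum(data: bytes) -> int:
    """One-byte sum of all bytes, modulo 256."""
    return sum(data) % 256


original = bytes([0b0000_0000, 0b0000_0001])
# Two-bit error: bit 0 becomes set in the first byte and cleared in the second.
corrupted = bytes([0b0000_0001, 0b0000_0000])

print(additive_checksum(original) == additive_checksum(corrupted))  # True: the sum misses it
print(zlib.crc32(original) == zlib.crc32(corrupted))                # False: CRC-32 catches it
```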

    A hashcode simply describes a mathematical function that maps data to some value. When used as a means of indexing in data structures (e.g. a hash table), a low collision probability is desirable.
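    A minimal sketch of the indexing use described here: a tiny chained hash table in Python. FNV-1a is an arbitrary illustrative choice of hash function, and the class and method names are hypothetical. Fewer collisions mean shorter chains, which is why a low collision probability matters for lookup speed.

```python
class ChainedHashTable:
    """Tiny hash table: FNV-1a picks a bucket, collisions are chained in lists."""

    def __init__(self, buckets: int = 8):
        self.buckets = [[] for _ in range(buckets)]

    @staticmethod
    def fnv1a_32(key: str) -> int:
        """32-bit FNV-1a hash of the UTF-8 bytes of key."""
        h = 0x811C9DC5                          # FNV offset basis
        for byte in key.encode("utf-8"):
            h ^= byte
            h = (h * 0x01000193) & 0xFFFFFFFF   # FNV prime, kept to 32 bits
        return h

    def _bucket(self, key: str) -> list:
        return self.buckets[self.fnv1a_32(key) % len(self.buckets)]

    def put(self, key: str, value) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                        # overwrite an existing key
                bucket[i] = (key, value)
                return
        bucket.append((key, value))             # new key, possibly sharing a bucket

    def get(self, key: str):
        for k, v in self._bucket(key):          # only this one bucket is scanned
            if k == key:
                return v
        raise KeyError(key)


table = ChainedHashTable()
for name, age in [("ann", 31), ("bob", 27), ("cy", 45)]:
    table.put(name, age)
print(table.get("bob"))  # 27
```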
