Good choice for a lightweight checksum algorithm?

Submitted by 好久不见 on 2019-11-30 06:51:45

CRC32 is not hard to implement in any language. It is good enough to detect simple data corruption, and when implemented well it is very fast. Alternatively, you can try Adler32, which is almost as good as CRC32 but even easier to implement (and about as fast).

Adler32 on Wikipedia

CRC32 JavaScript implementation sample

Both are available in Java right out of the box, as java.util.zip.CRC32 and java.util.zip.Adler32.

Are you aware that TCP and UDP (and IP, and Ethernet, and...) already provide checksum protection for data in transit?
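
To give a feel for what that built-in protection looks like, here is a sketch of the 16-bit ones'-complement "Internet checksum" described in RFC 1071, which IPv4, TCP and UDP headers use. It is considerably weaker than CRC32: because it is a plain sum, swapping two aligned 16-bit words leaves it unchanged. Ethernet frames additionally carry a CRC-32. The function name and the Uint8Array input are my own choices for illustration.

```javascript
// Sketch of the RFC 1071 Internet checksum over a Uint8Array of bytes.
function internetChecksum(bytes) {
  let sum = 0;
  // Add the data as big-endian 16-bit words.
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    sum += (bytes[i] << 8) | bytes[i + 1];
  }
  // An odd trailing byte is treated as the high byte of a final word.
  if (bytes.length % 2 === 1) {
    sum += bytes[bytes.length - 1] << 8;
  }
  // Fold any carries above 16 bits back into the low 16 bits (end-around carry).
  while (sum > 0xffff) {
    sum = (sum & 0xffff) + Math.floor(sum / 0x10000);
  }
  // The checksum is the ones' complement of the folded sum.
  return ~sum & 0xffff;
}
```

A receiver that runs the same function over data that already contains a correct checksum gets 0, which is how the check is verified.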

Unless you're doing something really weird, if you're seeing corruption, something is very wrong. I suggest starting with a memory tester.

Also, you receive strong data integrity protection if you use SSL/TLS.

Other people have mentioned CRC32 already, but here's a link to the W3C's CRC-32 implementation for PNG, one of the few well-known, reputable sites with a reference CRC implementation.

(A few years back I tried to find a well-known site with a CRC algorithm, or at least one that cited the source for its algorithm, and was almost tearing my hair out until I found the PNG page.)
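
A JavaScript port in the spirit of the PNG specification's sample CRC code might look like the sketch below: a 256-entry table built from the reflected polynomial 0xEDB88320, a register preset to 0xFFFFFFFF, and a final XOR with 0xFFFFFFFF. The function names and the Uint8Array input are my own choices, not taken from the PNG page. The standard check value, the CRC-32 of the ASCII string "123456789" being 0xcbf43926, is a quick way to test any implementation.

```javascript
// Precompute the CRC for every possible byte value (reflected polynomial 0xEDB88320).
const CRC_TABLE = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = (c & 1) ? (0xedb88320 ^ (c >>> 1)) : (c >>> 1);
    }
    table[n] = c; // Uint32Array stores the value as unsigned
  }
  return table;
})();

// Feed more bytes into a running CRC register (start from 0xffffffff).
function updateCrc(crc, bytes) {
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xff] ^ (crc >>> 8);
  }
  return crc;
}

// CRC-32 of a complete Uint8Array, returned as an unsigned 32-bit number.
function crc32(bytes) {
  return (updateCrc(0xffffffff, bytes) ^ 0xffffffff) >>> 0;
}
```

To checksum a string, encode it first, e.g. crc32(new TextEncoder().encode(text)), so the CRC is computed over UTF-8 bytes. Keeping updateCrc separate lets you checksum data that arrives in pieces.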

[UPDATE 30/5/2013: The link to the old JS CRC32 implementation died, so I've now linked to a different one.]

Google for CRC32: it is fast, and much lighter weight than MD5 et al. There is a JavaScript implementation here.

In my search for a JavaScript implementation of a good checksum algorithm I came across this question. Andrzej Doyle rightly chose Adler32 as the checksum, as it is indeed easy to implement and has some excellent properties. DroidOS then provided an actual implementation in JavaScript, which demonstrated its simplicity.

However, the algorithm can be improved further, as detailed on the Wikipedia page and as implemented below. The trick is that you need not take the modulo at every step; you can defer it to the end, because reducing once at the end gives the same result as reducing at every step, provided a and b stay exact integers along the way. This considerably increases the speed of the implementation, up to 6x faster on Chrome and Safari. In addition, this optimisation does not hurt the readability of the code, making it a win-win. As such, it fits the original question's request for an algorithm and implementation that is computationally light.

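function adler32(data) {
  var MOD_ADLER = 65521;
  var a = 1, b = 0;

  var len = data.length;

  // Accumulate without reducing: a and b are JavaScript numbers (doubles),
  // so they stay exact as long as they remain below 2^53 (see edit2 below).
  for (var i = 0; i < len; i++) {
    a += data.charCodeAt(i);
    b += a;
  }

  // Reduce once at the end; this matches reducing at every step.
  a %= MOD_ADLER;
  b %= MOD_ADLER;

  // b << 16 is a signed 32-bit operation; >>> 0 makes the checksum unsigned.
  return ((b << 16) | a) >>> 0;
}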

edit: imaya created a jsperf comparison a while back showing the difference in speed between the simple version, as detailed by DroidOS, and an optimised version that defers the modulo operation. I have added the above implementation to that jsperf page under the name full-length; it shows the above implementation to be about 25% faster than the one from imaya and about 570% faster than the simple implementation (tests run on Chrome 30): http://jsperf.com/adler-32-simple-vs-optimized/6

edit2: please don't forget that, when working on large files, you will eventually hit the limits of JavaScript numbers in the a and b variables. So when working with a large data source, you should perform intermediate modulo operations to ensure that a and b never exceed the largest integer a JavaScript number can represent exactly (Number.MAX_SAFE_INTEGER, 2^53 - 1).
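
One way to do those intermediate reductions is to defer the modulo only within fixed-size chunks. zlib uses the same idea with a block size of 5552 (its NMAX constant), chosen so the sums fit in 32 bits for byte input. With JavaScript doubles that value is very conservative, but it keeps this sketch simple. The name adler32Chunked and the reuse of zlib's constant are my own choices for illustration.

```javascript
// Adler-32 that defers the modulo, but only for CHUNK characters at a time,
// so a and b always stay far below 2^53 and therefore exact.
var MOD_ADLER = 65521;
var CHUNK = 5552; // borrowed from zlib's NMAX; very conservative for doubles

function adler32Chunked(data) {
  var a = 1, b = 0;
  var len = data.length;
  for (var start = 0; start < len; start += CHUNK) {
    var end = Math.min(start + CHUNK, len);
    for (var i = start; i < end; i++) {
      a += data.charCodeAt(i);
      b += a;
    }
    // Reduce once per chunk instead of once per character.
    a %= MOD_ADLER;
    b %= MOD_ADLER;
  }
  // >>> 0 keeps the result unsigned (b << 16 alone can go negative).
  return ((b << 16) | a) >>> 0;
}
```

For inputs shorter than one chunk this behaves exactly like the full-length version; for longer inputs it pays for one extra pair of modulo operations per chunk.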

Use a JavaScript SHA-1 implementation. It's not as slow as you think (Firefox 3.0 on a 2.4 GHz Core 2 Duo hashes over 100 KB per second).
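
In current browsers and Node.js you may not need a hand-written SHA-1 at all: the Web Crypto API exposes a native, asynchronous digest function. The sketch below is an alternative to a pure-JavaScript implementation, not the one this answer refers to. The helper name sha1Hex is my own, and it assumes crypto.subtle is available (browsers only provide it in secure contexts such as HTTPS pages; Node.js exposes it as require("crypto").webcrypto.subtle from version 15).

```javascript
// SHA-1 of a string via the Web Crypto API, returned as lowercase hex.
async function sha1Hex(text) {
  const subtle = globalThis.crypto && globalThis.crypto.subtle
    ? globalThis.crypto.subtle
    : require("crypto").webcrypto.subtle; // older Node.js fallback
  const bytes = new TextEncoder().encode(text); // hash the UTF-8 bytes
  const digest = await subtle.digest("SHA-1", bytes); // resolves to an ArrayBuffer
  // Convert each byte to two hex digits.
  return Array.from(new Uint8Array(digest))
    .map((x) => x.toString(16).padStart(2, "0"))
    .join("");
}

sha1Hex("abc").then(console.log); // logs a9993e364706816aba3e25717850c26c9cd0d89d
```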

Here's a relatively simple one I've 'invented': there's no mathematical research behind it, but it's extremely fast and works in practice. I've also included the Java equivalent that tests the algorithm: it generates 10,000,000 random 1,000-character strings and counts how often a checksum matches the previous one, which suggests less than a 1 in 10,000,000 chance of failure (it takes a minute or two to run).

JavaScript

function getCrc(s) {
    var result = 0;
    for(var i = 0; i < s.length; i++) {
        var c = s.charCodeAt(i);
        // Shift the running value left one bit (as a 32-bit int) and mix in the character.
        result = (result << 1) ^ c;
    }
    return result;
}

Java

package test;

import java.util.*;

public class SimpleCrc {

    public static void main(String[] args) {
        final Random randomGenerator = new Random();
        int lastCrc = -1;
        int dupes = 0;
        for(int i = 0; i < 10000000; i++) {
            final StringBuilder sb = new StringBuilder();
            for(int j = 0; j < 1000; j++) {
                final char c = (char)(randomGenerator.nextInt(128 - 32) + 32);
                sb.append(c);
            }
            final int crc = crc(sb.toString());
            if(lastCrc == crc) {
                dupes++;
            }
            lastCrc = crc;
        }
        System.out.println("Dupes: " + dupes);
    }

    public static int crc(String string) {
        int result = 0;
        for(final char c : string.toCharArray()) {
            result = (result << 1) ^ c;
        }
        return result;
    }
}
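
One property worth checking before relying on getCrc: every step shifts the 32-bit result left by one bit, so a character followed by 32 or more further characters has been shifted out completely, and only the last 32 characters of a string can affect the result. The small check below is my own addition, not part of the original answer; it shows two strings that differ only in their first character getting the same value. Note also that the Java test only compares each checksum with the one generated immediately before it, which is a narrower measurement than a general collision rate.

```javascript
// Same function as above, repeated so this snippet runs on its own.
function getCrc(s) {
  var result = 0;
  for (var i = 0; i < s.length; i++) {
    result = (result << 1) ^ s.charCodeAt(i);
  }
  return result;
}

// Two 41-character strings that differ only in their first character.
var tail = "x".repeat(40);
console.log(getCrc("a" + tail) === getCrc("b" + tail)); // true
```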

This is a rather old thread, but I suspect it is still viewed quite often, so: if all you need is a short but reliable piece of code to generate a checksum, the Adler-32 algorithm has to be your choice. Here is the JavaScript code:

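function adler32(data)
{
 var MOD_ADLER = 65521;
 var a = 1, b = 0;

 // Reduce modulo 65521 at every step, so a and b never grow large.
 for (var i = 0; i < data.length; i++)
 {
  a = (a + data.charCodeAt(i)) % MOD_ADLER;
  b = (b + a) % MOD_ADLER;
 }

 // Pack b into the high 16 bits and a into the low 16 bits;
 // >>> 0 keeps the result unsigned (b << 16 alone can be negative).
 var adler = (a | (b << 16)) >>> 0;
 return adler;
}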

The corresponding fiddle demonstrating the algorithm in action is here.
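
One caveat with both JavaScript versions in this thread: data.charCodeAt(i) returns UTF-16 code units (0 to 65535), while Adler-32 is defined over bytes. For pure ASCII strings this makes no difference, but for other characters the result will differ from byte-oriented implementations such as zlib's. A sketch that hashes the UTF-8 encoding instead, assuming TextEncoder is available (it is in modern browsers and Node.js); the name adler32Utf8 is my own:

```javascript
// Adler-32 over the UTF-8 bytes of a string rather than its UTF-16 code units.
function adler32Utf8(text) {
  var MOD_ADLER = 65521;
  var bytes = new TextEncoder().encode(text);
  var a = 1, b = 0;
  for (var i = 0; i < bytes.length; i++) {
    a = (a + bytes[i]) % MOD_ADLER;
    b = (b + a) % MOD_ADLER;
  }
  // Same packing as above, kept unsigned with >>> 0.
  return ((b << 16) | a) >>> 0;
}
```

For "Wikipedia" this returns 0x11e60398, the same as the charCodeAt version, since that string is pure ASCII.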
