Fast and simple image hashing algorithm

穿精又带淫゛_ submitted on 2019-12-18 10:55:28

Question


I need a (preferably simple and fast) image hashing algorithm. The hash value is used in a lookup table, not for cryptography.

Some of the images are "computer graphics", i.e. solid-color filled rectangles, rasterized text, etc., whereas others are "photographic" images with a rich color spectrum, mostly smooth, with a reasonable noise amplitude.

I'd also like the hashing algorithm to be applicable to specific image parts. That is, the image can be divided into grid cells, and the hash of each cell should depend only on the contents of that cell, so that one can quickly spot whether two images have common areas (provided they are aligned appropriately).
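To make the per-cell requirement concrete, here is a minimal sketch. It assumes the image is a list of equal-length rows of integer pixel values; `cell_hashes`, `cell_w`, and `cell_h` are hypothetical names, and `zlib.crc32` is only a stand-in for whatever hash function ends up being chosen.

```python
import zlib

def cell_hashes(image, cell_w, cell_h):
    """Return {(cell_row, cell_col): hash}; each hash depends only on that cell's pixels.

    `image` is assumed to be a list of equal-length rows of integers in 0..2**32-1.
    """
    height, width = len(image), len(image[0])
    hashes = {}
    for top in range(0, height, cell_h):
        for left in range(0, width, cell_w):
            crc = 0
            for row in image[top:top + cell_h]:
                # Serialize this cell's slice of the row and fold it into the running CRC.
                chunk = b"".join(p.to_bytes(4, "little") for p in row[left:left + cell_w])
                crc = zlib.crc32(chunk, crc)
            hashes[(top // cell_h, left // cell_w)] = crc
    return hashes

# Two aligned images that differ only in the right-hand cell.
a = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
b = [[1, 2, 3, 4],
     [5, 6, 7, 9]]
ha, hb = cell_hashes(a, 2, 2), cell_hashes(b, 2, 2)
common = [cell for cell in ha if ha[cell] == hb[cell]]
print(common)  # [(0, 0)]
```

Because a cell's hash is computed from its own pixels only, equal hashes at the same grid position flag a candidate common area, which can then be confirmed by comparing the pixels directly.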

Note: I only need to know whether two images (or their parts) are identical. That is, I don't need to match similar images; there's no need for feature recognition, correlation, or other DSP techniques.
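For illustration, this is roughly how such a hash might sit in front of an exact comparison in a lookup table. It is only a sketch: `ImageTable` is a hypothetical name, images are assumed to be available as raw `bytes`, and `zlib.crc32` again stands in for the hash under discussion.

```python
import zlib
from collections import defaultdict

class ImageTable:
    """Lookup table keyed by a fast hash; exact bytes settle collisions."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, data: bytes) -> bool:
        """Store `data`; return True if an identical image was already present."""
        bucket = self._buckets[zlib.crc32(data)]
        if any(existing == data for existing in bucket):
            return True
        bucket.append(data)
        return False

table = ImageTable()
first = table.add(bytes(16))          # new image
again = table.add(bytes(16))          # identical bytes
other = table.add(bytes([1]) * 16)    # different image
print(first, again, other)  # False True False
```

With this layout a hash collision costs only an extra comparison, so the hash has to be fast and well spread rather than collision-free.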

I wonder what the preferred hashing algorithm is.

For "photographic" images, just XOR-ing all the pixels within a grid cell is more or less OK. The probability of the same hash value for different images is pretty low, especially because the presence of (nearly white) noise breaks all the potential symmetries. In addition, the spectrum of such a hash function looks good (any value is possible with nearly the same probability).

But such a naive algorithm cannot be used with "artificial" graphics. Identical pixels, repeating patterns, and invariance under geometric offsets are very common in such images. XOR-ing all the pixels gives 0 for any cell filled with an even number of identical pixels.
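A small sketch of that failure mode, assuming a cell is given as a flat list of integer pixel values (`xor_hash` and the sample colors are illustrative names and values):

```python
from functools import reduce
from operator import xor

def xor_hash(pixels):
    """XOR of all pixel values in a cell (the naive hash discussed above)."""
    return reduce(xor, pixels, 0)

red, blue = 0xFF0000, 0x0000FF

solid_red = [red] * 64        # 8x8 cell filled with one color
solid_blue = [blue] * 64      # a different solid color, same size
stripes = [red, blue] * 32    # repeating pattern
shifted = [blue, red] * 32    # the same pattern offset by one pixel

# Every value occurs an even number of times, so all four different cells collide at 0.
print([xor_hash(c) for c in (solid_red, solid_blue, stripes, shifted)])  # [0, 0, 0, 0]
```

XOR is also insensitive to pixel order, so any rearrangement of the same pixels produces the same value.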

Using something like CRC-32 looks somewhat promising, but I'd like to figure out something faster. I thought about an iterative formula in which each new pixel mutates the current hash value, like this:

hashValue = (hashValue * /*something*/ | newPixelValue) % /* huge prime */

Taking the result modulo a prime number should probably give good dispersion, so I'm leaning toward this option. But I'd like to know if there are better variants.
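One possible reading of that formula, as a hedged sketch rather than a recommendation: `PRIME` and `MULTIPLIER` are arbitrary choices for the two placeholders, and the pixel is mixed in with `+` instead of `|`, which is an assumption on my part (it turns the update into an ordinary polynomial hash).

```python
PRIME = 2**31 - 1        # a Mersenne prime; assumed choice for "huge prime"
MULTIPLIER = 1_000_003   # arbitrary constant standing in for "something"

def rolling_hash(pixels):
    """h = (h * MULTIPLIER + pixel) mod PRIME, applied pixel by pixel."""
    h = 0
    for p in pixels:
        h = (h * MULTIPLIER + p) % PRIME
    return h

# The artificial cells that defeat plain XOR: solid fills and shifted stripes.
red, blue = 0xFF0000, 0x0000FF
cells = {
    "solid_red": [red] * 64,
    "solid_blue": [blue] * 64,
    "stripes": [red, blue] * 32,
    "shifted": [blue, red] * 32,
}
hashes = {name: rolling_hash(px) for name, px in cells.items()}
print(len(set(hashes.values())))  # 4
```

Unlike XOR, this update depends on pixel order, so the four cells get four different values. In a language with fixed-width integers the product `h * MULTIPLIER` would need a 64-bit intermediate to avoid overflow before the modulo.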

Thanks in advance.


Answer 1:


If you want to make it very fast, you should consider taking a random subset of the pixels to avoid reading the entire image. Next, compute a hash function on the sequence of values at those pixels. The random subset should be selected by a deterministic pseudo-random number generator with fixed seed so that identical images produce identical subsets and consequently identical hash values.

This should work reasonably well even for artificial images, and more iterations give better reliability. However, images that differ from each other in only a small number of pixels are likely to collide, because the sampled subset may miss every differing pixel. If that is the case, for instance if your image set is likely to contain pairs that differ in a single pixel, you must read every pixel to compute the hash value. Taking a simple linear combination with pseudo-random coefficients would be good enough even for artificial images.

A simple algorithm, sketched in Python (the image is assumed to be a non-empty flat sequence of integer pixel values):

import random

NUM_ITERATIONS = 100
SEED = 2847  # fixed seed, so identical images sample identical positions

def image_hash(pixels):
    generator = random.Random(SEED)  # re-seeded on every call to ensure consistency
    value = 0
    for _ in range(NUM_ITERATIONS):
        next_value = pixels[generator.randrange(len(pixels))]
        # linear combination with pseudo-random coefficients, kept to 32 bits
        value = (value + next_value * generator.getrandbits(32)) & 0xFFFFFFFF
    return value
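For the case where every pixel must be read, the linear combination mentioned above might look like the following sketch. The names `full_hash` and `MASK` are hypothetical, and forcing each coefficient to be odd with `| 1` is my own addition: an odd coefficient is invertible modulo 2**32, so changing a single pixel (with values below 2**32) always changes the result.

```python
import random

SEED = 2847          # same fixed seed as in the sampled version
MASK = 0xFFFFFFFF    # keep the result to 32 bits

def full_hash(pixels):
    """Linear combination of every pixel with pseudo-random 32-bit coefficients."""
    generator = random.Random(SEED)   # fresh generator, so coefficients repeat per call
    value = 0
    for p in pixels:
        coefficient = generator.getrandbits(32) | 1   # odd, so invertible mod 2**32
        value = (value + p * coefficient) & MASK
    return value

solid = [0xFFFFFF] * 64     # artificial cell: one solid color
one_off = list(solid)
one_off[37] = 0xFFFFFE      # same cell with a single pixel changed

print(full_hash(solid) == full_hash(list(solid)))  # True
print(full_hash(solid) == full_hash(one_off))      # False
```

The cost is one multiply-add per pixel, so the running time grows with the cell size instead of staying fixed at the number of sampled positions.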



Answer 2:


Have a look at this tutorial on the pHash algorithm, which is used to find closely matching images: http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html



Source: https://stackoverflow.com/questions/11336209/fast-and-simple-image-hashing-algorithm
