OpenCV / SURF: How to generate an image hash / fingerprint / signature out of the descriptors?

遇见更好的自我 2020-11-29 16:28

There are some topics here that are very helpful on how to find similar pictures.

What I want to do is to get a fingerprint of a picture and find the same picture on

4 answers
  • 2020-11-29 17:01

    Min-Hash (or min-Hashing) is a technique that might help you. It encodes the whole image in a representation of adjustable size that is then stored in hash tables. Several variants exist, such as Geometric min-Hashing, Partition min-Hash and Bundle min-Hashing. The resulting memory footprint is not one of the smallest, but these techniques work for a variety of scenarios such as near-duplicate retrieval and even small object retrieval, a scenario where other short signatures often do not perform very well.

    There are several papers on this topic. A good entry point is: Ondrej Chum, James Philbin, Andrew Zisserman, "Near Duplicate Image Detection: min-Hash and tf-idf Weighting", BMVC 2008.
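
    As a rough illustration of the basic idea (not the method from the paper, and not any of the variants named above), here is a minimal min-hash sketch. It assumes each image has already been reduced to a set of integer "visual word" IDs, for example by quantizing its descriptors against a codebook. The function names, the prime modulus and the number of hash functions are all choices made for this example.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1; visual word IDs are assumed to be smaller than this


def make_hash_funcs(num_hashes, seed=0):
    """Return (a, b) pairs defining hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(visual_words, hash_funcs):
    """For each hash function, keep the smallest hash value over the set."""
    return [min((a * w + b) % PRIME for w in visual_words)
            for a, b in hash_funcs]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature entries; approximates set overlap (Jaccard)."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    funcs = make_hash_funcs(num_hashes=64)
    image_a = {3, 17, 42, 99, 250, 311}   # hypothetical visual word sets
    image_b = {3, 17, 42, 99, 250, 400}   # shares most words with image_a
    sig_a = minhash_signature(image_a, funcs)
    sig_b = minhash_signature(image_b, funcs)
    print(estimated_similarity(sig_a, sig_b))
```

    The signature length (the number of hash functions) is the "adjustable size" knob: more entries give a better similarity estimate at the cost of more memory per image.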

  • 2020-11-29 17:08

    It seems like GIST might be a more appropriate thing to use.

    http://people.csail.mit.edu/torralba/code/spatialenvelope/ has MATLAB code.

  • 2020-11-29 17:18

    The feature data you mention (position, Laplacian, size, orientation, Hessian) is insufficient for your purpose (these are actually the less relevant parts of the descriptor if you want to do matching). The data you want to look at are the "descriptors" (the 4th argument):

    void cvExtractSURF(const CvArr* image, const CvArr* mask, CvSeq** keypoints, CvSeq** descriptors, CvMemStorage* storage, CvSURFParams params)

    These are vectors of 128 or 64 elements (depending on params) which contain the "fingerprints" of the specific feature (each image will contain a variable number of such vectors). If you get the latest version of OpenCV, it has a sample named find_obj.cpp which shows how they are used for matching.
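
    To give a feel for what matching on descriptors means, here is a minimal brute-force sketch. It is not a copy of find_obj.cpp; it assumes the descriptors have already been extracted into plain lists of floats, and the function names and the ratio threshold of 0.6 are illustrative choices.

```python
def squared_distance(d1, d2):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(d1, d2))


def match_descriptors(query, train, ratio=0.6):
    """Return (query_index, train_index) pairs for unambiguous matches.

    A query descriptor is matched to its nearest train descriptor only if
    that neighbour is clearly closer than the second-nearest one.
    """
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((squared_distance(q, t), ti) for ti, t in enumerate(train))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((qi, dists[0][1]))
    return matches


if __name__ == "__main__":
    # Tiny 2-element "descriptors" stand in for real 64- or 128-element ones.
    query = [[0.0, 0.0], [10.0, 10.0]]
    train = [[0.0, 0.1], [10.0, 10.1], [5.0, 5.0]]
    print(match_descriptors(query, train))
```

    Two images can then be compared by how many of their descriptors match, which works regardless of the order in which the features were detected.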

    update:

    you might find this discussion helpful

  • 2020-11-29 17:24

    A trivial way to compute a hash would be the following. Get all the descriptors from the image (say, N of them). Each descriptor is a vector of 128 numbers (or 64, depending on the parameters), which you can convert to integers between 0 and 255. So you have a sequence of N*128 integers. Write them one after another into a string and use that as the hash value. If you want the hash values to be small, apply a standard string hash function to that string and use its output instead.

    That might work if you want to find exact duplicates. But since you talk about scale, rotation, etc., it seems you want to find "similar" images. In that case, using a hash is probably not a good way to go. You probably use some interest point detector to find the points at which to compute SURF descriptors. Imagine that it returns the same set of points, but in a different order. Suddenly your hash value will be very different, even though the images and descriptors are the same.
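
    A minimal sketch of the trivial hash and of its order sensitivity, assuming descriptor values lie in [-1, 1] and picking SHA-256 as the string hash (both are assumptions made for this example, as are the function names):

```python
import hashlib


def quantize(descriptor):
    """Map each value, assumed to lie in [-1, 1], to an integer in 0..255."""
    return bytes(min(255, max(0, int(round((v + 1.0) / 2.0 * 255))))
                 for v in descriptor)


def descriptor_hash(descriptors):
    """Concatenate the quantized descriptors in order and hash the result."""
    h = hashlib.sha256()
    for d in descriptors:
        h.update(quantize(d))
    return h.hexdigest()


if __name__ == "__main__":
    # Two short made-up descriptors stand in for real 64- or 128-element ones.
    descriptors = [[0.1, -0.2, 0.3], [0.5, 0.0, -0.9]]
    same_order = descriptor_hash(descriptors)
    other_order = descriptor_hash(list(reversed(descriptors)))
    print(same_order == descriptor_hash(descriptors))  # identical input, identical hash
    print(same_order == other_order)                   # same descriptors, different order
```

    The second comparison comes out unequal even though both inputs hold exactly the same descriptors, which is the problem described above.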

    So, if I had to find similar images reliably, I'd use a different approach. For example, I could vector-quantize the SURF descriptors, build histograms of vector-quantized values, and use histogram intersection for matching. Do you really absolutely have to use hash functions (maybe for efficiency), or do you just want to use whatever to find similar images?
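
    The vector-quantization approach can be sketched roughly as follows. This assumes a codebook of "visual words" is already available (in practice it would typically be learned by clustering descriptors from many images); the tiny fixed codebook, the 2-element descriptors and the function names are hypothetical and only serve the example.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared L2)."""
    return min(range(len(codebook)),
               key=lambda i: sum((x - y) ** 2
                                 for x, y in zip(descriptor, codebook[i])))


def bow_histogram(descriptors, codebook):
    """Normalized histogram of visual word occurrences for one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # hypothetical visual words
    image_a = [[0.1, 0.0], [0.9, 1.1], [5.0, 4.8], [0.0, 0.2]]
    image_b = list(reversed(image_a))                # same features, other order
    hist_a = bow_histogram(image_a, codebook)
    hist_b = bow_histogram(image_b, codebook)
    print(histogram_intersection(hist_a, hist_b))
```

    Because the histogram only counts how often each visual word occurs, the order in which the detector returns the points no longer matters.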
