Image comparison algorithm

礼貌的吻别 2020-11-28 00:48

I'm trying to compare images to each other to find out whether they are different. First I tried to make a Pearson correlation of the RGB values, which works also quite go

9 answers
  •  轻奢々 (OP)
    2020-11-28 01:27

    I propose a solution based on the Jaccard index of similarity on the image histograms. See: https://en.wikipedia.org/wiki/Jaccard_index#Weighted_Jaccard_similarity_and_distance

    You can compare the distributions of pixel colors in the two images. Because a histogram ignores where each pixel is located, this measure is largely invariant to translations.
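
    To illustrate why a histogram does not react to translations, here is a minimal sketch on a toy grayscale grid. The helpers `histogram` and `shift_right` are hypothetical and not part of Pillow; the shift is circular, so no pixel leaves the frame. With a real translation, content that moves out of the image would change the histogram somewhat.

```python
from collections import Counter


def histogram(pixels):
    """Count how often each pixel value occurs in a 2D grid (list of rows)."""
    return Counter(value for row in pixels for value in row)


def shift_right(pixels, k):
    """Circularly shift every row k pixels to the right (hypothetical helper)."""
    return [row[-k:] + row[:-k] for row in pixels]


image = [
    [0, 0, 9, 9],
    [0, 5, 5, 9],
    [0, 0, 0, 9],
]
shifted = shift_right(image, 1)

print(shifted != image)                        # True: the pixel grids differ
print(histogram(shifted) == histogram(image))  # True: the histograms are identical
```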

    from PIL.Image import Image
    from typing import List
    
    def jaccard_similarity(im1: Image, im2: Image) -> float:
        """Compute the similarity between two images.

        First, a histogram of the pixel distribution is extracted from each image.
        Then the histograms are compared using the weighted Jaccard index of
        similarity (also called Ruzicka similarity), defined as:

            J = sum(min(b1_i, b2_i)) / sum(max(b1_i, b2_i))

        where b1_i and b2_i are the i-th histogram bins of images 1 and 2, respectively.

        The two images must have the same resolution and number of channels (depth).

        See: https://en.wikipedia.org/wiki/Jaccard_index
        """
    
        if im1.size != im2.size:
            raise ValueError("Images must have the same size. Found {} and {}".format(im1.size, im2.size))
    
        n_channels_1 = len(im1.getbands())
        n_channels_2 = len(im2.getbands())
        if n_channels_1 != n_channels_2:
            raise ValueError("Images must have the same number of channels. Found {} and {}".format(n_channels_1, n_channels_2))
    
        # One count per bin; the histograms of all bands are concatenated.
        hi1 = im1.histogram()  # type: List[int]
        hi2 = im2.histogram()  # type: List[int]
    
        # Since the two images have the same number of channels, they must have the same number of bins.
        assert len(hi1) == len(hi2)
    
        # Weighted Jaccard index: sum of bin-wise minima over sum of bin-wise maxima.
        sum_mins = sum(min(b1, b2) for b1, b2 in zip(hi1, hi2))
        sum_maxs = sum(max(b1, b2) for b1, b2 in zip(hi1, hi2))
    
        return sum_mins / sum_maxs
    

    Unlike the mean squared error, the Jaccard index always lies in the range [0, 1], which makes scores comparable across different image sizes.
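
    A small worked example may make the bounds concrete. This is a sketch on plain lists of bin counts rather than on Pillow images; `weighted_jaccard` is a hypothetical name, and returning 1.0 for two all-zero histograms is an assumed convention, not something the formula defines.

```python
def weighted_jaccard(h1, h2):
    """Weighted Jaccard (Ruzicka) similarity of two equal-length histograms."""
    if len(h1) != len(h2):
        raise ValueError("Histograms must have the same number of bins")
    sum_mins = sum(min(a, b) for a, b in zip(h1, h2))
    sum_maxs = sum(max(a, b) for a, b in zip(h1, h2))
    # Assumed convention: two all-zero histograms count as identical.
    return sum_mins / sum_maxs if sum_maxs else 1.0


# mins = 1 + 1 + 0 = 2, maxs = 3 + 1 + 2 = 6
print(weighted_jaccard([3, 1, 0], [1, 1, 2]))  # 2 / 6, about 0.333
print(weighted_jaccard([4, 0, 2], [4, 0, 2]))  # 1.0 for identical histograms
print(weighted_jaccard([5, 0], [0, 5]))        # 0.0 when no bin overlaps
```

    Since every bin-wise minimum is at most the corresponding maximum, the ratio cannot exceed 1, and it cannot be negative because the counts are non-negative.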

    To compare two images of different sizes, first rescale them to the same size; otherwise the pixel counts have to be normalized somehow. I used this:

    import sys
    
    from skincare.common.utils import jaccard_similarity
    
    import PIL.Image
    from PIL.Image import Image
    
    file1 = sys.argv[1]
    file2 = sys.argv[2]
    
    im1 = PIL.Image.open(file1)  # type: Image
    im2 = PIL.Image.open(file2)  # type: Image
    
    print("Image 1: mode={}, size={}".format(im1.mode, im1.size))
    print("Image 2: mode={}, size={}".format(im2.mode, im2.size))
    
    if im1.size != im2.size:
        print("Resizing image 2 to {}".format(im1.size))
        im2 = im2.resize(im1.size, resample=PIL.Image.BILINEAR)
    
    j = jaccard_similarity(im1, im2)
    print("Jaccard similarity index = {}".format(j))
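
    As an alternative to resizing, the pixel counts could be normalized so that each histogram sums to 1. The following is only a sketch of that idea on plain lists of bin counts; `normalize` and `normalized_jaccard` are hypothetical helpers, not the code used for the results in this answer.

```python
def normalize(hist):
    """Turn raw bin counts into fractions that sum to 1."""
    total = sum(hist)
    if total == 0:
        raise ValueError("Histogram is empty")
    return [count / total for count in hist]


def normalized_jaccard(h1, h2):
    """Weighted Jaccard index computed on normalized histograms."""
    p1, p2 = normalize(h1), normalize(h2)
    sum_mins = sum(min(a, b) for a, b in zip(p1, p2))
    sum_maxs = sum(max(a, b) for a, b in zip(p1, p2))
    return sum_mins / sum_maxs


# Same color proportions, but the second image has four times as many pixels.
small = [2, 1, 1]
large = [8, 4, 4]

raw = sum(map(min, small, large)) / sum(map(max, small, large))
print(raw)                               # 0.25: raw counts penalize the size difference
print(normalized_jaccard(small, large))  # 1.0: the proportions are identical
```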
    

    Testing on your images:

    $ python CompareTwoImages.py im1.jpg im2.jpg
    Image 1: mode=RGB, size=(401, 105)
    Image 2: mode=RGB, size=(373, 109)
    Resizing image 2 to (401, 105)
    Jaccard similarity index = 0.7238955686269157
    $ python CompareTwoImages.py im1.jpg im3.jpg 
    Image 1: mode=RGB, size=(401, 105)
    Image 2: mode=RGB, size=(457, 121)
    Resizing image 2 to (401, 105)
    Jaccard similarity index = 0.22785529941822316
    $ python CompareTwoImages.py im2.jpg im3.jpg 
    Image 1: mode=RGB, size=(373, 109)
    Image 2: mode=RGB, size=(457, 121)
    Resizing image 2 to (373, 109)
    Jaccard similarity index = 0.29066426814105445
    

    You might also consider experimenting with different resampling filters (like NEAREST or LANCZOS), as they, of course, alter the color distribution when resizing.

    Additionally, note that swapping the images changes the result, since the second image may then be downsampled instead of upsampled. (Cropping, rather than rescaling, might suit your case better.)
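
    One way to make the comparison independent of argument order is to crop both images to a common size instead of resizing one of them. The sketch below only computes the crop boxes; `common_size` and `center_crop_box` are hypothetical helpers, and center cropping is an assumption that may not fit every pair of images. Each box uses the (left, upper, right, lower) layout that Pillow's `Image.crop` accepts.

```python
def common_size(size1, size2):
    """Largest (width, height) that fits inside both images."""
    return (min(size1[0], size2[0]), min(size1[1], size2[1]))


def center_crop_box(size, target):
    """Return (left, upper, right, lower) for a centered crop of `target` size."""
    width, height = size
    target_w, target_h = target
    if target_w > width or target_h > height:
        raise ValueError("Target size must fit inside the image")
    left = (width - target_w) // 2
    upper = (height - target_h) // 2
    return (left, upper, left + target_w, upper + target_h)


# Sizes of the first two test images from this answer.
size1, size2 = (401, 105), (373, 109)
target = common_size(size1, size2)

print(target)                          # (373, 105)
print(center_crop_box(size1, target))  # (14, 0, 387, 105)
print(center_crop_box(size2, target))  # (0, 2, 373, 107)
```

    Because `common_size` is symmetric in its arguments, both images end up with the same dimensions regardless of which one is passed first.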
