Extract black objects from color background

It is easy for human eyes to tell black from other colors. But what about computers?

I printed some color blocks on normal A4 paper. A color image is composed of three kinds of ink (cyan, magenta and yellow), so I set the color of each block to C=20%, C=30%, C=40% and C=50%, with the other two colors at 0. That is the first column of my source image. So far, no black (K of CMYK) ink should be printed. After that, I set the color of each dot to K=100%, with the other colors at 0, to print black dots.

You may find my image weird and ugly. In fact, the image is magnified 30 times, so you can clearly see how the ink tricks our eyes. The color strips make it hard for me to recognize the black dots (each dot is printed as a single pixel at 800 dpi). Without the color background, I used to blur the image and run the Canny edge detector to extract the edges. However, with the color background, simply converting to grayscale and running an edge detector does not give good results because of the strips. How do my eyes solve this problem?

I decided to check the brightness of the source image, referring to this article and formula:

brightness = sqrt( 0.299 R * R + 0.587 G * G + 0.114 B * B )
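A minimal C++ sketch of this formula for one 8-bit RGB pixel (the function name perceivedBrightness is my own, not from the article). For fully saturated inks it gives about 240 for yellow (255,255,0), about 213.5 for cyan (0,255,255) and about 164 for magenta (255,0,255), which is why the yellow strips are the easiest to suppress:

```cpp
#include <cmath>

// Perceived brightness of an 8-bit RGB pixel:
// brightness = sqrt(0.299 R^2 + 0.587 G^2 + 0.114 B^2).
// The weights sum to 1, so the result stays in [0, 255].
double perceivedBrightness(int r, int g, int b) {
    return std::sqrt(0.299 * r * r + 0.587 * g * g + 0.114 * b * b);
}
```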

The brightness is closer to human perception, and it works very well on the yellow background because yellow has the highest brightness of the three inks (compared with cyan and magenta). But how can I make the cyan and magenta strips as bright as possible? The expected result is that all the strips disappear.

More complicated images:

C=40%, M=40%

C=40%, Y=40%

Y=40%, M=40%

FFT result of C=40%, Y=40% brightness image

Can anyone give me some hints on how to remove the color strips?

@natan I tried the FFT method you suggested, but I was not lucky enough to get peaks along both the x and y axes. To plot the frequency spectrum as you did, I resized my image to a square.


After inspecting the images, I decided that a robust threshold would be simpler than anything else. For example, looking at the C=40%, M=40% photo, I first inverted the intensities so that black (the signal) becomes white, simply using

im=(abs(255-im));

We can inspect its RGB histograms using this:

hist(reshape(single(im),[],3),min(single(im(:))):max(single(im(:)))); 
colormap([1 0 0; 0 1 0; 0 0 1]);

We see a large contribution at some middle intensity, whereas the "signal", which is now white, is mostly separated at higher values. I then applied a simple threshold as follows:

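% background level: the larger of the smallest column maximum and the smallest row maximum
thr = @(d) (max([min(max(d,[],1))  min(max(d,[],2))])) ;
imt = zeros(size(im), 'uint8');   % preallocate the thresholded image
for n=1:size(im,3)
    % keep only pixels more than 10% above the background level of channel n
    imt(:,:,n)=im(:,:,n).*uint8(im(:,:,n)>1.1*thr(im(:,:,n)));
end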

imt=rgb2gray(imt);

and then removed objects smaller than a typical dot area:

min_dot_area=20;
bw=bwareaopen(imt>0,min_dot_area);
imagesc(bw); 
colormap(flipud(bone));

Here's the result together with the original image:

This threshold originates from code I wrote that assumed sparse signals in the form of 2-D peaks or blobs in a noisy background. By sparse I mean that the peaks do not pile up. In that case, projecting max(image) onto the x or y axis (with max(im,[],1) or max(im,[],2)) gives a good measure of the background. That is because you take the minimal intensity of the max(im) vector: as long as some column (or row) contains no peak, that minimum is just the highest noise value.

If you want to look at this differently, you can look at the histogram of the image intensities. The background is supposed to follow a normal distribution of some kind around some intensity, while the signal should be higher than that intensity but with a much lower number of occurrences. By finding max(im) along one of the axes (x or y), you discover the maximal noise level.

You'll see that the threshold picks the point in the histogram where there is still some noise above it, but ALL the signal is above it too. That's why I adjusted it to 1.1*thr. Lastly, there are many fancier ways to obtain a robust threshold; this is a quick and dirty way that in my view is good enough...
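To make the thr rule concrete outside MATLAB, here is a rough C++ port for a single grayscale channel (the Image alias and the function names are hypothetical). It takes the maximum of every column and every row, keeps the smallest of each set, and uses the larger of the two as the background level; pixels above 1.1 times that level are kept. Like the original, it assumes a sparse signal, so that at least one row and one column contain only background:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row-major grayscale image type for this sketch.
using Image = std::vector<std::vector<std::uint8_t>>;

// Port of thr: max(min(column maxima), min(row maxima)).
// Assumes a non-empty, rectangular image.
int projectionThreshold(const Image& im) {
    const std::size_t rows = im.size(), cols = im[0].size();
    std::vector<int> colMax(cols, 0), rowMax(rows, 0);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            rowMax[i] = std::max<int>(rowMax[i], im[i][j]);
            colMax[j] = std::max<int>(colMax[j], im[i][j]);
        }
    }
    const int minColMax = *std::min_element(colMax.begin(), colMax.end());
    const int minRowMax = *std::min_element(rowMax.begin(), rowMax.end());
    return std::max(minColMax, minRowMax);
}

// Keep pixels strictly above factor * threshold (1.1 in the answer); zero the rest.
Image applyThreshold(const Image& im, double factor = 1.1) {
    const double t = factor * projectionThreshold(im);
    Image out = im;
    for (auto& row : out)
        for (auto& px : row)
            if (px <= t) px = 0;
    return out;
}
```

In the MATLAB answer this is done per RGB channel on the inverted image; here one channel is shown for brevity.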

I would convert the image to the HSV colour space and then use the Value channel. This basically separates colour and brightness information.

This is the 50% cyan image

Then you can just do a simple threshold to isolate the dots.
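The Value channel of HSV is simply V = max(R, G, B), so this approach can be sketched without any color-space library. Ideally, a single ink or a mix of two inks absorbs at most two of the three RGB channels, so at least one channel stays high and the strips become bright in V, while the black dots stay dark. The function names and the threshold are illustrative assumptions:

```cpp
#include <algorithm>
#include <cstdint>

// HSV Value channel of an RGB pixel: V = max(R, G, B).
std::uint8_t hsvValue(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return std::max({r, g, b});
}

// Dot mask rule: true where the pixel is darker than the threshold in V.
bool isDot(std::uint8_t r, std::uint8_t g, std::uint8_t b, std::uint8_t threshold) {
    return hsvValue(r, g, b) < threshold;
}
```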

I just did this very quickly and I'm sure you could get better results. Maybe find contours in the image and then remove any contours with a small area, to filter out any remaining noise.
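The small-area cleanup suggested here (done with bwareaopen in the MATLAB answer) can be sketched in C++ without OpenCV. Instead of contours, this version labels 8-connected blobs with a breadth-first flood fill and drops those below a minimum pixel count; Mask, removeSmallBlobs and the choice of 8-connectivity are my assumptions:

```cpp
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

using Mask = std::vector<std::vector<bool>>;

// Remove 8-connected foreground blobs with fewer than minArea pixels,
// similar in spirit to MATLAB's bwareaopen(bw, minArea).
Mask removeSmallBlobs(const Mask& bw, std::size_t minArea) {
    const int rows = static_cast<int>(bw.size());
    const int cols = rows > 0 ? static_cast<int>(bw[0].size()) : 0;
    Mask out(rows, std::vector<bool>(cols, false));
    Mask seen(rows, std::vector<bool>(cols, false));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (!bw[r][c] || seen[r][c]) continue;
            // Breadth-first flood fill of one blob, collecting its pixels.
            std::vector<std::pair<int, int>> blob;
            std::queue<std::pair<int, int>> q;
            q.push({r, c});
            seen[r][c] = true;
            while (!q.empty()) {
                const int y = q.front().first, x = q.front().second;
                q.pop();
                blob.push_back({y, x});
                for (int dy = -1; dy <= 1; ++dy) {
                    for (int dx = -1; dx <= 1; ++dx) {
                        const int ny = y + dy, nx = x + dx;
                        if (ny >= 0 && ny < rows && nx >= 0 && nx < cols &&
                            bw[ny][nx] && !seen[ny][nx]) {
                            seen[ny][nx] = true;
                            q.push({ny, nx});
                        }
                    }
                }
            }
            // Keep the blob only if it reaches the minimum area.
            if (blob.size() >= minArea) {
                for (const auto& p : blob) out[p.first][p.second] = true;
            }
        }
    }
    return out;
}
```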

WangYudong

Thanks to everyone for posting answers! After some searching and experimenting, I also came up with an adaptive method to extract these black dots from the color background. It seems that considering only the brightness cannot solve the problem perfectly; natan's method, which calculates and analyzes the RGB histogram, is more robust. Unfortunately, I still cannot obtain a robust threshold for extracting the black dots in other color samples, because things get more and more unpredictable when we add deeper color (e.g. Cyan > 60) or mix two colors together (e.g. Cyan = 50, Magenta = 50).

One day, I googled "extract color", and TinEye's color extraction and Color Thief inspired me. Both are very cool applications, and the image processed by the former website is exactly what I want, so I decided to implement something similar on my own. The algorithm I use here is k-means clustering. Some other related keywords to search for are color palette, color quantization and getting the dominant color.

First, I apply a Gaussian filter to smooth the image.

GaussianBlur(img, img, Size(5, 5), 0, 0);

OpenCV has a kmeans function, which saved me a lot of coding time. I modified this code.

// Input data should be float32 
Mat samples(img.rows * img.cols, 3, CV_32F);
for (int i = 0; i < img.rows; i++) {
    for (int j = 0; j < img.cols; j++) {
        for (int z = 0; z < 3; z++) {
            samples.at<float>(i + j * img.rows, z) = img.at<Vec3b>(i, j)[z];
        }
    }
}

// Select the number of clusters
int clusterCount = 4;
Mat labels;
int attempts = 1;
Mat centers;
kmeans(samples, clusterCount, labels, TermCriteria(CV_TERMCRIT_ITER|CV_TERMCRIT_EPS, 10, 0.1), attempts, KMEANS_PP_CENTERS, centers);

// Draw clustered result
Mat cluster(img.size(), img.type());
for (int i = 0; i < img.rows; i++) {
    for (int j = 0; j < img.cols; j++) {
        int cluster_idx = labels.at<int>(i + j * img.rows, 0);
        cluster.at<Vec3b>(i, j)[0] = centers.at<float>(cluster_idx, 0);
        cluster.at<Vec3b>(i, j)[1] = centers.at<float>(cluster_idx, 1);
        cluster.at<Vec3b>(i, j)[2] = centers.at<float>(cluster_idx, 2);
    }
}
imshow("clustered image", cluster); 
// Check centers' RGB value
cout << centers;
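For reference, the core of what kmeans computes can be sketched as plain Lloyd iterations over the RGB samples. This is a simplified stand-in, not OpenCV's implementation: the caller supplies the initial centers (KMEANS_PP_CENTERS uses k-means++ seeding instead), a fixed iteration count replaces the TermCriteria, and Color and lloydKMeans are hypothetical names:

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

using Color = std::array<double, 3>;

// Squared Euclidean distance between two colors.
double dist2(const Color& a, const Color& b) {
    double d = 0;
    for (int k = 0; k < 3; ++k) d += (a[k] - b[k]) * (a[k] - b[k]);
    return d;
}

// Lloyd iterations: assign each sample to its nearest center, then move each
// center to the mean of its samples. Returns the label of every sample.
std::vector<int> lloydKMeans(const std::vector<Color>& samples,
                             std::vector<Color>& centers, int iterations) {
    std::vector<int> labels(samples.size(), 0);
    for (int it = 0; it < iterations; ++it) {
        // Assignment step.
        for (std::size_t i = 0; i < samples.size(); ++i) {
            double best = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < centers.size(); ++c) {
                const double d = dist2(samples[i], centers[c]);
                if (d < best) { best = d; labels[i] = static_cast<int>(c); }
            }
        }
        // Update step; an empty cluster keeps its previous center.
        std::vector<Color> sum(centers.size(), Color{0, 0, 0});
        std::vector<int> count(centers.size(), 0);
        for (std::size_t i = 0; i < samples.size(); ++i) {
            for (int k = 0; k < 3; ++k) sum[labels[i]][k] += samples[i][k];
            ++count[labels[i]];
        }
        for (std::size_t c = 0; c < centers.size(); ++c)
            if (count[c] > 0)
                for (int k = 0; k < 3; ++k) centers[c][k] = sum[c][k] / count[c];
    }
    return labels;
}
```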

After clustering, I convert the result to grayscale and find the darkest color, which is most likely the color of the black dots.

// Find the minimum value
cvtColor(cluster, cluster, CV_RGB2GRAY);
Mat dot = Mat::zeros(img.size(), CV_8UC1);
cluster.copyTo(dot);
int minVal = (int)dot.at<uchar>(dot.rows / 2, dot.cols / 2);  // at() takes (row, col)
for (int i = 0; i < dot.rows; i += 3) {
    for (int j = 0; j < dot.cols; j += 3) {
        if ((int)dot.at<uchar>(i, j) < minVal) {
            minVal = (int)dot.at<uchar>(i, j);
        }
    }
}
inRange(dot, minVal - 5 , minVal + 5, dot);
imshow("dot", dot);

Let's test two images.

(clusterCount = 4)

(clusterCount = 5)

One shortcoming of k-means clustering is that a single fixed clusterCount cannot be applied to every image. Clustering is also slow for larger images, and that issue annoys me a lot. My quick and dirty method for better real-time performance (on iPhone) is to crop 1/16 of the image and cluster only that smaller area. Then I compare every pixel in the original image with each cluster center and pick the pixels that are nearest to the "black" center. I simply calculate the Euclidean distance between two RGB colors.
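The nearest-center step can be sketched as follows, assuming the centers were already found by clustering a small crop. The names are hypothetical, and instead of the grayscale conversion used above, this sketch ranks centers by R + G + B to pick the "black" one. Each pixel is then assigned to its nearest center by squared Euclidean RGB distance (squaring preserves the ordering, so the square root is skipped):

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Rgb = std::array<int, 3>;

// Squared Euclidean distance in RGB; enough for nearest-center comparisons.
int rgbDist2(const Rgb& a, const Rgb& b) {
    int d = 0;
    for (int k = 0; k < 3; ++k) d += (a[k] - b[k]) * (a[k] - b[k]);
    return d;
}

// Index of the darkest center, using R + G + B as a simple darkness proxy.
std::size_t darkestCenter(const std::vector<Rgb>& centers) {
    std::size_t best = 0;
    for (std::size_t c = 1; c < centers.size(); ++c)
        if (centers[c][0] + centers[c][1] + centers[c][2] <
            centers[best][0] + centers[best][1] + centers[best][2])
            best = c;
    return best;
}

// Mark every pixel whose nearest center is the darkest one.
// Assumes centers is non-empty.
std::vector<bool> darkMask(const std::vector<Rgb>& pixels, const std::vector<Rgb>& centers) {
    const std::size_t dark = darkestCenter(centers);
    std::vector<bool> mask(pixels.size(), false);
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        std::size_t nearest = 0;
        for (std::size_t c = 1; c < centers.size(); ++c)
            if (rgbDist2(pixels[i], centers[c]) < rgbDist2(pixels[i], centers[nearest]))
                nearest = c;
        mask[i] = (nearest == dark);
    }
    return mask;
}
```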

A simple method is to just threshold all the pixels. Here is this idea expressed in pseudo code.

for each pixel in image
    if brightness < THRESHOLD
        pixel = BLACK
    else
        pixel = WHITE

Or, if you're always dealing with cyan, magenta and yellow backgrounds, you might get better results with the criterion

if pixel.r < THRESHOLD and pixel.g < THRESHOLD and pixel.b < THRESHOLD

This method will only give good results for easy images where nothing except the black dots is too dark.

You can experiment with the value of THRESHOLD to find a good value for your images.
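The pseudocode above maps directly onto a small C++ function; Pixel, binarize and the useAllChannels switch are illustrative names. Both criteria from this answer are included: the brightness formula from the question, and the test that all three channels are below THRESHOLD:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pixel { std::uint8_t r, g, b; };

// Binarize as in the pseudocode: dark pixels become BLACK (0), all others WHITE (255).
// useAllChannels == false: brightness < threshold
// useAllChannels == true:  r, g and b are all < threshold
std::vector<std::uint8_t> binarize(const std::vector<Pixel>& img, double threshold,
                                   bool useAllChannels) {
    std::vector<std::uint8_t> out(img.size());
    for (std::size_t i = 0; i < img.size(); ++i) {
        const double r = img[i].r, g = img[i].g, b = img[i].b;
        const bool dark = useAllChannels
            ? (r < threshold && g < threshold && b < threshold)
            : std::sqrt(0.299 * r * r + 0.587 * g * g + 0.114 * b * b) < threshold;
        out[i] = dark ? 0 : 255;
    }
    return out;
}
```

The all-channels criterion is equivalent to max(R, G, B) < THRESHOLD, i.e. a threshold on the HSV Value channel suggested in another answer. With THRESHOLD = 80, the brightness criterion also flags a fairly saturated dark color such as (120, 0, 120), while the all-channels criterion does not.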

I suggest converting to a chroma-based color space, such as LCH, and applying thresholds on lightness and chroma simultaneously. Here is the resulting mask for L < 50 & C < 25 on the input image:
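The answer does not say which LCH variant was used. Assuming CIE LCh(ab) computed from 8-bit sRGB with a D65 white point, the mask rule L < 50 & C < 25 can be sketched in C++ like this (the function names are mine):

```cpp
#include <cmath>

// Lightness and chroma in CIE LCh(ab).
struct LCh { double L, C; };

// sRGB gamma expansion of a channel in [0, 255] to linear [0, 1].
double srgbToLinear(double v) {
    v /= 255.0;
    return v <= 0.04045 ? v / 12.92 : std::pow((v + 0.055) / 1.055, 2.4);
}

// CIELAB companding function f(t).
double labF(double t) {
    const double delta = 6.0 / 29.0;
    return t > delta * delta * delta ? std::cbrt(t)
                                     : t / (3 * delta * delta) + 4.0 / 29.0;
}

// 8-bit sRGB -> XYZ (D65) -> L*a*b* -> lightness and chroma.
LCh rgbToLCh(int r8, int g8, int b8) {
    const double r = srgbToLinear(r8), g = srgbToLinear(g8), b = srgbToLinear(b8);
    // Linear sRGB -> XYZ, normalized by the D65 reference white.
    const double x = (0.4124 * r + 0.3576 * g + 0.1805 * b) / 0.95047;
    const double y = (0.2126 * r + 0.7152 * g + 0.0722 * b) / 1.00000;
    const double z = (0.0193 * r + 0.1192 * g + 0.9505 * b) / 1.08883;
    const double fx = labF(x), fy = labF(y), fz = labF(z);
    const double L = 116.0 * fy - 16.0;
    const double a = 500.0 * (fx - fy);
    const double bStar = 200.0 * (fy - fz);
    return {L, std::sqrt(a * a + bStar * bStar)};  // chroma = |(a*, b*)|
}

// Mask rule from the answer: dark (L < 50) and nearly neutral (C < 25).
bool isBlackInk(int r, int g, int b) {
    const LCh c = rgbToLCh(r, g, b);
    return c.L < 50.0 && c.C < 25.0;
}
```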