How to categorize True Negatives in sliding window object detection?

Question

I'm gathering results from my image detector algorithm. From a set of images (each 320 x 480), I run a 64x128 sliding window over every image, at a number of predefined scales.

I understand that:

True Positives = when my detected window overlaps (within a defined intersection size / centroid distance) with the ground truth (annotated bounding boxes).
False Positives = when the algorithm gives me positive windows that lie outside the ground truth.
False Negatives = when it fails to give me a positive window although the ground-truth annotation states that there is an object.
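One common way to make the overlap criterion concrete is intersection over union (IoU). The following is only a minimal sketch, assuming axis-aligned boxes given as a top-left corner plus a size; `Box` and `Overlap` are hypothetical names introduced for illustration and are not part of the detector described above.

```csharp
using System;

// Hypothetical axis-aligned box: top-left corner (X, Y) plus size.
readonly struct Box
{
    public readonly int X, Y, Width, Height;

    public Box(int x, int y, int width, int height)
    {
        X = x; Y = y; Width = width; Height = height;
    }

    public int Area => Width * Height;
}

static class Overlap
{
    // Intersection over union of two boxes, in the range [0, 1].
    public static double IoU(Box a, Box b)
    {
        int left = Math.Max(a.X, b.X);
        int top = Math.Max(a.Y, b.Y);
        int right = Math.Min(a.X + a.Width, b.X + b.Width);
        int bottom = Math.Min(a.Y + a.Height, b.Y + b.Height);

        // Clamp to zero so that disjoint boxes give an empty intersection.
        int intersection = Math.Max(0, right - left) * Math.Max(0, bottom - top);
        int union = a.Area + b.Area - intersection;

        return union > 0 ? (double)intersection / union : 0d;
    }
}
```

A detection would then count as a True Positive when its IoU with some ground-truth box reaches a chosen threshold; 0.5 is a frequently used value (for example in PASCAL VOC), but that threshold is an assumption here.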

But what about True Negatives? Are the true negatives all the windows for which my classifier gives a negative result? That sounds weird, since I'm sliding a small window (64x128) by 4 pixels at a time, and I use around 8 different scales in detection. If I were to do that, I'd have lots of true negatives per image.
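To put a rough number on "lots", the window count at a single scale can be estimated as below. This is a sketch under assumptions: 320 x 480 is read as width x height, the stride is 4 pixels in both directions, and no padding is used. `WindowCount` is a hypothetical helper; the actual scale factors are not given, so only the native scale is covered.

```csharp
using System;

static class WindowCount
{
    // Number of window positions along one axis for a given stride.
    public static int PositionsAlongAxis(int imageExtent, int windowExtent, int stride)
    {
        if (windowExtent > imageExtent) return 0; // the window does not fit at all
        return (imageExtent - windowExtent) / stride + 1;
    }

    // Total number of window positions in one image at a single scale.
    public static int CountWindows(int imageWidth, int imageHeight,
                                   int windowWidth, int windowHeight, int stride)
    {
        return PositionsAlongAxis(imageWidth, windowWidth, stride)
             * PositionsAlongAxis(imageHeight, windowHeight, stride);
    }
}
```

With these assumptions, `WindowCount.CountWindows(320, 480, 64, 128, 4)` gives 65 x 89 = 5785 windows at the native scale alone, nearly all of which would be negatives.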

Or do I prepare a set of pure negative images (no objects / humans at all), slide the window through them, and count each image with one or more positive detections as a False Positive, and each image with none as a True Negative?

Here's an example image (with green rects as the ground truth)

Answer 1:

I've always seen the four terms as the following:

False negative: result should have been positive, but is negative.
False positive: result should have been negative, but is positive.
True positive: result should have been positive and is positive.
True negative: result should have been negative and is negative.

In your case, if I understand correctly, you are trying to detect whether there are objects in your image. A false negative would therefore mean that there was an object (the result should be positive) but the algorithm did not detect it (and therefore returned negative). A true negative is simply the algorithm correctly stating that the area it checked does not hold an object.
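Read this way, every checked window falls into exactly one of the four cases. A minimal sketch of such a tally, assuming each window has already been reduced to a pair of booleans (whether it should be positive according to the ground truth, and whether the classifier said positive); `ConfusionTally` is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;

static class ConfusionTally
{
    // Counts the four outcomes over a sequence of (expected, predicted) pairs.
    public static (int tp, int fp, int fn, int tn) Tally(
        IEnumerable<(bool expected, bool predicted)> windows)
    {
        int tp = 0, fp = 0, fn = 0, tn = 0;
        foreach (var (expected, predicted) in windows)
        {
            if (expected && predicted) tp++;        // should be positive, is positive
            else if (!expected && predicted) fp++;  // should be negative, is positive
            else if (expected && !predicted) fn++;  // should be positive, is negative
            else tn++;                              // should be negative, is negative
        }
        return (tp, fp, fn, tn);
    }
}
```

With a dense sliding window the `tn` count will dominate the other three, which is why it is often left out of the reported metrics.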

You can choose to ignore the negative results, but they could be used to further train your algorithm (e.g. using an algorithm that looks for both, instead of setting everything that is not recognised to false).

Answer 2:

AFAIK, a True Negative would be a scenario where an object is present in the image but has not been marked in either the ground-truth annotation or the model prediction.

Usually, 2D object detection systems use only two sets of data, i.e. ground-truth annotations and model predictions. However, to find the True Negative cases we would require a sort of superset of the ground-truth annotations, containing information about all the class instances present in the image (not just those specific to our model).

For example, taking the given image: if we are interested in object detection for autonomous driving, we can consider the two ground-truth annotations below:

Super Set GT Annotations

car(vehicles)
person
tree
animal
house_window
burger(maybe thrown on the road)

Autonomous Driving GT Annotations

car(vehicles)
person
tree
animal

With the above two ground-truth annotations it would be possible to calculate the True Negatives for burger and house_window. However, I doubt that True Negatives can be calculated without the superset annotation.
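Under this definition, the candidate True Negative classes are those that appear only in the superset and that the model did not predict. A minimal sketch, assuming the annotations are available as plain class-name lists; `SupersetCheck` and its parameters are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SupersetCheck
{
    // Classes in the superset that are neither task classes nor predicted by the model.
    public static List<string> TrueNegativeClasses(
        IEnumerable<string> supersetClasses,
        IEnumerable<string> taskClasses,
        IEnumerable<string> predictedClasses)
    {
        var comparer = StringComparer.OrdinalIgnoreCase;
        return supersetClasses
            .Except(taskClasses, comparer)       // drop classes the model is responsible for
            .Except(predictedClasses, comparer)  // drop anything the model reported anyway
            .ToList();
    }
}
```

For the lists above, with a model that predicts only car and person, this would return house_window and burger.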

Answer 3:

There is a nice explanation here. The F1 score, explained on the wiki and here, is helpful for measuring success.
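For reference, the standard definitions are given below; note that none of them needs the True Negative count, which is one reason these measures are convenient for detection tasks.

$$
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$

As a small worked example with made-up counts, TP = 2, FP = 1 and FN = 1 give precision = 2/3, recall = 2/3 and F1 = 2/3.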

Here is my attempt at a function that calculates the F1 score:

    // Requires: using System; using System.Collections.Generic;
    /// <param name="realClasses">Class names that exist in the image. A class name may appear more than once.</param>
    /// <param name="foundClasses">Predicted class names. A class name may appear more than once.</param>
    private static void findPosNeg(List<string> realClasses, List<string> foundClasses, out int truePositive, out int falsePositive, out int falseNegative)
    {            
        Dictionary<string, int> dicReal = new Dictionary<string, int>(StringComparer.InvariantCultureIgnoreCase);
        Dictionary<string, int> dicFound = new Dictionary<string, int>(StringComparer.InvariantCultureIgnoreCase);
        #region fill dictionaries
        foreach (string className in realClasses)
        {
            if (!dicReal.ContainsKey(className))
                dicReal[className] = 1;
            else
                dicReal[className]++;
        }
        foreach (string className in foundClasses)
        {
            if (!dicFound.ContainsKey(className))
                dicFound[className] = 1;
            else
                dicFound[className]++;
        }
        #endregion

        truePositive = 0;
        falsePositive = 0;
        falseNegative = 0;
        foreach (string className in dicFound.Keys)
        {
            if (!dicReal.ContainsKey(className))
                falsePositive += dicFound[className];
            else
            {
                int found = dicFound[className];
                int real = dicReal[className];
                truePositive += Math.Min(found, real);
                if (real > found)
                    falseNegative += real - found;
                else if (found > real)
                    falsePositive += found - real;
            }
        }
        foreach (string className in dicReal.Keys)
            if (!dicFound.ContainsKey(className))
                falseNegative += dicReal[className];

    }
    /// <summary>
    /// Calculates F1Score ref:https://en.wikipedia.org/wiki/Precision_and_recall
    /// </summary>
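    private static double calc_F1Score(int truePositive, int falsePositive, int falseNegative, out double precision, out double recall)
    {
        // Guard the denominators: 0/0 in double arithmetic yields NaN, which would propagate into the result.
        // Treating an undefined precision or recall as 0 is a convention chosen here.
        int predicted = truePositive + falsePositive;
        int actual = truePositive + falseNegative;
        precision = (predicted > 0) ? (double)truePositive / predicted : 0d;
        recall = (actual > 0) ? (double)truePositive / actual : 0d;
        double div = precision + recall;
        return (div != 0d) ? 2d * precision * recall / div : 0d;
    }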

Source: https://stackoverflow.com/questions/16271603/how-to-categorize-true-negatives-in-sliding-window-object-detection

Tags

OpenCV

machine-learning

computer-vision

object-detection