I need some help with LBP-based face detection, which is why I am writing this.
I have the following questions about the face detection implemented in OpenCV:
I refer you to my own answer from the past, which lightly touches on the topic but doesn't explain the XML cascade format.
Let's look at a fake example, modified for clarity, of a cascade with only a single stage and three features.
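    <!-- stage 0 -->
    <_>
      <maxWeakCount>3</maxWeakCount>
      <stageThreshold>-0.75</stageThreshold>
      <weakClassifiers>
        <!-- tree 0 -->
        <_>
          <internalNodes>
            0 -1 3 -67130709 -21569 -1426120013 -1275125205 -21585
            -16385 587145899 -24005</internalNodes>
          <leafValues>
            -0.65 0.88</leafValues></_>
        <!-- tree 1 -->
        <_>
          <internalNodes>
            0 -1 0 -163512766 -769593758 -10027009 -262145 -514457854
            -193593353 -524289 -1</internalNodes>
          <leafValues>
            -0.77 0.72</leafValues></_>
        <!-- tree 2 -->
        <_>
          <internalNodes>
            0 -1 2 -363936790 -893203669 -1337948010 -136907894
            1088782736 -134217726 -741544961 -1590337</internalNodes>
          <leafValues>
            -0.71 0.68</leafValues></_></weakClassifiers></_>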
Somewhat later....
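    <features>
      <_>
        <rect>
          0 0 3 5</rect></_>
      <_>
        <rect>
          0 0 4 2</rect></_>
      <_>
        <rect>
          0 0 6 3</rect></_>
      <_>
        <rect>
          0 1 4 3</rect></_>
      <_>
        <rect>
          0 1 3 3</rect></_>
      ...
    </features>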
Let us look first at the tags of a stage:
`<maxWeakCount>` is the number of weak classifiers in the stage, what the comments in the cascade file call a tree and what I call an LBP feature. In this example, the stage has 3 LBP features.
`<stageThreshold>` is what the weights of the features must add up to, at a minimum, for the stage to pass. In this example, the stage threshold is -0.75.
Turning to the tags describing an LBP feature:
`<internalNodes>` is an array of 11 integers. The first two are meaningless for LBP cascades. The third is the index into the `<features>` table of `<rect>`s at the end of the XML file (a `<rect>` describes the geometry of the feature). The last 8 values are eight 32-bit values which together constitute the 256-bit LUT I spoke of in my earlier answer. This LUT is computed by the training process, which I don't fully understand myself. In this example, the first feature of the stage references rect 3, which is described by the four integers 0 1 4 3.
`<leafValues>` holds the two weights (fail/pass, in that order) associated with a feature. Depending on the bit selected from the `<internalNodes>` LUT during feature evaluation, one of those two weights is added to a total. This total is compared to the stage's `<stageThreshold>`. Then, `bool stagePassed = (sum >= stageThreshold - EPS);`, where `EPS` is 1e-5, determines whether the stage has passed or failed. The weights are also determined by the training process. In this example, the fail weight of the first feature is -0.65 and the pass weight is 0.88.
Lastly, the `<features>` tag. It consists of an array of `<rect>` tags, each containing 4 integers that describe the geometry of one feature. Given a processing window (24x24 in your case), the first two integers describe the feature's `x` and `y` integer pixel offset within the processing window, and the next two integers describe the width and height of one subrectangle out of the 9 that are needed for the LBP feature to be evaluated.
In essence then, a tag `<rect> x y w h </rect>` situated within a processing window of size `pW.width` x `pW.height`, checking whether a face is present at (`pW.x`, `pW.y`), corresponds to a 3x3 grid of subrectangles R0..R8, each `w` x `h` pixels and numbered row by row with R4 in the center. The top-left corner of the grid lies at (`pW.x + x`, `pW.y + y`), and the 16 corners of the grid are the points p[0..15] at which the integral image is read.
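The geometry described above can be sketched as follows. This is only an illustration, not OpenCV's own code: the `Rect` struct and the `subrectangles` helper are hypothetical names, and the row-major numbering of R0..R8 is an assumption inferred from R4 being the central subrectangle.

```cpp
#include <array>
#include <cassert>

// Hypothetical mirror of one <rect> x y w h </rect> entry.
struct Rect {
    int x, y;  // offset inside the processing window (or absolute, for results)
    int w, h;  // size of ONE of the nine subrectangles
};

// Returns the nine subrectangles R0..R8 in image coordinates for a feature
// evaluated in a processing window whose top-left corner is (pWx, pWy).
// Assumption: R0..R8 are numbered row by row, so R4 is the central one.
inline std::array<Rect, 9> subrectangles(const Rect& feature, int pWx, int pWy) {
    assert(feature.w > 0 && feature.h > 0);
    std::array<Rect, 9> r{};
    for (int k = 0; k < 9; ++k) {
        r[k] = Rect{pWx + feature.x + (k % 3) * feature.w,   // column k % 3
                    pWy + feature.y + (k / 3) * feature.h,   // row k / 3
                    feature.w, feature.h};
    }
    return r;
}
```

For rect 3 of the example (`0 1 4 3`), the whole grid spans 12x9 pixels starting at offset (0, 1), which fits comfortably inside a 24x24 window.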
To evaluate the LBP then, it suffices to read the integral image at points `p[0..15]` and use `p[BR]+p[TL]-p[TR]-p[BL]` to compute the integral of each of the nine subrectangles. The central subrectangle, R4, is compared with each of the eight others, clockwise starting from R0, to produce an 8-bit LBP (the bits are packed [msb 01258763 lsb]).
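A minimal sketch of that computation, under stated assumptions: the `Integral` struct, `makeIntegral`, `rectSum` and `lbpCode` are hypothetical helpers written for this answer, the subrectangles are taken to be numbered row by row, and a bit is set when a neighbour's sum is greater than or equal to the center's sum. That comparison direction is my understanding of the detector rather than something stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical integral image: a (width+1) x (height+1) table in which
// at(x, y) is the sum of all pixels strictly above and to the left of (x, y).
struct Integral {
    int width;
    int height;
    std::vector<long long> s;
    long long at(int x, int y) const {
        return s[static_cast<std::size_t>(y) * (width + 1) + x];
    }
};

inline Integral makeIntegral(const std::vector<unsigned char>& img, int width, int height) {
    assert(img.size() == static_cast<std::size_t>(width) * height);
    Integral ii{width, height,
                std::vector<long long>(static_cast<std::size_t>(width + 1) * (height + 1), 0)};
    const std::size_t stride = static_cast<std::size_t>(width) + 1;
    for (int y = 0; y < height; ++y) {
        long long row = 0;  // running sum of the current image row
        for (int x = 0; x < width; ++x) {
            row += img[static_cast<std::size_t>(y) * width + x];
            ii.s[(y + 1) * stride + (x + 1)] = ii.s[y * stride + (x + 1)] + row;
        }
    }
    return ii;
}

// Sum of the w x h rectangle with top-left corner (x, y): p[BR]+p[TL]-p[TR]-p[BL].
inline long long rectSum(const Integral& ii, int x, int y, int w, int h) {
    return ii.at(x + w, y + h) + ii.at(x, y) - ii.at(x + w, y) - ii.at(x, y + h);
}

// 8-bit LBP code of the 3x3 grid whose top-left corner is (x0, y0),
// i.e. (pW.x + x, pW.y + y) for a <rect> x y w h </rect>.
inline int lbpCode(const Integral& ii, int x0, int y0, int w, int h) {
    assert(x0 >= 0 && y0 >= 0 && x0 + 3 * w <= ii.width && y0 + 3 * h <= ii.height);
    long long r[9];
    for (int k = 0; k < 9; ++k) {
        r[k] = rectSum(ii, x0 + (k % 3) * w, y0 + (k / 3) * h, w, h);
    }
    // Clockwise from R0, packed msb first: 0 1 2 5 8 7 6 3.
    static const int order[8] = {0, 1, 2, 5, 8, 7, 6, 3};
    int code = 0;
    for (int k : order) {
        code = (code << 1) | (r[k] >= r[4] ? 1 : 0);  // assumed: neighbour >= center
    }
    return code;
}
```

Only 16 integral-image reads are strictly needed per feature, since adjacent subrectangles share corners; the sketch re-reads shared corners for clarity rather than speed.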
This 8-bit LBP is then used as an index into the feature's (2^8 = 256)-bit LUT (the last eight values of `<internalNodes>`), selecting a single bit. If this bit is 1, the feature is inconsistent with a face; if 0, it is consistent with a face. The appropriate weight (from `<leafValues>`) is then returned and added to the weights of all other features to produce an overall stage sum. This sum is then compared to `<stageThreshold>` to determine whether the stage passed or failed.
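Put together, the lookup and the stage decision might look like the sketch below. The names `WeakClassifier`, `lutBit`, `weakResponse` and `stagePassed` are hypothetical, and the way the 8-bit code is split across the eight 32-bit words (word `code >> 5`, bit `code & 31`) is an assumption based on my reading of the detector, not something the XML format itself spells out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of one weak classifier (one LBP feature).
struct WeakClassifier {
    std::array<std::int32_t, 8> lut;  // last eight integers of <internalNodes>
    float failWeight;                 // first value of <leafValues>
    float passWeight;                 // second value of <leafValues>
};

// Selects one bit of the 256-bit LUT for an 8-bit LBP code.
// Assumption: the word is code >> 5 and the bit within it is code & 31.
inline bool lutBit(const std::array<std::int32_t, 8>& lut, int code) {
    assert(code >= 0 && code < 256);
    const std::uint32_t word = static_cast<std::uint32_t>(lut[code >> 5]);
    return ((word >> (code & 31)) & 1u) != 0;
}

// Bit set: inconsistent with a face, so the fail weight is returned.
inline float weakResponse(const WeakClassifier& wc, int code) {
    return lutBit(wc.lut, code) ? wc.failWeight : wc.passWeight;
}

// codes[i] is the LBP code already computed for stage[i].
inline bool stagePassed(const std::vector<WeakClassifier>& stage,
                        const std::vector<int>& codes, float stageThreshold) {
    assert(stage.size() == codes.size());
    const float EPS = 1e-5f;
    float sum = 0.0f;
    for (std::size_t i = 0; i < stage.size(); ++i) {
        sum += weakResponse(stage[i], codes[i]);
    }
    return sum >= stageThreshold - EPS;
}
```

With the first feature of the example, an LBP code of 0 selects a set bit under this indexing and therefore contributes -0.65, while a code of 2 selects a clear bit and contributes 0.88.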
If there's something else I didn't explain well enough I can clarify.