HOG Trilinear Interpolation of Histogram Bins

-上瘾入骨i 2020-12-09 06:13

I am working on Histogram of Oriented Gradients (HOG) features and am trying to implement the trilinear interpolation of histogram bins as described in Dalal's PhD thesis.

2 Answers
  • 2020-12-09 06:38

    Let's first look at rectangular HOG (R-HOG). The image is divided into tiles, as shown on page 32. Page 46 shows an R-HOG descriptor in panel (f), and page 49 explains how the data is binned.

    I learned how to do 3D interpolation by reading Paul Bourke's write-up: http://paulbourke.net/miscellaneous/interpolation/

    Sorry, I would have to generate my own images to understand what is going on. It is certainly an interesting technique.

  • 2020-12-09 06:42

    Think of (x1, y1, z1) and (x2, y2, z2) as two points spanning a cube that surrounds the point (x,y,z) for which you want to interpolate a value of h. The set of eight points (x1, y1, z1), (x2, y1, z1), (x1, y2, z1), (x1, y1, z2), (x2, y2, z1), (x2, y1, z2), (x1, y2, z2), (x2, y2, z2) forms the complete cube. So trilinear interpolation between (x1, y1, z1) and (x2, y2, z2) actually means interpolation between the 8 points in the 3D histogram space surrounding the point you are interested in! Now to your questions:
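The eight-corner weighting just described can be written out as a short Python sketch. The function name `trilinear` and the dictionary layout for the corner values `h` (keyed by `(i, j, k)` with 0 meaning the p1 coordinate and 1 the p2 coordinate) are illustrative assumptions, not anything from the thesis. Each corner contributes its value multiplied by the product of three one-dimensional weights, and the eight weights sum to 1.

```python
def trilinear(x, y, z, p1, p2, h):
    """Interpolate h at (x, y, z) inside the box spanned by p1 and p2.

    p1 = (x1, y1, z1), p2 = (x2, y2, z2). h maps each corner (i, j, k),
    with i, j, k in {0, 1}, to its value: 0 selects the p1 coordinate on
    that axis and 1 selects the p2 coordinate (hypothetical layout).
    """
    x1, y1, z1 = p1
    x2, y2, z2 = p2
    # Fractional position of the point along each axis, in [0, 1].
    tx = (x - x1) / (x2 - x1)
    ty = (y - y1) / (y2 - y1)
    tz = (z - z1) / (z2 - z1)
    total = 0.0
    for i, wx in ((0, 1 - tx), (1, tx)):
        for j, wy in ((0, 1 - ty), (1, ty)):
            for k, wz in ((0, 1 - tz), (1, tz)):
                # Each of the 8 corners is weighted by the product of its 1D weights.
                total += wx * wy * wz * h[(i, j, k)]
    return total


# Corner values of the linear function x + y + z on the unit cube;
# trilinear interpolation reproduces a linear function exactly.
h = {(i, j, k): i + j + k for i in (0, 1) for j in (0, 1) for k in (0, 1)}
print(trilinear(0.25, 0.5, 0.75, (0, 0, 0), (1, 1, 1), h))  # prints 1.5
```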

    (x1, y1), (x2, y2), (x1, y2) and (x2, y1) represent the centers of bins in the (x, y) plane. In your case these would be the centers of the cells that hold the orientation histograms.

    z1 and z2 represent two neighbouring bins in the orientation direction, as you say. Combined with the four points in the image plane, this gives you a total of 8 bins.
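Putting the two spatial axes and the orientation axis together, one pixel's gradient magnitude is split over those 8 bins. The sketch below is a hypothetical illustration, not Dalal's reference code: it assumes 8x8-pixel cells, 9 unsigned orientation bins over [0°, 180°), bin centers at (i + 0.5) times the bandwidth, a periodic orientation axis, and that shares landing on cells outside the window are simply dropped. The function name `vote` and the nested-list layout `hist[cell_y][cell_x][bin]` are likewise assumptions.

```python
import math


def vote(hist, x, y, theta, mag, bx=8.0, by=8.0, n_bins=9, theta_range=180.0):
    """Distribute one pixel's gradient magnitude over the 8 surrounding bins.

    hist is a nested list indexed hist[cell_y][cell_x][orientation_bin].
    Assumed (hypothetical) conventions: bin centres lie at (i + 0.5) * bandwidth,
    orientations are unsigned in [0, theta_range) and wrap around, and shares
    that fall on cells outside the window are dropped.
    """
    bz = theta_range / n_bins
    n_cy, n_cx = len(hist), len(hist[0])
    # Continuous bin coordinates, where 0.0 is the centre of bin 0.
    fx = x / bx - 0.5
    fy = y / by - 0.5
    fz = (theta % theta_range) / bz - 0.5
    # Lower neighbouring bin along each axis and the fractional distance past it.
    x0, y0, z0 = math.floor(fx), math.floor(fy), math.floor(fz)
    tx, ty, tz = fx - x0, fy - y0, fz - z0
    for cx, wx in ((x0, 1 - tx), (x0 + 1, tx)):
        if not 0 <= cx < n_cx:
            continue  # neighbouring cell lies outside the window
        for cy, wy in ((y0, 1 - ty), (y0 + 1, ty)):
            if not 0 <= cy < n_cy:
                continue
            for oz, wz in ((z0, 1 - tz), (z0 + 1, tz)):
                # The orientation axis is periodic: bin -1 is bin n_bins - 1.
                hist[cy][cx][oz % n_bins] += mag * wx * wy * wz


# 64 x 128 window -> 8 x 16 cells of 8 x 8 pixels, 9 orientation bins.
hist = [[[0.0] * 9 for _ in range(8)] for _ in range(16)]
vote(hist, x=16, y=16, theta=0.0, mag=2.0)
# The pixel sits midway between cells 1 and 2 in x and in y, and midway between
# orientation bins 8 and 0, so each of those 8 bins receives 0.25.
```

Dropping out-of-window shares is only one possible boundary policy; an implementation could instead clamp to the nearest cell.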

    The bandwidth b = [bx, by, bz] is the distance between the centers of neighbouring bins in the x, y and z directions. In your case, with 8 bins over the 64 pixels in the x direction and 16 bins over the 128 pixels in the y direction:

    bx = 8 pixels
    by = 8 pixels
    

    This leaves bz, for which I would need more information: I don't know the full range of your gradient orientation (i.e. from the lowest to the highest possible value). If that range is rg, then with 9 orientation bins:

    bz = rg/9
    

    In general, the bandwidth in any direction equals the full available range in that direction divided by the number of bins in that direction.
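The general rule amounts to one division per axis. In this small sketch the helper name `bandwidths` is hypothetical, and the orientation range rg = 180 degrees (unsigned gradients) is an assumption standing in for the value the answer leaves open.

```python
def bandwidths(ranges, n_bins):
    """Bandwidth per axis = full range of that axis / number of bins along it."""
    return [r / n for r, n in zip(ranges, n_bins)]


# 64 x 128 pixel window with 8 x 16 cells; rg = 180 degrees and
# 9 orientation bins are assumed here.
bx, by, bz = bandwidths([64, 128, 180], [8, 16, 9])
print(bx, by, bz)  # 8.0 8.0 20.0
```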

    For a good illustrated explanation of trilinear interpolation, see the link in whoplisp's answer.
