[Paper Notes: Detection] (ICCV 2017) Deformable Convolutional Networks

Abstract

I have written many paper reviews in Chinese before, so for a change I will gradually switch to English for later reviews.

A few days ago, we reviewed STN. Because of pooling layers, CNNs have some spatial invariance (e.g., to small translations and, to a lesser degree, small rotations). The larger the pooling and convolution kernels are, the stronger this invariance becomes. At the same time, however, larger kernels make the CNN lose more and more local information.
As a result, the downsampling ratio needs to be tuned for each dataset.

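If the network downsamples too little, its spatial invariance is too weak and it generalizes poorly; if it downsamples too much, it loses too much local information, which seriously hurts the results.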
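To make this trade-off concrete, here is a toy sketch in plain Python (the helper `max_pool_2x2` is hypothetical, written only for this note). 2×2 max pooling maps an activation and its one-pixel shift to the same output, which is the local translation invariance described above, but for the same reason the exact position inside the window is lost. A shift that crosses a window boundary changes the output, so the invariance is only local.

```python
def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling (stride 2) on a list-of-lists feature map."""
    h, w = len(x), len(x[0])
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]

# A strong activation at (0, 0) ...
a = [[9, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
# ... the same activation shifted by one pixel to (1, 1), inside the same window ...
b = [[0, 0, 0, 0],
     [0, 9, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
# ... and shifted by two pixels to (0, 2), across a window boundary.
c = [[0, 0, 9, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]

print(max_pool_2x2(a))  # [[9, 0], [0, 0]]
print(max_pool_2x2(b))  # [[9, 0], [0, 0]]  same output: position inside the window is lost
print(max_pool_2x2(c))  # [[0, 9], [0, 0]]  different output: invariance is only local
```

Larger pooling windows extend the range of shifts that leave the output unchanged, and they discard correspondingly more positional detail.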

However, the spatial invariance provided by pooling layers is not enough for natural scenes, where geometric deformations of objects include rotation, distortion, scaling, aliasing, etc. STN proposed a spatial transformer module that learns to spatially transform the feature map.
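As a reminder of how the spatial transformer works, the sketch below implements only its grid generator for the affine case, assuming the STN convention of a 2×3 matrix `theta` and normalized coordinates in [-1, 1]. The function name `affine_source_grid` is hypothetical; the localization network that predicts `theta` and the bilinear sampler that reads the input at the returned locations are omitted.

```python
def affine_source_grid(theta, height, width):
    """Grid generator of an affine spatial transformer (sketch).

    theta: 2x3 affine matrix [[a, b, tx], [c, d, ty]]; in STN it is predicted
    by a small localization network, here it is simply given.
    Returns grid[row][col] = (x_s, y_s): the normalized source location in
    [-1, 1] that the output pixel (row, col) should sample from.
    """
    def linspace(n):
        # n evenly spaced points from -1 to 1 (a single point maps to 0)
        return [-1.0 + 2.0 * k / (n - 1) for k in range(n)] if n > 1 else [0.0]

    grid = []
    for y_t in linspace(height):
        row = []
        for x_t in linspace(width):
            # (x_s, y_s)^T = theta @ (x_t, y_t, 1)^T
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid
```

With the identity `theta` every output pixel reads from its own location; shrinking the diagonal to 0.5 makes the output sample only the central region of the input, i.e. a zoom-in. Because the whole map shares one `theta`, STN applies a single global transform, which is the main contrast with the per-location offsets of DCN.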

Regular convolution samples the input on a predefined rectangular grid, usually of size 3×3 or 5×5. However, the objects we need to classify or detect may be deformed or occluded in the image.
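For comparison with the deformable version, this sketch writes out regular convolution at a single output location p0 as y(p0) = Σ w(pn)·x(p0 + pn), where pn runs over the fixed 3×3 grid R = {(-1,-1), (-1,0), ..., (1,1)}. It uses plain Python lists, no padding, and the cross-correlation convention of deep learning frameworks; `regular_conv_at` is a hypothetical helper name.

```python
# The regular 3x3 sampling grid R: offsets (dy, dx) around the center p0.
GRID_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def regular_conv_at(x, w, p0):
    """Output of a regular 3x3 convolution at location p0 = (row, col).

    Computes y(p0) = sum over p_n in R of w(p_n) * x(p0 + p_n), so every
    output location samples the same fixed rectangular grid.
    x: 2D input (list of lists); w: 3x3 kernel indexed as w[dy + 1][dx + 1].
    Assumes p0 is at least one pixel away from the border (no padding).
    """
    r0, c0 = p0
    return sum(w[dy + 1][dx + 1] * x[r0 + dy][c0 + dx] for dy, dx in GRID_3X3)
```

The grid R is identical for every p0 and every image, which is exactly the rigidity that deformable convolution relaxes.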

In DCN, the sampling grid is deformable: each grid point is shifted by a learnable offset. Applying the same idea to pooling, the paper also proposes Deformable RoI Pooling. With these two new modules, DCN improves the accuracy of DeepLab, Faster R-CNN, R-FCN, FPN, etc.
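A minimal single-location sketch of deformable convolution, following the paper's formulation y(p0) = Σ w(pn)·x(p0 + pn + Δpn). Since the offsets Δpn are usually fractional, x is read with bilinear interpolation. Assumptions: the offsets are passed in directly (in DCN they are predicted by an extra convolution layer over the same input and learned through the interpolation), out-of-range pixels count as zero, and the helper names are hypothetical.

```python
import math

# The regular 3x3 grid R that the learned offsets deform.
GRID_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def bilinear(x, r, c):
    """Bilinearly interpolate 2D input x at fractional location (r, c).

    Uses G(q, p) = max(0, 1 - |q_y - p_y|) * max(0, 1 - |q_x - p_x|) over the
    four integer neighbours q; pixels outside the input count as 0.
    """
    h, w = len(x), len(x[0])
    r_lo, c_lo = math.floor(r), math.floor(c)
    val = 0.0
    for qr in (r_lo, r_lo + 1):
        for qc in (c_lo, c_lo + 1):
            if 0 <= qr < h and 0 <= qc < w:
                weight = max(0.0, 1 - abs(qr - r)) * max(0.0, 1 - abs(qc - c))
                val += weight * x[qr][qc]
    return val

def deformable_conv_at(x, w, p0, offsets):
    """Output of a 3x3 deformable convolution at p0 = (row, col).

    y(p0) = sum over p_n in R of w(p_n) * x(p0 + p_n + dp_n), where
    offsets[n] = dp_n = (dy, dx) is the shift of the n-th grid point.
    """
    if len(offsets) != len(GRID_3X3):
        raise ValueError("need one (dy, dx) offset per grid point")
    r0, c0 = p0
    total = 0.0
    for (dy, dx), (oy, ox) in zip(GRID_3X3, offsets):
        total += w[dy + 1][dx + 1] * bilinear(x, r0 + dy + oy, c0 + dx + ox)
    return total
```

With all offsets set to zero the result equals the regular 3×3 convolution at the same location; offsets of (0, 1) shift the whole grid one pixel to the right, and fractional offsets interpolate between neighbouring pixels, which is what makes the offsets differentiable and therefore learnable.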