What does R-CNN actually do? Is it like using features extracted by CNN to detect classes in a specified window area? Is there any tensorflow implementation for this?
R-CNN is the daddy-algorithm for all the mentioned algos, it really provided the path for researchers to build more complex and better algorithm on top of it. I am trying to explain R-CNN and the other variants of it.
R-CNN consist of 3 simple steps:
Fast R-CNN was immediately followed R-CNN. Fast R-CNN is faster and better by the virtue of following points:
Intuitively it makes a lot of sense to remove 2000 conv layers and instead take once Convolution and make boxes on top of that.
One of the drawbacks of Fast R-CNN was the slow selective search algorithm and Faster R-CNN introduced something called Region Proposal network(RPN).
Here’s is the working of the RPN:
At the last layer of an initial CNN, a 3x3 sliding window moves across the feature map and maps it to a lower dimension (e.g. 256-d) For each sliding-window location, it generates multiple possible regions based on k fixed-ratio anchor boxes (default bounding boxes)
Each region proposal consists of:
For each of those boxes, we output whether or not we think it contains an object, and what the coordinates for that box are. This is what it looks like at one sliding window location:
The 2k scores represent the softmax probability of each of the k bounding boxes being on “object.” Notice that although the RPN outputs bounding box coordinates, it does not try to classify any potential objects: its sole job is still proposing object regions. If an anchor box has an “objectness” score above a certain threshold, that box’s coordinates get passed forward as a region proposal.
Once we have our region proposals, we feed them straight into what is essentially a Fast R-CNN. We add a pooling layer, some fully-connected layers, and finally a softmax classification layer and bounding box regressor. In a sense, Faster R-CNN = RPN + Fast R-CNN.
Linking some Tensorflow implementation:
https://github.com/smallcorgi/Faster-RCNN_TF
https://github.com/CharlesShang/FastMaskRCNN
You can find a lot of implementation of Github.
P.S. I borrowed a lot of material from Joyce Xu Medium blog.