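This post is part of a series of source-code reading notes on the GitHub repository CharlesShang/TFFRCNN.
---------------Personal study notes---------------
----------------Author: Wu Jiang (吴疆)--------------
------Originally published on cnblogs (博客园)------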
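1.proposal_target_layer(rpn_rois, gt_boxes, gt_ishard, dontcare_areas, _num_classes) code logic
Set all_rois = rpn_rois and drop the hard boxes (gt_ishard == 1) from gt_boxes to obtain gt_easyboxes--->
Extend all_rois (shape (None, 5), first column is the all-zero batch_ind) so that it consists of three parts: rpn_rois + gt_easyboxes + jittered_gt_boxes, where jittered_gt_boxes is a randomly shifted copy of gt_easyboxes produced by _jitter_gt_boxes(...). The source comment only says "Include ground-truth boxes in the set of candidate rois"; my guess is that the proposals used for training should also contain the gt boxes rather than come from the RPN alone, which should help train the RCNN subnet--->
Compute rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images = 128/1 = 128 (RPN training samples 256 anchors, whereas RCNN subnet training samples 128 proposals)--->
Call _sample_rois(...) to obtain labels (128*1), rois (128*5, first column is the all-zero batch_ind), the regression targets bbox_targets (128*4K, where K is the number of classes, 21 by default for PASCAL VOC) and bbox_inside_weights (128*4K); 128 is an upper bound, since fewer rows come back when the image does not have enough candidates--->
Create bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32)--->
Return rois, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights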
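The step that extends all_rois can be tried in isolation. The following is a minimal sketch with made-up boxes; the helper name add_gt_to_rois and the hand-made shift standing in for _jitter_gt_boxes are assumptions for illustration, not part of TFFRCNN.

```python
import numpy as np

def add_gt_to_rois(rpn_rois, gt_boxes, jittered_gt_boxes):
    """Append gt boxes and their jittered copies to the RPN proposals.

    rpn_rois:          (R, 5) [batch_ind, x1, y1, x2, y2]
    gt_boxes:          (G, 5) [x1, y1, x2, y2, class]
    jittered_gt_boxes: (G, 5) same layout as gt_boxes
    """
    # Drop the trailing class column and stack both gt sets: (2G, 4)
    extra = np.vstack((gt_boxes[:, :-1], jittered_gt_boxes[:, :-1]))
    # Prepend a zero batch index so the layout matches rpn_rois: (2G, 5)
    zeros = np.zeros((extra.shape[0], 1), dtype=extra.dtype)
    return np.vstack((rpn_rois, np.hstack((zeros, extra))))

rpn_rois = np.array([[0, 10, 10, 50, 50],
                     [0, 30, 20, 90, 80]], dtype=np.float32)
gt_boxes = np.array([[12, 8, 52, 49, 7]], dtype=np.float32)  # one gt box, class 7
# Hand-made shift standing in for _jitter_gt_boxes (hypothetical values)
jittered = gt_boxes + np.array([1, -1, 1, -1, 0], dtype=np.float32)

all_rois = add_gt_to_rois(rpn_rois, gt_boxes, jittered)
print(all_rois.shape)  # (4, 5)
```

Every appended row carries batch index 0, which is why the later sanity check on all_rois[:, 0] still passes.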
# Returns the information for the (default 128) sampled proposals: rois (128*5, first column is the all-zero batch_ind);
# labels (128*1) hold the gt class of each proposal, with label 0 for negative (background) proposals;
# bbox_targets (128*4K) hold the regression targets: values only in the 4 slots of the gt class, 0 elsewhere;
# bbox_inside_weights are [1,1,1,1] in the 4 slots of the gt class, 0 elsewhere;
# bbox_outside_weights are [1,1,1,1] in the 4 slots of the gt class, 0 elsewhere;
def proposal_target_layer(rpn_rois, gt_boxes, gt_ishard, dontcare_areas, _num_classes):
    """
    Assign object detection proposals to ground-truth targets. Produces proposal
    classification labels and bounding-box regression targets.
    Parameters
    ----------
    rpn_rois: (1 x H x W x A, 5) [0, x1, y1, x2, y2] # the RPN actually outputs fewer than 1 x H x W x A rois: 2000 during training, 300 at test time
    gt_boxes: (G, 5) [x1 ,y1 ,x2, y2, class] int
    gt_ishard: (G, 1) {0 | 1} 1 indicates hard
    dontcare_areas: (D, 4) [ x1, y1, x2, y2]
    _num_classes
    ----------
    Returns
    ----------
    rois: (1 x H x W x A, 5) [0, x1, y1, x2, y2] # after sampling there are at most rois_per_image (default 128) rows; the same holds for the outputs below
    labels: (1 x H x W x A, 1) {0,1,...,_num_classes-1}
    bbox_targets: (1 x H x W x A, K x4) [dx1, dy1, dx2, dy2]
    bbox_inside_weights: (1 x H x W x A, Kx4) 0, 1 masks for the computing loss
    bbox_outside_weights: (1 x H x W x A, Kx4) 0, 1 masks for the computing loss
    """
    # Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
    # (i.e., rpn.proposal_layer.ProposalLayer), or any other source
    all_rois = rpn_rois
    # TODO(rbg): it's annoying that sometimes I have extra info before
    # and other times after box coordinates -- normalize to one format
    # Include ground-truth boxes in the set of candidate rois
    # default: TRAIN.PRECLUDE_HARD_SAMPLES = True
    if cfg.TRAIN.PRECLUDE_HARD_SAMPLES and gt_ishard is not None and gt_ishard.shape[0] > 0:
        assert gt_ishard.shape[0] == gt_boxes.shape[0]
        gt_ishard = gt_ishard.astype(int)
        # drop the hard gt boxes to obtain gt_easyboxes; why is this handled differently from anchor_target_layer_tf.py???
        gt_easyboxes = gt_boxes[gt_ishard != 1, :]
    else:
        gt_easyboxes = gt_boxes
    """
    add the ground-truth to rois will cause zero loss! not good for visualization
    """
    jittered_gt_boxes = _jitter_gt_boxes(gt_easyboxes)
    zeros = np.zeros((gt_easyboxes.shape[0] * 2, 1), dtype=gt_easyboxes.dtype)
    # all_rois now consists of the original all_rois plus gt_easyboxes and jittered_gt_boxes, both with batch_ind 0
    # the purpose is to include the gt boxes among the candidate rois (see the comment above)
    all_rois = np.vstack((all_rois, \
        np.hstack((zeros, np.vstack((gt_easyboxes[:, :-1], jittered_gt_boxes[:, :-1]))))))
    # every batch_ind must be 0!!!
    # Sanity check: single batch only
    assert np.all(all_rois[:, 0] == 0), \
        'Only single item batches are supported'
    num_images = 1
    # default: TRAIN.BATCH_SIZE = 128, unlike TRAIN.RPN_BATCHSIZE = 256!!!
    rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images
    # default: TRAIN.FG_FRACTION = 0.25 (fg:bg = 1:3), unlike TRAIN.RPN_FG_FRACTION = 0.5 (1:1)!!!
    fg_rois_per_image = int(np.round(cfg.TRAIN.FG_FRACTION * rois_per_image))
    # Sample rois with classification labels and bounding box regression targets
    labels, rois, bbox_targets, bbox_inside_weights = _sample_rois(
        all_rois, gt_boxes, gt_ishard, dontcare_areas, fg_rois_per_image,
        rois_per_image, _num_classes)
    # _count = 1
    # if DEBUG:
    #     if _count == 1:
    #         _fg_num, _bg_num = 0, 0
    #     print 'num fg: {}'.format((labels > 0).sum())
    #     print 'num bg: {}'.format((labels == 0).sum())
    #     _count += 1
    #     _fg_num += (labels > 0).sum()
    #     _bg_num += (labels == 0).sum()
    #     print 'num fg avg: {}'.format(_fg_num / _count)
    #     print 'num bg avg: {}'.format(_bg_num / _count)
    #     print 'ratio: {:.3f}'.format(float(_fg_num) / float(_bg_num))
    rois = rois.reshape(-1, 5)
    labels = labels.reshape(-1, 1)
    bbox_targets = bbox_targets.reshape(-1, _num_classes*4)
    bbox_inside_weights = bbox_inside_weights.reshape(-1, _num_classes*4)
    bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32)
    return rois, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights
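The quota arithmetic and the outside-weight mask at the end of this function are easy to check by hand. The sketch below uses plain constants in place of cfg.TRAIN.BATCH_SIZE and cfg.TRAIN.FG_FRACTION (the constant names are assumptions). It also uses floor division, because the quoted code is Python 2 style and a plain "/" yields a float under Python 3, which is presumably not what a sample-size argument should be.

```python
import numpy as np

BATCH_SIZE = 128    # stands in for cfg.TRAIN.BATCH_SIZE
FG_FRACTION = 0.25  # stands in for cfg.TRAIN.FG_FRACTION
num_images = 1

# Floor division keeps the quota an int under both Python 2 and Python 3;
# plain "/" would give 128.0 under Python 3.
rois_per_image = BATCH_SIZE // num_images
fg_rois_per_image = int(np.round(FG_FRACTION * rois_per_image))
bg_rois_per_image = rois_per_image - fg_rois_per_image
print(rois_per_image, fg_rois_per_image, bg_rois_per_image)  # 128 32 96

# Outside weights: 1.0 wherever the inside weight is positive, 0.0 elsewhere.
# Toy case with K = 2 classes: row 0 is a class-1 foreground roi, row 1 is background.
bbox_inside_weights = np.array([[0, 0, 0, 0, 1, 1, 1, 1],
                                [0, 0, 0, 0, 0, 0, 0, 0]], dtype=np.float32)
bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32)
```

With the default inside weights of (1.0, 1.0, 1.0, 1.0), the two weight arrays end up identical.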
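2._sample_rois(all_rois, gt_boxes, gt_ishard, dontcare_areas, fg_rois_per_image, rois_per_image, num_classes) code logic
Call bbox_overlaps(...) (from utils/cython_bbox.so) to compute the IoU between all_rois[:,1:5] and gt_boxes[:, :4]--->
For each roi in all_rois, take the index of the gt box with the highest IoU as gt_assignment and that highest IoU as max_overlaps, then use gt_assignment to look up the gt class label of each roi (labels)--->
Preclude hard samples: call bbox_overlaps(...) (from utils/cython_bbox.so) to compute the IoU between all_rois[:,1:5] and gt_hardboxes[:, :4]; any roi whose max IoU with the hard gt boxes is >= 0.5 is ignored--->
Preclude dontcare areas: call bbox_intersections(...) (from utils/cython_bbox.so) to compute the intersections between dontcare_areas and all_rois[:,1:5]; any roi whose summed intersection with all dontcare areas is > 0.5 is ignored--->
Positive proposal sampling: proposals whose max IoU with a gt box is >= 0.5 are positives; randomly sample 32 of them, and if there are fewer than 32 the shortfall is filled with negatives (the target positive:negative ratio is 1:3, 128 samples in total)--->
Negative proposal sampling: proposals whose max IoU with a gt box lies in [0.1, 0.5) are negatives; randomly sample 96 of them---> (Since the gt boxes were appended to all_rois, sampling can indeed return the gt boxes themselves: each has IoU 1.0 with itself, so it is always a positive candidate.)
Update labels, shape (128,1), setting the label of negative proposals to 0, and update rois, shape (128,5)--->
Call _compute_targets(...) to compute the normalized regression targets bbox_target_data, shape (128,5): column 1 is the class, columns 2 to 5 are the normalized regression targets of the proposal--->
Call _get_bbox_regression_labels(...) to expand bbox_target_data (128*5, column 1 is the gt class) into bbox_targets (128*(4K)), which hold the regression targets in the slots of the gt class and 0 elsewhere, and to build bbox_inside_weights (128*(4K)), which are 1.0 1.0 1.0 1.0 in the slots of the gt class and 0 elsewhere--->
Return labels (128*1), rois (128*5, first column is the all-zero batch_ind), the regression targets bbox_targets (128*4K, where K is the number of classes, 21 by default for PASCAL VOC) and bbox_inside_weights (128*4K); called by proposal_target_layer(...)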
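The IoU and assignment steps can be reproduced in pure NumPy. This is a minimal sketch, not the Cython bbox_overlaps itself: the name bbox_overlaps_np is hypothetical, and the "+ 1" pixel convention for widths and heights is an assumption carried over from the Faster R-CNN code family.

```python
import numpy as np

def bbox_overlaps_np(boxes, query_boxes):
    """IoU between every box (N, 4) and every query box (K, 4) -> (N, K)."""
    b = boxes[:, None, :]        # (N, 1, 4)
    q = query_boxes[None, :, :]  # (1, K, 4)
    # Intersection width/height, using the inclusive-pixel "+ 1" convention
    iw = np.minimum(b[..., 2], q[..., 2]) - np.maximum(b[..., 0], q[..., 0]) + 1.0
    ih = np.minimum(b[..., 3], q[..., 3]) - np.maximum(b[..., 1], q[..., 1]) + 1.0
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    area_b = (b[..., 2] - b[..., 0] + 1.0) * (b[..., 3] - b[..., 1] + 1.0)
    area_q = (q[..., 2] - q[..., 0] + 1.0) * (q[..., 3] - q[..., 1] + 1.0)
    return inter / (area_b + area_q - inter)

rois = np.array([[0, 0, 9, 9],
                 [0, 0, 4, 9],
                 [100, 100, 109, 109]], dtype=np.float64)
gt_boxes = np.array([[0, 0, 9, 9, 3],
                     [100, 100, 119, 119, 5]], dtype=np.float64)

overlaps = bbox_overlaps_np(rois, gt_boxes[:, :4])  # R x G
gt_assignment = overlaps.argmax(axis=1)             # best gt index per roi
max_overlaps = overlaps.max(axis=1)                 # best IoU per roi
labels = gt_boxes[gt_assignment, 4]                 # gt class per roi
```

Here the first roi coincides with a gt box (IoU 1.0), the second covers half of it (IoU 0.5) and the third covers a quarter of the other gt box (IoU 0.25), so with the default thresholds the first two are positive candidates and the third is a negative candidate.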
# Sample (by default 128) positive and negative proposals: returns the sampled rois, their gt class labels, the regression targets bbox_targets and bbox_inside_weights
def _sample_rois(all_rois, gt_boxes, gt_ishard, dontcare_areas, fg_rois_per_image, rois_per_image, num_classes):
    """
    Generate a random sample of RoIs comprising foreground and background examples.
    """
    # overlaps: R x G, where R is the number of rois in all_rois and G the number of gt boxes
    # (np.float was removed in NumPy 1.24; on newer NumPy use float or np.float64 instead)
    overlaps = bbox_overlaps(
        np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float),
        np.ascontiguousarray(gt_boxes[:, :4], dtype=np.float))
    # for each roi in all_rois, the index of the gt box with the highest IoU
    gt_assignment = overlaps.argmax(axis=1)  # R
    # the corresponding max IoU value
    max_overlaps = overlaps.max(axis=1)  # R
    # the corresponding class label
    labels = gt_boxes[gt_assignment, 4]
    # preclude hard samples
    ignore_inds = np.empty(shape=(0), dtype=int)
    # default: TRAIN.PRECLUDE_HARD_SAMPLES = True
    if cfg.TRAIN.PRECLUDE_HARD_SAMPLES and gt_ishard is not None and gt_ishard.shape[0] > 0:
        gt_ishard = gt_ishard.astype(int)
        gt_hardboxes = gt_boxes[gt_ishard == 1, :]
        if gt_hardboxes.shape[0] > 0:
            # R x H
            hard_overlaps = bbox_overlaps(
                np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float),
                np.ascontiguousarray(gt_hardboxes[:, :4], dtype=np.float))
            # for each roi in all_rois, the max IoU with the hard gt boxes
            hard_max_overlaps = hard_overlaps.max(axis=1)  # R
            # hard_gt_assignment = hard_overlaps.argmax(axis=0)  # H
            # default: TRAIN.FG_THRESH = 0.5
            ignore_inds = np.append(ignore_inds, \
                np.where(hard_max_overlaps >= cfg.TRAIN.FG_THRESH)[0])
            # if DEBUG:
            #     if ignore_inds.size > 1:
            #         print 'num hard: {:d}:'.format(ignore_inds.size)
            #         print 'hard box:', gt_hardboxes
            #         print 'rois: '
            #         print all_rois[ignore_inds]
    # preclude dontcare areas
    if dontcare_areas is not None and dontcare_areas.shape[0] > 0:
        # intersec shape is D x R
        intersecs = bbox_intersections(
            np.ascontiguousarray(dontcare_areas, dtype=np.float),  # D x 4
            np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float)  # R x 4
        )
        # for each roi in all_rois, the sum of its intersections with all dontcare areas
        intersecs_sum = intersecs.sum(axis=0)  # R x 1
        # default: TRAIN.DONTCARE_AREA_INTERSECTION_HI = 0.5
        ignore_inds = np.append(ignore_inds, \
            np.where(intersecs_sum > cfg.TRAIN.DONTCARE_AREA_INTERSECTION_HI)[0])
        # if ignore_inds.size >= 1:
        #     print 'num dontcare: {:d}:'.format(ignore_inds.size)
        #     print 'dontcare box:', dontcare_areas.astype(int)
        #     print 'rois: '
        #     print all_rois[ignore_inds].astype(int)
    # Select foreground RoIs as those with >= FG_THRESH overlap
    # default: TRAIN.FG_THRESH = 0.5
    # max_overlaps: for each roi in all_rois, the max IoU with the gt boxes
    # proposals whose max IoU with a gt box is >= 0.5 are positives
    fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]
    # np.setdiff1d() returns the sorted, unique elements of fg_inds that are not in ignore_inds (an array, not a tuple)
    fg_inds = np.setdiff1d(fg_inds, ignore_inds)
    # Guard against the case when an image has fewer than fg_rois_per_image
    # foreground RoIs
    # default: fg_rois_per_image = 128 * 0.25 = 32 !!!
    fg_rois_per_this_image = min(fg_rois_per_image, fg_inds.size)
    # Sample foreground regions without replacement
    # foreground (positive) proposal sampling!!!
    if fg_inds.size > 0:
        fg_inds = npr.choice(fg_inds, size=fg_rois_per_this_image, replace=False)
    # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
    # default: TRAIN.BG_THRESH_HI = 0.5, TRAIN.BG_THRESH_LO = 0.1
    # proposals whose max IoU with a gt box lies in [0.1, 0.5) are negatives
    bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
                       (max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
    bg_inds = np.setdiff1d(bg_inds, ignore_inds)
    # Compute number of background RoIs to take from this image (guarding
    # against there being fewer than desired)
    # default: rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images = 128/1 = 128
    # if there are fewer than 32 positive proposals, the shortfall is filled with negative proposals
    bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image
    bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size)
    # Sample background regions without replacement
    if bg_inds.size > 0:
        bg_inds = npr.choice(bg_inds, size=bg_rois_per_this_image, replace=False)
    # The indices that we're selecting (both fg and bg)
    keep_inds = np.append(fg_inds, bg_inds)
    # Select sampled values from various arrays:
    labels = labels[keep_inds]
    # Clamp labels for the background RoIs to 0
    # keep_inds lists the positives first, so everything after them is a negative and gets label 0
    labels[fg_rois_per_this_image:] = 0
    # the gt boxes appended to all_rois can be sampled here too: each has IoU 1.0 with itself, so it is a positive candidate
    # the sampled positive and negative proposals, at most 128 in total
    rois = all_rois[keep_inds]
    # gt_assignment: for each roi in all_rois, the index of the gt box with the highest IoU
    # bbox_target_data.shape = (128, 5): column 1 is the class, columns 2 to 5 are the normalized regression targets
    bbox_target_data = _compute_targets(
        rois[:, 1:5], gt_boxes[gt_assignment[keep_inds], :4], labels)
    # bbox_target_data (1 x H x W x A, 5)
    # bbox_targets <- (1 x H x W x A, K x 4)
    # bbox_inside_weights <- (1 x H x W x A, K x 4)
    bbox_targets, bbox_inside_weights = \
        _get_bbox_regression_labels(bbox_target_data, num_classes)
    # labels: 128 * 1
    # rois: 128 * 5, first column is the all-zero batch_ind
    # bbox_targets: 128 * (4K), K is the number of classes
    # bbox_inside_weights: 128 * (4K), K is the number of classes
    return labels, rois, bbox_targets, bbox_inside_weights
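The foreground/background sampling can be exercised on synthetic IoU values. The sketch below is a simplified stand-in for the sampling part of _sample_rois: it leaves out ignore_inds, uses NumPy's Generator API instead of npr.choice, and the function name sample_fg_bg and its keyword defaults are assumptions for illustration.

```python
import numpy as np

def sample_fg_bg(max_overlaps, fg_quota=32, batch=128,
                 fg_thresh=0.5, bg_hi=0.5, bg_lo=0.1, rng=None):
    """Return (keep_inds, n_fg): sampled roi indices with foreground first."""
    rng = np.random.default_rng() if rng is None else rng
    fg_inds = np.where(max_overlaps >= fg_thresh)[0]
    bg_inds = np.where((max_overlaps < bg_hi) & (max_overlaps >= bg_lo))[0]
    # Cap the foreground count, then let background fill the rest of the batch
    n_fg = min(fg_quota, fg_inds.size)
    n_bg = min(batch - n_fg, bg_inds.size)
    if n_fg > 0:
        fg_inds = rng.choice(fg_inds, size=n_fg, replace=False)
    if n_bg > 0:
        bg_inds = rng.choice(bg_inds, size=n_bg, replace=False)
    return np.append(fg_inds, bg_inds), n_fg

# 10 rois with IoU 0.8 (fg), 200 with IoU 0.3 (bg), 50 with IoU 0.05 (neither)
max_overlaps = np.concatenate((np.full(10, 0.8), np.full(200, 0.3), np.full(50, 0.05)))
keep_inds, n_fg = sample_fg_bg(max_overlaps, rng=np.random.default_rng(0))

labels = np.ones(max_overlaps.size)[keep_inds]  # dummy class 1 for every roi
labels[n_fg:] = 0                               # clamp background labels to 0
```

Only 10 positives exist here, so the batch is filled with 118 negatives. Rois with IoU below BG_THRESH_LO are never sampled at all.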
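3._get_bbox_regression_labels(bbox_target_data, num_classes)
Expands the bbox_target_data of the 128 proposals (128*5, column 1 is the gt class) into bbox_targets (128*(4K)), which hold the regression targets in the 4 slots of the gt class and 0 elsewhere, and builds bbox_inside_weights (128*(4K)), which are 1.0 1.0 1.0 1.0 in the slots of the gt class and 0 elsewhere. Rows with class 0 (background) stay all zero in both arrays.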
# Expand the bbox_target_data of the 128 proposals (128*5, column 1 is the gt class) into bbox_targets (128*(4K)): regression targets in the slots of the gt class, 0 elsewhere
# Build bbox_inside_weights (128*(4K)): 1.0 1.0 1.0 1.0 in the slots of the gt class, 0 elsewhere
def _get_bbox_regression_labels(bbox_target_data, num_classes):
    """
    Bounding-box regression targets (bbox_target_data) are stored in a compact form N x (class, tx, ty, tw, th)
    This function expands those targets into the 4-of-4*K representation used
    by the network (i.e. only one class has non-zero targets).
    Returns:
        bbox_target (ndarray): N x 4K blob of regression targets
        bbox_inside_weights (ndarray): N x 4K blob of loss weights
    """
    # gt class of each proposal
    clss = bbox_target_data[:, 0]
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    # indices of the proposals whose gt class is non-zero (foreground)
    inds = np.where(clss > 0)[0]
    for ind in inds:
        cls = int(clss[ind])
        start = 4 * cls
        end = start + 4
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        # default: TRAIN.BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
        bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
    return bbox_targets, bbox_inside_weights
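A tiny worked case makes the 4-of-4K layout concrete. The sketch below is a vectorized variant of the loop above, not the repository's code; the name expand_targets and the inside_weights default (standing in for cfg.TRAIN.BBOX_INSIDE_WEIGHTS) are assumptions.

```python
import numpy as np

def expand_targets(bbox_target_data, num_classes,
                   inside_weights=(1.0, 1.0, 1.0, 1.0)):
    """Expand N x (class, tx, ty, tw, th) into N x 4K targets and weights."""
    n = bbox_target_data.shape[0]
    clss = bbox_target_data[:, 0].astype(int)
    bbox_targets = np.zeros((n, 4 * num_classes), dtype=np.float32)
    bbox_inside_weights = np.zeros_like(bbox_targets)
    fg = np.where(clss > 0)[0]                       # foreground rows only
    # Column indices of each foreground row's class slot: (F, 4)
    cols = 4 * clss[fg][:, None] + np.arange(4)
    bbox_targets[fg[:, None], cols] = bbox_target_data[fg, 1:]
    bbox_inside_weights[fg[:, None], cols] = inside_weights
    return bbox_targets, bbox_inside_weights

# Two proposals, K = 3 classes: row 0 is class 2, row 1 is background
data = np.array([[2, 0.1, 0.2, 0.3, 0.4],
                 [0, 9.0, 9.0, 9.0, 9.0]], dtype=np.float32)
targets, weights = expand_targets(data, num_classes=3)
```

Row 0 ends up with its four targets in columns 8 to 11 (the slot of class 2), while the background row stays all zero even though its compact row held values.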
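4._compute_targets(ex_rois, gt_rois, labels)
Computes the regression targets of the proposals (rois[:, 1:5]) with respect to their assigned gt boxes and normalizes them with cfg.TRAIN.BBOX_NORMALIZE_MEANS and cfg.TRAIN.BBOX_NORMALIZE_STDS. The returned bbox_target_data has shape (128, 5): column 1 is the class, columns 2 to 5 are the normalized regression targets. Called by _sample_rois(...).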
def _compute_targets(ex_rois, gt_rois, labels):
    """Compute bounding-box regression targets for an image."""
    assert ex_rois.shape[0] == gt_rois.shape[0]
    assert ex_rois.shape[1] == 4
    assert gt_rois.shape[1] == 4
    targets = bbox_transform(ex_rois, gt_rois)
    # TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED = True
    # TRAIN.BBOX_NORMALIZE_MEANS = (0.0, 0.0, 0.0, 0.0), TRAIN.BBOX_NORMALIZE_STDS = (0.1, 0.1, 0.2, 0.2)!!!
    # normalize the regression targets with cfg.TRAIN.BBOX_NORMALIZE_MEANS and cfg.TRAIN.BBOX_NORMALIZE_STDS!!!
    if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
        # Optionally normalize targets by a precomputed mean and stdev
        targets = ((targets - np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS))
                   / np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS))
    # the returned bbox_target_data has shape (128, 5): column 1 is the class, columns 2 to 5 are the normalized regression targets
    return np.hstack(
        (labels[:, np.newaxis], targets)).astype(np.float32, copy=False)
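bbox_transform is not shown in this post, so the following sketch assumes the usual Faster R-CNN parameterization (center offsets divided by the proposal size, log ratios for width and height, "+ 1" pixel convention). The function name bbox_transform_sketch is hypothetical, and the means and stds are the defaults quoted in the comments above.

```python
import numpy as np

def bbox_transform_sketch(ex_rois, gt_rois):
    """Assumed (dx, dy, dw, dh) parameterization of gt_rois relative to ex_rois."""
    ex_w = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_h = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_cx = ex_rois[:, 0] + 0.5 * ex_w
    ex_cy = ex_rois[:, 1] + 0.5 * ex_h
    gt_w = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_h = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_cx = gt_rois[:, 0] + 0.5 * gt_w
    gt_cy = gt_rois[:, 1] + 0.5 * gt_h
    dx = (gt_cx - ex_cx) / ex_w
    dy = (gt_cy - ex_cy) / ex_h
    dw = np.log(gt_w / ex_w)
    dh = np.log(gt_h / ex_h)
    return np.vstack((dx, dy, dw, dh)).transpose()

MEANS = np.array((0.0, 0.0, 0.0, 0.0))  # default BBOX_NORMALIZE_MEANS
STDS = np.array((0.1, 0.1, 0.2, 0.2))   # default BBOX_NORMALIZE_STDS

ex_rois = np.array([[0.0, 0.0, 9.0, 9.0]])   # 10 x 10 proposal
gt_rois = np.array([[5.0, 0.0, 14.0, 9.0]])  # same size, shifted 5 px right
labels = np.array([3.0])

targets = (bbox_transform_sketch(ex_rois, gt_rois) - MEANS) / STDS
bbox_target_data = np.hstack((labels[:, np.newaxis], targets)).astype(np.float32)
```

The raw target here is dx = 0.5 (half a proposal width); dividing by the std 0.1 gives 5.0, so the compact row is (3, 5, 0, 0, 0).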
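5._jitter_gt_boxes(gt_boxes, jitter=0.05)
Receives gt_easyboxes and adds a random offset to the top-left and bottom-right corners: the x coordinates get an offset proportional to the box width and the y coordinates one proportional to the box height, with jitter factor 0.05. The same offset is applied to both corners, so each box is shifted by at most 2.5% of its width or height and keeps its size. The motivation was unclear to me; the docstring says it is meant to make classification and regression more robust. Called by proposal_target_layer(...).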
# Jitter: the argument passed in is gt_easyboxes, the jitter factor is 0.05
# Adds an offset to the top-left and bottom-right corners of gt_easyboxes: x offsets scale with the width, y offsets with the height
def _jitter_gt_boxes(gt_boxes, jitter=0.05):
    """
    jitter the gtboxes, before adding them into rois, to be more robust for cls and rgs
    gt_boxes: (G, 5) [x1 ,y1 ,x2, y2, class] int
    """
    jittered_boxes = gt_boxes.copy()
    ws = jittered_boxes[:, 2] - jittered_boxes[:, 0] + 1.0
    hs = jittered_boxes[:, 3] - jittered_boxes[:, 1] + 1.0
    width_offset = (np.random.rand(jittered_boxes.shape[0]) - 0.5) * jitter * ws
    height_offset = (np.random.rand(jittered_boxes.shape[0]) - 0.5) * jitter * hs
    jittered_boxes[:, 0] += width_offset
    jittered_boxes[:, 2] += width_offset
    jittered_boxes[:, 1] += height_offset
    jittered_boxes[:, 3] += height_offset
    return jittered_boxes
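That the jitter only translates a box, without resizing it, can be checked numerically. The sketch below mirrors the function above with two deliberate differences, both assumptions of mine: it uses NumPy's Generator API, and it works on a float copy, because adding float offsets in place to an integer array (the docstring lists gt_boxes as int) is rejected by recent NumPy versions. The name jitter_boxes is hypothetical.

```python
import numpy as np

def jitter_boxes(gt_boxes, jitter=0.05, rng=None):
    """Shift each (x1, y1, x2, y2, class) box by a small random offset."""
    rng = np.random.default_rng() if rng is None else rng
    boxes = gt_boxes.astype(np.float64)  # float copy, safe for int input
    ws = boxes[:, 2] - boxes[:, 0] + 1.0
    hs = boxes[:, 3] - boxes[:, 1] + 1.0
    # Offsets are uniform in [-jitter/2, +jitter/2) times the box size
    dx = (rng.random(boxes.shape[0]) - 0.5) * jitter * ws
    dy = (rng.random(boxes.shape[0]) - 0.5) * jitter * hs
    boxes[:, [0, 2]] += dx[:, None]      # same shift for x1 and x2
    boxes[:, [1, 3]] += dy[:, None]      # same shift for y1 and y2
    return boxes

gt = np.array([[10, 20, 109, 69, 7]])    # 100 x 50 box of class 7, int dtype
out = jitter_boxes(gt, rng=np.random.default_rng(0))

shift_x = out[0, 0] - gt[0, 0]
shift_y = out[0, 1] - gt[0, 1]
```

For this 100 x 50 box the shift stays within 2.5 px horizontally and 1.25 px vertically, so the jittered copy still overlaps its gt box almost completely and remains a positive candidate.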