Anchor boxes in yoloV3

假装没事ソ submitted on 2021-02-11 05:53:52

Question


We're struggling to get our YOLOv3 model working on a 2-class detection problem (the objects of both classes vary in size over similar ranges, are generally small, and size alone does not help to tell the object types apart). We think the training is failing because of a problem with the anchor boxes: depending on the anchor values we assign, yolo_output_0, yolo_output_1 or yolo_output_2 returns a loss of 0 (for the xy, hw and class components). However, although there are multiple threads about anchor boxes, we cannot find a clear explanation of how they are assigned specifically for YOLOv3.

So far, this is what we do to determine the size of the boxes:

1- We run a clustering method on the normalized ground-truth bounding boxes (normalized by the original size of the image) and take the centroids of the clusters. In our case we have 2 clusters, with centroids at approximately (0.087, 0.052) and (0.178, 0.099).

2- We then rescale these values to match the resizing we apply to the images during training. We work with rectangular images of (256, 416), so we get bounding boxes of (22, 22) and (46, 42). Note that we round the values, because we have read that YOLOv3 expects actual pixel values.

3- Since we compute anchors at 3 different scales (3 skip connections), we assign the values above to the large scale (52). The anchors for the other two scales (13 and 26) are calculated by dividing the first anchors by 2 and by 4.
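As a minimal sketch of steps 2 and 3 above (not a recommended procedure, just the arithmetic as described), the snippet below rescales the quoted centroids and derives the other two scales. It assumes that (256, 416) acts as (width, height) scale factors and that the /2 set goes with the 26 grid and the /4 set with the 13 grid; neither is stated explicitly in the text. The names `centroids`, `input_size`, `anchors_px` and `anchors_by_grid` are illustrative only.

```python
import numpy as np

# Approximate cluster centroids quoted above, as normalized (width, height).
centroids = np.array([(0.087, 0.052), (0.178, 0.099)])

# Assumption: (256, 416) is applied as (width, height) scale factors.
input_size = np.array([256, 416])

# Step 2: rescale to the training resolution and round to whole pixels.
anchors_px = np.rint(centroids * input_size).astype(int)

# Step 3: assumption that /2 belongs to the 26 grid and /4 to the 13 grid.
anchors_by_grid = {
    52: anchors_px,
    26: anchors_px / 2,
    13: anchors_px / 4,
}

for grid, anchors in anchors_by_grid.items():
    print(grid, anchors.tolist())
```

With the rounded centroids quoted above, the second height comes out as 41 rather than 42, presumably because the centroids are given only approximately. Note also that the /2 and /4 sets are no longer whole pixel values.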

We are not even sure if we are correct up to this point. If we look at the code in the original models.py what we see is the following:

yolo_anchors = np.array([(10, 13), (16, 30), (33, 23), (30, 61), (62, 45), (59, 119), (116, 90), (156, 198), (373, 326)], np.float32) / 416
yolo_anchor_masks = np.array([[6, 7, 8], [3, 4, 5], [0, 1, 2]])
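To make the two arrays easier to reason about, here is a small sketch of what they contain when each mask row is used as an index into the anchor array. It only relies on NumPy indexing; the pairing of the i-th mask row with yolo_output_i is an assumption made for the printout, and the names `anchors_per_output`, `pixels_per_output` and `sorted_by_area` are illustrative.

```python
import numpy as np

yolo_anchors = np.array([(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                         (59, 119), (116, 90), (156, 198), (373, 326)],
                        np.float32) / 416
yolo_anchor_masks = np.array([[6, 7, 8], [3, 4, 5], [0, 1, 2]])

# Each mask row is a list of indices, so it selects 3 of the 9 anchors.
anchors_per_output = [yolo_anchors[mask] for mask in yolo_anchor_masks]

# Multiplying by 416 again recovers the original pixel values.
pixels_per_output = [np.rint(a * 416).astype(int).tolist()
                     for a in anchors_per_output]

# Check whether the 9 anchors are listed in order of increasing area.
areas = yolo_anchors[:, 0] * yolo_anchors[:, 1]
sorted_by_area = bool(np.all(np.diff(areas) > 0))

for i, px in enumerate(pixels_per_output):
    # Assumption: the i-th mask row is paired with yolo_output_i.
    print(f"mask row {i}:", px)
print("sorted by area:", sorted_by_area)
```

The division by 416 only changes the unit (fraction of a 416-pixel side instead of pixels); the integer pixel values are recoverable by multiplying back.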

So, there are 9 anchors, ordered from smaller to larger, and the anchor_masks determine the resolution at which each one is used. Is this correct? In fact, our first question is: are there 9 anchors, or 3 anchors at 3 different scales? If so, how are they calculated? We know about the gen_anchors script in yolo_v2 and a similar script in yolov3; however, we don't know whether they compute 9 clusters and then order them by size, or follow a procedure similar to ours.

Additionally, we don't fully understand why these boxes are divided by 416 (the image size). This would mean having anchors that are not integers (pixel values), which we have read is necessary for YOLOv3.

We would be really grateful if someone could give us some insight into these questions and help us better understand how YOLOv3 works.

Thanks and regards,
Karen

Source: https://stackoverflow.com/questions/61060088/anchor-boxes-in-yolov3
