Question
I'm using TensorFlow's Object Detection API, but I get the following error when training:
InvalidArgumentError (see above for traceback): assertion failed: [maximum box coordinate value is larger than 1.01: ] [1.47]
I get the error when I use any of the following:
- faster_rcnn_inception_resnet_v2_atrous_coco
- rfcn_resnet101_coco
But NOT when I use:
- ssd_inception_v2_coco
- ssd_mobilenet_v1_coco
My training images are a mixture of 300x300 and 450x450 pixels. I don't believe any of my bounding boxes fall outside the image coordinates. Even if some do, why would the last two models work but not the ResNet models?
Answer 1:
The first two networks you mentioned seem to expect bounding box coordinates as values between 0 and 1, that is, normalized by the image size. I was getting the same error because my coordinates were in pixels.
I had to change the script that creates the TFRecord files, from something like this:
# Assuming `x` & `y` are floats with the coordinates of the top-left corner:
xmin = x
ymin = y
# Assuming `width` & `height` are floats with the size of the box
xmax = x + width
ymax = y + height
To something like this:
# Assuming `x` & `y` are floats with the coordinates of the top-left corner:
xmin = x / image_width
ymin = y / image_height
# Assuming `width` & `height` are floats with the size of the box
xmax = (x + width) / image_width
ymax = (y + height) / image_height
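The conversion above can be wrapped in a small helper. This is only a minimal sketch: the function name `normalize_box` and its parameters are hypothetical and are not part of the Object Detection API. It assumes pixel coordinates given as a top-left corner plus a box size.

```python
def normalize_box(x, y, width, height, image_width, image_height):
    """Convert a pixel-space box (top-left corner plus size) into
    normalized (xmin, ymin, xmax, ymax) values relative to the image size."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    xmin = x / image_width
    ymin = y / image_height
    xmax = (x + width) / image_width
    ymax = (y + height) / image_height
    return xmin, ymin, xmax, ymax


# Hypothetical example: a 150x90 box with its top-left corner at (30, 60)
# inside a 300x300 image.
print(normalize_box(30.0, 60.0, 150.0, 90.0, 300.0, 300.0))
```

Note that the helper does not clamp anything: a box that extends past the image still yields a value above 1, so bad annotations remain visible instead of being silently hidden.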
Answer 2:
After looking at my raw bounding box data, it turned out there were a few instances where the bounding box coordinates were either very large or negative (I'm not sure how that happened in the first place). I deleted those, and now I can train any of the models without issue.
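A check like the following could help locate such records before building the TFRecord files. It is a rough sketch under stated assumptions: boxes are `(xmin, ymin, xmax, ymax)` tuples in pixel coordinates, and the function name `find_bad_boxes` and the sample data are hypothetical.

```python
def find_bad_boxes(boxes, image_width, image_height):
    """Return the indices of boxes that are negative, inverted, empty,
    or that extend past the image borders.

    `boxes` is a list of (xmin, ymin, xmax, ymax) tuples in pixels.
    """
    bad = []
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        x_ok = 0 <= xmin < xmax <= image_width
        y_ok = 0 <= ymin < ymax <= image_height
        if not (x_ok and y_ok):
            bad.append(i)
    return bad


# Hypothetical annotations for a 300x300 image: the second box has a
# negative coordinate and the third one extends past the right edge.
boxes = [(10, 20, 100, 200), (-5, 0, 50, 50), (0, 0, 441, 300)]
print(find_bad_boxes(boxes, 300, 300))
```

In this made-up data, the third box has xmax = 441 in a 300-pixel-wide image, which normalizes to 1.47, the kind of value reported in the assertion message.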
Answer 3:
I faced the same problem. When converting the XML files to CSV, I was reading the values (width, height, xmin, xmax, ymin, ymax) from the XML tree by position, which assumes that every record has the same XML structure. That assumption was the problem.
This is how I was accessing the value of xmin:
object.find('bndbox')[0].text
Instead, I accessed the values by tag name, which solved it for me.
The correct way:
object.find('bndbox').find('xmin').text
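To illustrate why lookup by tag name is more robust than positional indexing, here is a small self-contained sketch using the standard library's `xml.etree.ElementTree`. The sample annotation and the function name `read_boxes` are hypothetical. The sample deliberately lists `<ymin>` before `<xmin>`, so `bndbox[0]` would return the wrong value.

```python
import xml.etree.ElementTree as ET

# Hypothetical Pascal VOC style annotation. The children of <bndbox> are
# deliberately out of the usual order to show the failure of positional access.
SAMPLE_XML = """
<annotation>
  <size><width>300</width><height>300</height></size>
  <object>
    <name>cat</name>
    <bndbox><ymin>40</ymin><xmin>10</xmin><ymax>200</ymax><xmax>120</xmax></bndbox>
  </object>
</annotation>
"""


def read_boxes(xml_text):
    """Return a list of (xmin, ymin, xmax, ymax) tuples, looked up by tag name."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall('object'):
        bndbox = obj.find('bndbox')
        box = tuple(int(bndbox.find(tag).text)
                    for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        boxes.append(box)
    return boxes


print(read_boxes(SAMPLE_XML))
```

With this sample, positional access (`bndbox[0].text`) would yield the `ymin` value instead of `xmin`, while the tag-based lookup is unaffected by element order.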
Source: https://stackoverflow.com/questions/46135528/object-detection-api-assertion-failed-maximum-box-coordinate-value-is-larger-t