In computer vision and object detection, the common evaluation method is mAP. What is it and how is it calculated?
I think the important part here is seeing that object detection evaluation can be framed as a standard information retrieval problem, for which there already exist excellent descriptions of average precision.
The output of an object detection algorithm is a set of proposed bounding boxes, each with a confidence score and classification scores (one per class). Let's ignore the classification scores for now and use the confidence as the input to a thresholded binary classification. Intuitively, the average precision is an aggregation over all possible choices of the threshold/cut-off value. But wait; in order to calculate precision, we need to know whether a box is correct!
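For concreteness, here is a minimal sketch (with made-up numbers) of how precision and recall fall out of a single confidence cut-off, assuming for the moment that we already know which boxes count as 'correct' (that is exactly the question addressed next):

```python
import numpy as np

# Hypothetical detections for one class: confidence scores and whether each
# box is 'correct' (a label we still need to define -- see below).
confidences = np.array([0.95, 0.80, 0.60, 0.40, 0.10])
is_correct  = np.array([True, True, False, True, False])
n_ground_truth = 4  # total ground-truth boxes for this class

threshold = 0.5
kept = confidences >= threshold           # boxes surviving the cut-off
tp = np.sum(is_correct[kept])             # correct boxes we kept
fp = np.sum(~is_correct[kept])            # incorrect boxes we kept

precision = tp / (tp + fp)                # 2 / 3
recall = tp / n_ground_truth              # 2 / 4
print(precision, recall)
```

Sweeping the threshold over every confidence value traces out the precision/recall curve that average precision summarizes.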
This is where it gets confusing/difficult; as opposed to typical information retrieval problems, we actually have an extra level of classification here. That is, we can't do exact matching between boxes, so we need to classify whether a bounding box is correct or not. The solution is essentially a hard-coded classification on the box dimensions; we check whether it sufficiently overlaps with any ground-truth box, measured by intersection over union (IoU), to be considered 'correct'. The threshold for this part is chosen by common sense, and the dataset you are working on will likely define what counts as a 'correct' bounding box. Most datasets just set it at 0.5 IoU and leave it at that (I recommend doing a few manual IoU calculations [they're not hard] to get a feel for how strict an IoU of 0.5 actually is).
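If you want to do those manual checks in code, a minimal IoU sketch (assuming boxes in `(x1, y1, x2, y2)` corner format) looks like this:

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A box shifted sideways by 25% of its width already drops to 0.6 IoU:
print(iou((0, 0, 100, 100), (25, 0, 125, 100)))  # 0.6
```

That shifted-box example is a good illustration of why 0.5 IoU is less forgiving than it sounds.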
Now that we have actually defined what it means to be 'correct', we can use the same process as in information retrieval.
To find mean average precision (mAP), you stratify your proposed boxes by class (each box is assigned to the class with the highest classification score), compute the average precision (AP) for each class as described above, and then take the mean of those APs over the classes.
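Putting the pieces together, here is a rough sketch of single-class AP and the mean over classes. It assumes each detection has already been labeled 'correct'/'incorrect' via the IoU matching described above; real benchmarks such as PASCAL VOC and COCO layer their own matching and interpolation rules on top of this basic recipe:

```python
import numpy as np

def average_precision(confidences, is_correct, n_ground_truth):
    """AP for one class: sweep the confidence threshold over every detection.

    confidences : array of detection confidences
    is_correct  : boolean array, True where the box matched a ground truth
    """
    order = np.argsort(-confidences)               # highest confidence first
    tp = np.cumsum(is_correct[order])              # running true positives
    fp = np.cumsum(~is_correct[order])             # running false positives
    precision = tp / (tp + fp)
    recall = tp / n_ground_truth
    # Area under the precision/recall curve (right-hand rule here).
    return np.sum(np.diff(np.concatenate(([0.0], recall))) * precision)

def mean_average_precision(per_class):
    """per_class: dict mapping class -> (confidences, is_correct, n_ground_truth)."""
    aps = [average_precision(*args) for args in per_class.values()]
    return np.mean(aps)
```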
TL;DR: make the distinction between determining whether a bounding-box prediction is 'correct' (the extra level of classification) and evaluating how well the box confidence informs you of a 'correct' bounding-box prediction (completely analogous to the information retrieval case), and the typical descriptions of mAP will make sense.
It's worth noting that the area under the precision/recall curve is the same thing as average precision, and we are essentially approximating this area with the trapezoidal or right-hand rule for approximating integrals.
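In code, that integration step is just a couple of lines over the precision/recall arrays (a toy example, assuming recall is sorted in increasing order):

```python
import numpy as np

recall = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
precision = np.array([1.0, 1.0, 0.8, 0.7, 0.6])

ap_trapezoid = np.trapz(precision, recall)           # trapezoidal rule
ap_right = np.sum(np.diff(recall) * precision[1:])   # right-hand rule
print(ap_trapezoid, ap_right)
```

The two rules give slightly different numbers, which is one reason different benchmarks report slightly different AP values for the same detections.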