Does Tesseract's hOCR output really contain bounding boxes and confidence levels for each character?
The Tesseract FAQ says you can: "How can I get the coordinates and confidence of each character? There are two options. If you would rather not get into programming, you can use Tesseract's hocr output format (read the Tesseract manual page for details)."

But when I generated a sample hOCR output (an .html file), the bounding boxes and confidence levels were only available at the word level, not per character.

Am I missing something here? I've added the sample input/output as illustration (the input