Does Tesseract's hOCR output really contain bounding boxes and confidence levels for each character?

伪装坚强ぢ 2020-12-17 01:03

The Tesseract FAQ says you can:

How can I get the coordinates and confidence of each character?

There are two options. If

  •  无人及你
    2020-12-17 01:22

    This now seems to be available in Tesseract 4.x.

    See my answer at:

    https://stackoverflow.com/a/57766860/1021819

    Set hocr_char_boxes to 1 in your config file, or pass it on the command line with -c. Your updated command would be:

    tesseract [Image name] outputbase --oem 1 -l eng --psm 8 -c hocr_char_boxes=1 hocr

    Note the hocr output option and look in that file for ..._wconf, e.g.
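A minimal Python sketch of running that command and reading the per-character data back out. It assumes the layout that Tesseract 4.x and later write when hocr_char_boxes=1: each character sits in a <span class='ocrx_cinfo'> whose title holds x_bboxes (left top right bottom, in pixels) and x_conf (0-100). Word-level confidence is the separate x_wconf value on the enclosing ocrx_word span. The helper names (tesseract_hocr_command, parse_char_boxes, run_and_parse) are hypothetical; check the span layout against your own .hocr file before relying on it.

```python
import re
import subprocess
import sys
from html.parser import HTMLParser
from pathlib import Path

# Assumed title layout of an ocrx_cinfo span: 'x_bboxes 10 20 30 40; x_conf 97.5'
_BBOX_RE = re.compile(r"x_bboxes\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")
_CONF_RE = re.compile(r"x_conf\s+(-?\d+(?:\.\d+)?)")


def tesseract_hocr_command(image, outputbase):
    """Argument list for the command in the answer (LSTM engine, English, single word)."""
    return [
        "tesseract", str(image), str(outputbase),
        "--oem", "1", "-l", "eng", "--psm", "8",
        "-c", "hocr_char_boxes=1",
        "hocr",  # built-in config that writes hOCR to <outputbase>.hocr
    ]


class _CharBoxParser(HTMLParser):
    """Collects (char, (x0, y0, x1, y1), confidence) from ocrx_cinfo spans."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities like &amp; arrive decoded
        self.chars = []
        self._open = None  # (bbox, conf) of the ocrx_cinfo span being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_cinfo" in classes:
            title = attrs.get("title") or ""
            bbox = _BBOX_RE.search(title)
            conf = _CONF_RE.search(title)
            self._open = (
                tuple(int(v) for v in bbox.groups()) if bbox else None,
                float(conf.group(1)) if conf else None,
            )
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # ocrx_cinfo spans contain only text, so the next </span> closes them.
        if tag == "span" and self._open is not None:
            bbox, conf = self._open
            self.chars.append(("".join(self._text), bbox, conf))
            self._open = None


def parse_char_boxes(hocr_text):
    """Return (char, bbox, confidence) tuples in document order."""
    parser = _CharBoxParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.chars


def run_and_parse(image, outputbase):
    """Run tesseract (must be on PATH) and parse the <outputbase>.hocr it writes."""
    subprocess.run(tesseract_hocr_command(image, outputbase), check=True)
    hocr = Path(f"{outputbase}.hocr").read_text(encoding="utf-8")
    return parse_char_boxes(hocr)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python char_boxes.py image.png outputbase
    for char, bbox, conf in run_and_parse(sys.argv[1], sys.argv[2]):
        print(char, bbox, conf)
```

Using the standard-library HTMLParser rather than a single regex over the whole file keeps entity decoding (for characters such as & or <) correct without extra dependencies.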

    Let me know if this works for you; otherwise I'll just delete the answer.

    Source: https://github.com/tesseract-ocr/tesseract/issues/1465#issuecomment-513139976
