How do I segment a document using Tesseract then output the resulting bounding boxes and labels

忘了有多久 2020-12-07 10:25

I'm trying to get Tesseract to output a file with labelled bounding boxes that result from page segmentation (pre-OCR). I know it must be capable of doing this 'out of the box'…

6 Answers
  •  温柔的废话
    2020-12-07 11:07

    Per-character bounding boxes in hOCR output have been available in Tesseract since version 4.1. Once you have checked which version is installed (for example with `tesseract --version`), use:

    tesseract {image file} {output name} -c tessedit_create_hocr=1 -c hocr_char_boxes=1
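    The command writes `{output name}.hocr`, an XHTML file in which each layout element has a `class` naming its level (for example `ocr_page`, `ocr_carea` for a block, `ocr_par`, `ocr_line`, `ocrx_word`) and a `title` holding `bbox x0 y0 x1 y1` in pixels. Below is a minimal sketch, using only Python's standard `html.parser`, that turns that file into (label, id, box) rows. The script name `hocr_boxes.py` and the helper names are hypothetical, and treating `x_bboxes` as the box key on per-character `ocrx_cinfo` spans is an assumption; check it against the `.hocr` file your build actually produces.

```python
import sys
from html.parser import HTMLParser

# Title properties assumed to carry a box as four integers: x0 y0 x1 y1.
# 'bbox' is the standard hOCR property; 'x_bboxes' is ASSUMED to be what
# per-character 'ocrx_cinfo' spans use, so verify against your own output.
BOX_KEYS = ("bbox", "x_bboxes")


def parse_title_box(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) >= 5 and parts[0] in BOX_KEYS:
            return tuple(int(v) for v in parts[1:5])
    return None


class HocrBoxParser(HTMLParser):
    """Collect (label, id, bbox) for every element whose class starts with 'ocr'."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Attributes written without a value come through as None.
        label = attrs.get("class") or ""
        box = parse_title_box(attrs.get("title") or "")
        if label.startswith("ocr") and box is not None:
            self.boxes.append((label, attrs.get("id"), box))


def hocr_boxes(hocr_text):
    """Return the labelled boxes of an hOCR document in document order."""
    parser = HocrBoxParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.boxes


if __name__ == "__main__":
    # Usage: python hocr_boxes.py out.hocr [more.hocr ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for label, elem_id, (x0, y0, x1, y1) in hocr_boxes(f.read()):
                print(label, elem_id, x0, y0, x1, y1, sep="\t")
```

    Each output line is tab-separated: label, element id, then the top-left and bottom-right corners. Filtering on the label (for example keeping only `ocr_carea` rows) gives the block-level layout without the word or character detail.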
