How do I segment a document using Tesseract and then output the resulting bounding boxes and labels?


忘了有多久 2020-12-07 10:25

I'm trying to get Tesseract to output a file with labelled bounding boxes that result from page segmentation (pre-OCR). I know it must be capable of doing this 'out of the

6 answers

轮回少年 (original poster)

2020-12-07 11:00

With Tesseract 4.0.0, a command like tesseract source/dir/myimage.tiff target/directory/basefilename hocr creates a basefilename.hocr file containing block-, paragraph-, line-, and word-level bounding boxes for the recognized text. Running the same command without the hocr config produces a plain text file with newlines between block-level text, but the hOCR format is more explicit.

More config options here: https://github.com/tesseract-ocr/tesseract/tree/master/tessdata/configs
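To turn the .hocr file into a flat list of labelled boxes, something like the following may work. It is a minimal sketch, not part of the original answer: it assumes the hOCR elements carry the classes ocr_carea, ocr_par, ocr_line and ocrx_word, with a title attribute of the form "bbox x0 y0 x1 y1". The names LEVELS, HocrBoxes and parse_hocr are made up for illustration, and SAMPLE is hand-written markup in the shape of hOCR output, not real Tesseract output.

```python
import re
from html.parser import HTMLParser

# Assumed mapping from hOCR class names to human-readable layout labels.
LEVELS = {
    "ocr_carea": "block",
    "ocr_par": "paragraph",
    "ocr_line": "line",
    "ocrx_word": "word",
}
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrBoxes(HTMLParser):
    """Collect (label, (x0, y0, x1, y1)) for each recognised layout element."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        match = BBOX_RE.search(attrs.get("title") or "")
        if not match:
            return
        for cls in classes:
            if cls in LEVELS:
                bbox = tuple(int(v) for v in match.groups())
                self.boxes.append((LEVELS[cls], bbox))


def parse_hocr(text):
    """Return the labelled boxes found in an hOCR string, in document order."""
    parser = HocrBoxes()
    parser.feed(text)
    return parser.boxes


# Hand-written sample shaped like hOCR output (an assumption, not real output).
SAMPLE = """
<div class='ocr_page' id='page_1' title='image "myimage.tiff"; bbox 0 0 800 600'>
 <div class='ocr_carea' id='block_1_1' title="bbox 10 20 300 80">
  <p class='ocr_par' id='par_1_1' title="bbox 10 20 300 80">
   <span class='ocr_line' id='line_1_1' title="bbox 10 20 300 50; baseline 0 -5">
    <span class='ocrx_word' id='word_1_1' title='bbox 10 20 90 50; x_wconf 95'>Hello</span>
   </span>
  </p>
 </div>
</div>
"""

for label, bbox in parse_hocr(SAMPLE):
    print(label, *bbox)
```

For a real file, read basefilename.hocr as UTF-8 text and pass its contents to parse_hocr. Elements with other classes (the page itself, for example) are skipped by this sketch.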
