Does Tesseract neglect any nontext area in a scanned document?

纵饮孤独 submitted on 2019-12-21 05:57:10

Question


I'm using Tesseract, but I don't know whether it ignores non-text areas and targets text only. Do I have to remove non-text areas as a preprocessing step to get better output?


Answer 1:


Tesseract has a pretty good algorithm for detecting text, but it will occasionally give false-positive matches, i.e. it will read non-text regions as text.
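Pre-processing (below) is the answer's recommendation. As a separate, complementary mitigation for false positives, one common trick is to drop low-confidence words from Tesseract's word-level output. The sketch below assumes the input dict has the shape returned by `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)`, i.e. parallel lists under the keys `"text"` and `"conf"`, where non-word rows carry a confidence of `-1`. The function name `confident_words` and the default threshold of 60 are illustrative assumptions, not values from the original answer.

```python
from typing import Dict, List, Sequence, Union

Cell = Union[str, int, float]


def confident_words(data: Dict[str, Sequence[Cell]], min_conf: float = 60.0) -> List[str]:
    """Keep only recognized words whose confidence is at least min_conf.

    Assumes `data` looks like pytesseract's image_to_data(..., output_type=Output.DICT)
    result: parallel lists under "text" and "conf". The threshold is a tunable guess.
    """
    words: List[str] = []
    for text, conf in zip(data["text"], data["conf"]):
        word = str(text).strip()
        # Depending on the pytesseract version, conf may be a string or a number;
        # page/block/line rows use -1 and have empty text.
        try:
            score = float(conf)
        except (TypeError, ValueError):
            continue
        if word and score >= min_conf:
            words.append(word)
    return words
```

Noise such as table rules or specks often comes back as short junk strings with low confidence, so a threshold removes some of it. It cannot recover text that a badly skewed or cluttered scan hid in the first place, which is why pre-processing still matters.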

Ideally, you should pre-process the image before submitting it to Tesseract. I worked on a similar task some time ago, so I suggest you take a look at the following material:

  • OpenCV C++/Obj-C: Detecting a sheet of paper / Square Detection

  • Executing cv::warpPerspective for a fake deskewing on a set of cv::Point

  • Rotate cv::Mat using cv::warpAffine offsets destination image

  • Affine Transform, Simple Rotation and Scaling or something else entirely?
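The first two links cover finding the sheet of paper and then warping it flat with `cv::warpPerspective`. Once four corner points have been found (for example with a square-detection step), they have to be put in a consistent order and a destination rectangle has to be chosen. The sketch below shows one common way to do that in plain Python. It assumes image coordinates (y grows downwards) and a sheet that is roughly upright, since the sum/difference ordering heuristic can fail for quadrilaterals rotated near 45 degrees. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def order_corners(pts: Sequence[Point]) -> List[Point]:
    """Return the four corners as [top-left, top-right, bottom-right, bottom-left].

    Heuristic: top-left has the smallest x + y, bottom-right the largest;
    top-right has the smallest y - x, bottom-left the largest.
    """
    if len(pts) != 4:
        raise ValueError("expected exactly four corner points")
    by_sum = sorted(pts, key=lambda p: p[0] + p[1])
    by_diff = sorted(pts, key=lambda p: p[1] - p[0])
    return [by_sum[0], by_diff[0], by_sum[-1], by_diff[-1]]


def deskew_target(corners: Sequence[Point]) -> Tuple[List[Point], Tuple[int, int]]:
    """Destination corners and (width, height) for a perspective warp.

    `corners` must already be ordered TL, TR, BR, BL. The output size keeps
    the longer of each pair of opposite edges so no content is squeezed.
    """
    tl, tr, br, bl = corners
    width = int(round(max(math.dist(tl, tr), math.dist(bl, br))))
    height = int(round(max(math.dist(tl, bl), math.dist(tr, br))))
    dst = [(0.0, 0.0), (width - 1.0, 0.0), (width - 1.0, height - 1.0), (0.0, height - 1.0)]
    return dst, (width, height)
```

With OpenCV's Python bindings, the ordered corners and `dst` would then be passed (as `float32` NumPy arrays) to `cv2.getPerspectiveTransform`, and the resulting matrix to `cv2.warpPerspective(image, M, (width, height))`. Cropping to the sheet this way also discards whatever background surrounded it, which is exactly the kind of non-text area the question asks about.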
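The third link deals with a classic problem: rotating with `cv::warpAffine` into a canvas of the original size cuts off the corners of the rotated page. A usual fix is to enlarge the canvas to the rotated bounding box and shift the translation column of the matrix accordingly. The sketch below builds the 2x3 matrix with the same layout OpenCV documents for `getRotationMatrix2D` (positive angle = counter-clockwise, scale fixed at 1), then applies that adjustment. It uses the image centre `(width/2, height/2)` as the pivot, which is an assumption; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix2x3 = List[List[float]]


def rotation_keep_bounds(width: int, height: int, angle_deg: float) -> Tuple[Matrix2x3, Tuple[int, int]]:
    """2x3 affine matrix rotating a (width, height) image about its centre,
    plus the (new_width, new_height) canvas that holds the whole result."""
    cx, cy = width / 2.0, height / 2.0
    a = math.radians(angle_deg)
    alpha, beta = math.cos(a), math.sin(a)
    # Same layout as OpenCV's getRotationMatrix2D(center, angle, scale=1.0)
    m = [
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ]
    # Axis-aligned bounding box of the rotated rectangle
    new_w = round(height * abs(beta) + width * abs(alpha))
    new_h = round(height * abs(alpha) + width * abs(beta))
    # Move the old centre to the centre of the enlarged canvas
    m[0][2] += new_w / 2.0 - cx
    m[1][2] += new_h / 2.0 - cy
    return m, (new_w, new_h)


def apply_affine(m: Matrix2x3, x: float, y: float) -> Tuple[float, float]:
    """Map one point through the 2x3 matrix (handy for checking corners)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

In OpenCV's Python bindings the result would be used roughly as `cv2.warpAffine(image, numpy.array(m), (new_w, new_h))`, since `dsize` is given as (width, height). Rounding the canvas size means a corner can land a fraction of a pixel outside it, which is harmless for OCR.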



Source: https://stackoverflow.com/questions/10193816/does-tesseract-neglect-any-nontext-area-in-a-scanned-document
