Train Tesseract for specific words - possible?

问题

I want to use Tesseract to extract about 10-20 keywords from a document. The document will contain all English characters/words. What I am interested in is something like "Age: 23". Here Age is the keyword I am interested in and want to extract the 23 (the value for that) as well.

The first approach that comes in my mind is to extract the whole page into text and then look for keywords in the recognized text. But in terms of training the tesseract, is there a better approach if I know the keywords, which might result in a better accuracy?

I am more or less aware of the limitations of Tesseract OCR. Trying to maximize within that limitations. Thanks for all your expert advice.

回答1:

Try bazaar matching pattern in Tesseract.

来源：https://stackoverflow.com/questions/18674061/train-tesseract-for-specific-words-possible

标签

ocr

tesseract

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!