Is it normal that tesseract does not recognize this word in this image?

Submitted by 独自空忆成欢 on 2019-12-20 04:21:11

Question


I need to extract words from small images like this:

I am using Tesseract from the command line with the Spanish language option, like this:

tesseract category.png -l spa -psm 7 category.txt

I would expect this text to be easy for the OCR to parse, but the word is not recognized. I am using -l spa for the Spanish language and -psm 7 because the image contains only a single line of text (the result is the same if I omit the -psm parameter).

This is the result: s…"…

I am using this build with the language package: http://domasofan.spdns.eu/tesseract/ (listed as an official source on GitHub)


Answer 1:


Tesseract seems to really struggle when scanning low resolution characters.

Try to scan this image. I upscaled it by 400 percent (200 percent is probably enough for scanning, but let's try 400%), applied heavy blurring, and then thresholded it at a value of about 140. The results should be much better, and I hope they are good enough for you. If you need to do this programmatically, tell me in the comments what is unclear and I will provide more information.
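The three steps described above (upscale, blur, threshold) can be sketched in code. This is only an illustration, not the exact procedure used to produce the image: it assumes the picture is already loaded as a grayscale grid (a list of rows with values 0 to 255), uses nearest-neighbour upscaling and a box blur because the answer does not say which methods were used, and the function names and the blur radius are made up for the example. Only the 4x factor and the cutoff of 140 come from the answer. In practice an imaging library would normally do this work; plain Python is used here to keep the sketch dependency-free.

```python
def upscale(pixels, factor):
    """Nearest-neighbour upscaling: repeat every pixel `factor` times in x and y."""
    out = []
    for row in pixels:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def box_blur(pixels, radius):
    """Replace each pixel with the mean of its (2*radius+1) square window.

    The window is clipped at the image border. A box blur is an assumption;
    the answer only says "a great amount of blurring".
    """
    height, width = len(pixels), len(pixels[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = count = 0
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += pixels[yy][xx]
                    count += 1
            out[y][x] = total // count
    return out


def threshold(pixels, cutoff):
    """Binarize: values above `cutoff` become white (255), the rest black (0)."""
    return [[255 if value > cutoff else 0 for value in row] for row in pixels]


def preprocess(pixels, factor=4, radius=2, cutoff=140):
    """Upscale, blur, then threshold. radius=2 is a hypothetical default."""
    return threshold(box_blur(upscale(pixels, factor), radius), cutoff)


if __name__ == "__main__":
    tiny = [[0, 255], [255, 0]]  # stand-in for a real grayscale image
    result = preprocess(tiny)
    print(len(result[0]), "x", len(result))  # width x height after upscaling
```

The output of preprocess would then be saved as an image and passed to Tesseract in place of the original. Blurring before thresholding is what smooths the blocky edges left by upscaling, so the order of the steps matters.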



Source: https://stackoverflow.com/questions/36677638/is-it-normal-that-tesseract-does-not-recognize-this-word-in-this-image
