Tesseract thinks my 1's are 7's

Submitted by 你说的曾经没有我的故事 on 2019-12-06 12:46:07

Question


It seems like this is probably a common issue with OCR. Is there a way to tell Tesseract that my 1's are actually 1's?

Hopefully without changing my 7's into 1's in the process.

Note: these are scanned documents and I have no idea what font was used.


Answer 1:


If Tesseract is trainable, try training it manually on the font in your documents. That should solve the problem.

There is another possible solution: add a small validation module that runs after Tesseract. For every 1 and 7, double-check the result with an intensity-based method. For example, find corners (feature points) on the glyph, apply KLT tracking against a 1 template and a 7 template, and see which one gets more positive tracking results. This method is costly, but since you only run it against two small templates, I do not think it will cause a big performance hit.
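A rough sketch of such a validation step is below. To stay dependency-free it swaps KLT tracking for a plain normalized cross-correlation against the two templates, so treat it as an illustration of the "compare against a 1 template and a 7 template" idea rather than the exact method described above. The 5x5 templates and the names `ncc` and `recheck_digit` are made up for this example; real templates would be cropped from glyphs in the same scans and resized to a common size.

```python
from math import sqrt

# Hypothetical 5x5 binary templates ('#' = ink). In practice these would be
# cropped from confidently recognized 1s and 7s in the same scanned documents.
TEMPLATE_1 = [
    "..#..",
    ".##..",
    "..#..",
    "..#..",
    ".###.",
]
TEMPLATE_7 = [
    "#####",
    "....#",
    "...#.",
    "..#..",
    "..#..",
]


def to_vector(rows):
    """Flatten a glyph given as text rows into a list of 0.0/1.0 pixels."""
    return [1.0 if ch == "#" else 0.0 for row in rows for ch in row]


def ncc(a, b):
    """Normalized cross-correlation of two equal-length pixel vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    denom = sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if denom == 0:
        return 0.0  # a blank glyph or template carries no information
    return sum(x * y for x, y in zip(da, db)) / denom


def recheck_digit(glyph_rows):
    """Return '1' or '7', whichever template correlates better with the glyph."""
    glyph = to_vector(glyph_rows)
    score_1 = ncc(glyph, to_vector(TEMPLATE_1))
    score_7 = ncc(glyph, to_vector(TEMPLATE_7))
    return "1" if score_1 >= score_7 else "7"
```

You would only call `recheck_digit` on the character boxes that Tesseract labeled as 1 or 7, so the extra cost stays small.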

If neither solution is possible, try to solve it with post-processing. For example, if the value is a student's age, it would not be 78; it is 18, and so on. This is a poor approach and not a real solution, but when nothing else is possible you have to do something like it.
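The age example could be sketched roughly like this. The function name `correct_with_rule` and the 5 to 25 age range are assumptions made up for illustration; the rule has to come from whatever you actually know about the field being read.

```python
from itertools import product


def correct_with_rule(ocr_digits, is_valid):
    """Try 1/7 swaps on an OCR'd digit string until it passes a domain rule.

    Returns the original string if it is already valid, the single valid
    variant if exactly one exists, and None if the rule cannot decide.
    """
    if is_valid(ocr_digits):
        return ocr_digits
    # Every 1 or 7 may have been misread as the other; other digits are kept.
    choices = [("1", "7") if ch in "17" else (ch,) for ch in ocr_digits]
    candidates = ["".join(combo) for combo in product(*choices)]
    valid = [c for c in candidates if is_valid(c)]
    return valid[0] if len(valid) == 1 else None


def is_student_age(text):
    """Hypothetical rule: a student's age lies between 5 and 25."""
    return text.isdigit() and 5 <= int(text) <= 25


print(correct_with_rule("78", is_student_age))  # 18
print(correct_with_rule("77", is_student_age))  # None (11 and 17 both fit)
```

Returning None when several variants fit keeps the guess from silently turning a real 7 into a 1, which is the concern raised in the question.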



Source: https://stackoverflow.com/questions/33624784/tesseract-thinks-my-1s-are-7s
