c# OCR can't recognize digits (tesseract 2)

倾然丶 夕夏残阳落幕 提交于 2019-12-06 05:31:13

问题


I'm trying to extract digits from the following:

It fails, I get a ~ in return. I'm using google's tesseract 2, using C# (open source c# wrapper) and now I'm wondering, is this image too crappy to be used for OCR?

Because imho the digits are straight clear.

Do you have any other OCR engine in mind that would nail this down?

EDIT

I've also tried with Asprise OCR (http://asprise.com/product/ocr/selector.php) but it fails to parse the image too...


回答1:


I suggest resizing. I zoomed this page to 200% in IE, Took a screenshot, printed it to PDF and imported it into my program that uses tessnet. Tess nailed it! Unless I read the #s wrong :-)

Although confidence = 140 (under 100 is preferred if you wondered). Of course When i tried the original size, I didn't get ~; I got about 1/2 the #s right, a bunch of letters, and other garbage. Not good enough, but better.

t2 seems to like images a certain size.

My program does processing to get that to work. Suggest using .net GDI+ for converting to 32 bit, resizing with Interpolation mode High Quality Bicubic. This seems to 'fill in the gaps' a bit.

Play with sizes that work - I have found, too big, or too small, and tesseract performs differently.

Both issues are preprocessing, that's easy and you'd thing tesseract would try; however, I know how to resize and interpolate; I don't know how to OCR! So I am willing to settle.




回答2:


Your image's resolution is too low -- 96 DPI, perhaps it is a screenshot. Rescale it to 300 DPI, and tessnet2 should be able to recognize it.



来源:https://stackoverflow.com/questions/5475256/c-sharp-ocr-cant-recognize-digits-tesseract-2

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!