Recognize simple digits with pytesser

Submitted by 喜欢而已 on 2019-12-23 20:52:54

Question


I'm learning OCR using PyTesser and Tesseract. As a first milestone, I want to write a tool that recognizes captchas consisting only of digits. I read some tutorials and wrote the following test program.

from pytesser.pytesser import *
from PIL import Image, ImageFilter, ImageEnhance

im = Image.open("test.tiff")
im = im.filter(ImageFilter.MedianFilter())  # smooth out speckle noise
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(2)                    # double the contrast
im = im.convert('1')                        # convert to 1-bit black and white
text = image_to_string(im)                  # run Tesseract through pytesser
print("text={}".format(text))

I tested my code with the image below, but the result was 2(T?770. I also tested some other similar images, and in about 80% of cases the results were incorrect.

I'm not familiar with image processing. I have two questions:

  1. Is it possible to tell PyTesser to guess digits only?

  2. I think the image is quite easy for a human to read. If it is this difficult for PyTesser to read a digits-only image, are there any alternatives that can do better OCR?

Any hints would be much appreciated.


Answer 1:


I think your code is fine; it can recognize 207770. The problem is with the pytesser installation: the Tesseract bundled with pytesser is out of date. Download a more recent version and overwrite the corresponding files. You should also edit pytesser.py and change

tesseract_exe_name = 'tesseract'

to

import os.path
tesseract_exe_name = os.path.join(os.path.dirname(__file__), 'tesseract')
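
On the first question (digits only), which the answer above does not address: Tesseract's command line accepts -c name=value to set a configuration variable, and tessedit_char_whitelist limits the characters it may output. The following is only a minimal sketch, not pytesser's actual code. build_tesseract_command and its parameters are hypothetical names, the directory and file names are placeholders, and how strictly the whitelist is honored appears to vary between Tesseract versions. It builds the argument list without running anything, using the same os.path.join idea as the edit above to point at a specific tesseract binary.

```python
import os.path


def build_tesseract_command(exe_dir, image_path, output_base, digits_only=True):
    """Return the argument list for one Tesseract run (nothing is executed).

    Hypothetical helper for illustration; pytesser does not provide it.
    """
    # Same idea as the pytesser.py edit: use the binary in a known directory
    # instead of relying on the current working directory.
    exe = os.path.join(exe_dir, "tesseract")
    cmd = [exe, image_path, output_base]
    if digits_only:
        # "-c name=value" sets a Tesseract config variable; the whitelist
        # restricts recognition to the listed characters.
        cmd += ["-c", "tessedit_char_whitelist=0123456789"]
    return cmd


# Placeholder directory and file names, for illustration only.
print(" ".join(build_tesseract_command("pytesser", "test.tiff", "out")))
```

The resulting list could be passed to subprocess.call, after which Tesseract would write its result to out.txt. Whether this improves accuracy on a given captcha still depends on the installed Tesseract version and on the preprocessing.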


Source: https://stackoverflow.com/questions/24247813/recognize-simple-digits-with-pytesser
