Use pytesseract OCR to recognize text from an image

北恋 2020-11-30 20:02

I need to use Pytesseract to extract text from this picture:

Here is the code:

from PIL import Image, ImageEnhance, ImageFilter
import pytesseract
         


        
6 answers
  •  隐瞒了意图╮
    2020-11-30 20:35

    Here is my small improvement: it removes noise and stray lines by whitening every pixel whose colour falls within a certain range, then cleans up the result before running OCR.

    import pytesseract
    from PIL import Image, ImageEnhance, ImageFilter

    img = "image.png"  # placeholder: path of the image to read
    im = Image.open(img).convert("RGB")  # RGB, so every pixel is a 3-tuple

    # Keep pixels with at least one dark channel; turn everything else white.
    newimdata = []
    for item in im.getdata():
        if item[0] < 112 or item[1] < 112 or item[2] < 112:
            newimdata.append(item)
        else:
            newimdata.append((255, 255, 255))
    im.putdata(newimdata)

    im = im.filter(ImageFilter.MedianFilter())  # smooth out remaining specks
    im = ImageEnhance.Contrast(im).enhance(2)   # double the contrast
    im = im.convert('1')                        # binarize to black and white
    im.save('temp2.jpg')

    # Tesseract 4+ expects "--psm"; older 3.x builds use "-psm".
    text = pytesseract.image_to_string(
        Image.open('temp2.jpg'),
        config='--psm 6 -c tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyz',
        lang='eng',
    )
    print(text)
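
If you want to tune the colour threshold without running OCR every time, the whitening step can be pulled out into its own function. This is only a sketch of that one step, not a full solution: the names `whiten_light_pixels` and `clean_image` are made up for illustration, and 112 is simply the cutoff used above, which may need adjusting for other images.

```python
def whiten_light_pixels(pixels, threshold=112):
    """Return a list of RGB tuples where light pixels are replaced by white.

    A pixel is kept if at least one of its R, G, B channels is below
    `threshold`; otherwise it becomes (255, 255, 255). Any alpha channel
    is dropped.
    """
    white = (255, 255, 255)
    return [
        tuple(p[:3]) if min(p[:3]) < threshold else white
        for p in pixels
    ]


def clean_image(im, threshold=112):
    """Apply whiten_light_pixels to a Pillow image and return a new RGB image."""
    from PIL import Image  # imported here so the helper above works without Pillow

    rgb = im.convert("RGB")
    out = Image.new("RGB", rgb.size)
    out.putdata(whiten_light_pixels(rgb.getdata(), threshold))
    return out
```

With this, `clean_image(Image.open(img))` could stand in for the loop above, and trying a different `threshold` is a one-argument change.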
    
