Use pytesseract OCR to recognize text from an image

北恋 2020-11-30 20:02

I need to use Pytesseract to extract text from this picture:

and the code:

from PIL import Image, ImageEnhance, ImageFilter
import pytesseract
         


        
6 answers
  •  星月不相逢
    2020-11-30 20:32

    You only need to enlarge the picture with cv2.resize:

    image = cv2.resize(image,(0,0),fx=7,fy=7)
    

    My original 200x40 picture was read as HZUBS.

    The same picture resized to 1400x300 was read as A 1234, which is correct.

    After that, threshold and blur the image:

    retval, image = cv2.threshold(image,200,255, cv2.THRESH_BINARY)
    image = cv2.GaussianBlur(image,(11,11),0)
    image = cv2.medianBlur(image,9)
    

    Then change the page segmentation mode (the --psm option) to improve the results:

    Page segmentation modes:
      0    Orientation and script detection (OSD) only.
      1    Automatic page segmentation with OSD.
      2    Automatic page segmentation, but no OSD, or OCR.
      3    Fully automatic page segmentation, but no OSD. (Default)
      4    Assume a single column of text of variable sizes.
      5    Assume a single uniform block of vertically aligned text.
      6    Assume a single uniform block of text.
      7    Treat the image as a single text line.
      8    Treat the image as a single word.
      9    Treat the image as a single word in a circle.
     10    Treat the image as a single character.
     11    Sparse text. Find as much text as possible in no particular order.
     12    Sparse text with OSD.
     13    Raw line. Treat the image as a single text line,
                bypassing hacks that are Tesseract-specific.
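
    A mode from this list can be passed to pytesseract through its config argument. The sketch below is an assumption about how one might wire it up, not part of the original answer: build_config and ocr_with_psm are hypothetical helper names, and mode 7 is chosen only because the sample picture appears to hold a single line of text.

```python
PSM_SINGLE_LINE = 7  # "Treat the image as a single text line."
PSM_SINGLE_WORD = 8  # "Treat the image as a single word."


def build_config(psm):
    """Return the Tesseract config string for a page segmentation mode."""
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    return f"--psm {psm}"


def ocr_with_psm(image, psm=PSM_SINGLE_LINE):
    """Run OCR on a PIL image or NumPy array with the given mode."""
    # Imported here so build_config can be used without Tesseract installed.
    import pytesseract

    text = pytesseract.image_to_string(image, config=build_config(psm))
    return text.strip()


print(build_config(PSM_SINGLE_LINE))
```

    Trying modes 6, 7, 8, and 13 on the same preprocessed image and comparing the output is usually the quickest way to find the one that fits.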
    
