PyTesseract OCR unable to read digits from a simple image

广开言路 2020-12-06 23:27

I'm trying to get PyTesseract OCR to read digits from this simple, well-cropped image, but for some reason it isn't able to.

from PIL impor         


        
1 answer
  •  挽巷 (original poster)
    2020-12-06 23:35

    When performing OCR, it is important to preprocess the image so that the desired foreground text is black and the background is white. To do this, we can use OpenCV to apply Otsu's threshold and obtain a binary image. We then apply a slight Gaussian blur to smooth the image before passing it to Pytesseract. We use the --psm 6 config to treat the image as a single uniform block of text. Run tesseract --help-psm to list the other page segmentation modes.
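
To make the thresholding step less of a black box, here is a minimal pure-Python sketch of Otsu's method: it picks the threshold that maximizes the between-class variance of the two pixel groups. This is an illustration only, not OpenCV's implementation; `otsu_threshold` and `binarize_inv` are hypothetical helper names, and the input is assumed to be a flat list of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels <= t form the lower class, pixels > t the upper class.
    """
    # Histogram of the 256 possible intensities
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the lower class
    sum0 = 0  # intensity sum of the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break     # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize_inv(pixels, t):
    """Inverted binary threshold: bright pixels (> t) become 0, the rest 255."""
    return [0 if p > t else 255 for p in pixels]


# Toy "image": 50 dark pixels followed by 50 bright pixels
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
binary = binarize_inv(pixels, t)
```

On this toy input the chosen threshold falls between the two intensity clusters, so the inverted output maps the dark pixels to 255 and the bright pixels to 0.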


    Here's the result from Pytesseract on the preprocessed image:

    PRACTICE ACCOUNT
    $9,047.26~ i
    

    Code

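    import cv2
    import pytesseract

    # Path to the Tesseract executable (Windows install location; adjust for your system)
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

    # Load the image as grayscale (flag 0)
    image = cv2.imread('1.png', 0)

    # Otsu's threshold; THRESH_BINARY_INV inverts the output (bright pixels become black)
    thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    # Slight Gaussian blur to smooth the binary image
    thresh = cv2.GaussianBlur(thresh, (3, 3), 0)

    # --psm 6: treat the image as a single uniform block of text
    data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
    print(data)

    # Show the preprocessed image until a key is pressed
    cv2.imshow('thresh', thresh)
    cv2.waitKey()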
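
The result shown earlier still contains a header line and a few stray characters (`~ i`) after the amount. If only the number is needed, one option is to post-process the returned string with a regular expression. This is a sketch using only the standard library; `extract_amount` is a hypothetical helper, and it assumes the first run of digits in the text is the value you want.

```python
import re


def extract_amount(ocr_text):
    """Return the first number in the OCR text as a float, or None if there is none.

    Accepts thousands separators (commas) and an optional decimal part.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", ocr_text)
    if match is None:
        return None
    # Drop thousands separators before converting
    return float(match.group().replace(",", ""))


# The OCR result from the answer above
raw = "PRACTICE ACCOUNT\n$9,047.26~ i\n"
print(extract_amount(raw))  # 9047.26
```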
    
