How to process and extract text from image

谁说胖子不能爱 提交于 2019-12-06 06:18:48

Preprocessing to clean the image before performing text extraction can help. Here's a simple approach

  • Convert image to grayscale and sharpen image
  • Adaptive threshold
  • Perform morpholgical operations to clean image
  • Invert image

First we convert to grayscale then sharpen the image using a sharpening kernel

Next we adaptive threshold to obtain a binary image

Now we perform morphological transformations to smooth the image

Finally we invert the image

import cv2
import numpy as np

image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
sharpen_kernel = np.array([[-1,-1,-1], [-1,9,-1], [-1,-1,-1]])
sharpen = cv2.filter2D(gray, -1, sharpen_kernel)
thresh = cv2.threshold(sharpen, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=1)
result = 255 - close

cv2.imshow('sharpen', sharpen)
cv2.imshow('thresh', thresh)
cv2.imshow('close', close)
cv2.imshow('result', result)
cv2.waitKey()

Try this

    from PIL import image
    import PIL.image
    from pytesseract import image_to_string
     import pytesseract

     pytesseract.pytesseract.tesseract_cmd = '_________' 
          #tesseract executable path -- if not added to 
            environment path, if added ignore

       Output = 
         pytesseract.image_to_string(PIL.image.open('image 
            wants to read'). convert ("RBG"), Lang='eng')
       Print (output)
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!