How to extract account number in cheque/check images

倖福魔咒の submitted on 2020-12-31 17:53:05

Question


I am working on a task to extract the account number from cheque images. My current approach can be divided into 2 steps:

  1. Localize account number digits (Printed digits)
  2. Perform OCR using OCR libraries like Tesseract OCR

The second step is straightforward, assuming we have properly localized the account number digits.

I tried to localize the account number digits using OpenCV contour methods and MSER (maximally stable extremal regions), but didn't get useful results. It's difficult to generalize a pattern because

  • Cheques from different banks use different templates
  • The account number position is not fixed

How can we approach this problem? Do I have to look into deep learning based approaches?

Sample Images


Answer 1:


Assuming the account number is printed in a distinctive purple text color, we can use color thresholding. The idea is to convert the image to the HSV color space, define a lower/upper color range, and threshold with cv2.inRange(). From there we filter by contour area to remove small noise. Next we invert the image, since we want black text on a white background. As a last step, we apply a Gaussian blur before passing the image to Pytesseract. Here's the result:

Result from Pytesseract

30002010108841

Code

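import numpy as np
import pytesseract
import cv2

# Windows only: path to the Tesseract binary (adjust, or remove if tesseract is on PATH)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load the cheque image and convert it from BGR to HSV
image = cv2.imread('1.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Keep only the pixels that fall inside the HSV range of the account number text
lower = np.array([103,79,60])
upper = np.array([129,255,255])
mask = cv2.inRange(hsv, lower, upper)

# findContours returns 2 values in OpenCV 2.x/4.x and 3 values in 3.x
cnts = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    # Paint tiny blobs black so they are dropped from the mask as noise
    if area < 10:
        cv2.drawContours(mask, [c], -1, (0,0,0), -1)

# Invert so the text is black on white, then smooth the edges slightly
mask = 255 - mask
mask = cv2.GaussianBlur(mask, (3,3), 0)

# --psm 6 tells Tesseract to treat the image as a single uniform block of text
data = pytesseract.image_to_string(mask, lang='eng',config='--psm 6')
print(data)

cv2.imshow('mask', mask)
cv2.waitKey()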
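
The string returned by Pytesseract may carry trailing whitespace, a form feed, or stray characters from leftover noise in the mask. As an optional follow-up that is not part of the code above, here is a minimal sketch of cleaning the OCR output. The helper name extract_account_number and the min_digits threshold are hypothetical choices for illustration, and it assumes the account number is the longest run of digits in the text:

```python
import re


def extract_account_number(ocr_text, min_digits=8):
    """Return the longest run of digits in the OCR text, or None.

    Hypothetical helper: assumes the account number is the longest
    digit sequence and has at least `min_digits` digits.
    """
    candidates = []
    for line in ocr_text.splitlines():
        # Drop spaces/tabs inside a line so "3000 2010" is read as one number
        compact = re.sub(r"[ \t]", "", line)
        candidates.extend(re.findall(r"\d+", compact))
    if not candidates:
        return None
    best = max(candidates, key=len)
    return best if len(best) >= min_digits else None


print(extract_account_number("30002010108841\n\x0c"))  # prints 30002010108841
```

In the script above, this would be called as extract_account_number(data) in place of printing the raw string. If the cheques you handle have shorter account numbers, lower min_digits accordingly.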


Source: https://stackoverflow.com/questions/58875863/how-to-extract-account-number-in-cheque-check-images
