Remove background color in image processing for OCR

借酒劲吻你 2021-02-01 10:50

I am trying to remove the background color from images to improve OCR accuracy. A sample looks like this:

[Figure: sample image]

6 answers
  •  长情又很酷
    2021-02-01 11:37

    You can do this using GIMP (or any other image editing tool).

    1. Open your image
    2. Convert to grayscale
    3. Duplicate the layer
    4. Apply a Gaussian blur with a large kernel (10x10) to the top layer
    5. Compute the difference between the top and bottom layers
    6. Threshold the image to yield a binary image

    [Figure: blurred image]

    [Figure: difference image]

    [Figure: binary image]

    If this is a one-off, GIMP is probably good enough. If you expect to do it many times, you could write an ImageMagick script or implement the approach in something like Python with OpenCV.

    Some problems with the above approach:

    • The purple text (CENTURY) gets lost because it has less contrast than the other text. You could work around this by thresholding different parts of the image differently, or by using local histogram manipulation methods.
