Prepare complex image for OCR

执念已碎 2020-12-30 07:29

I want to recognize the digits on a credit card. To make things worse, the source image is not guaranteed to be of high quality. The OCR is to be done with a neural network.

3 Answers
  •  情话喂你
    2020-12-30 07:56

    If it's at all possible, request that better lighting be used to capture the images. A low-angle light would illuminate the edges of the raised (or sunken) characters, thus greatly improving the image quality. If the image is meant to be analyzed by a machine, then the lighting should be optimized for machine readability.

    That said, one algorithm you should look into is the Stroke Width Transform, which is used to extract characters from natural images.

    Stroke Width Transform (SWT) implementation (Java, C#...)

    A global threshold (for binarization or clipping edge strengths) probably won't cut it for this application, and instead you should look at localized thresholds. In your example images the "02" following the "31" is particularly weak, so searching for the strongest local edges in that region would be better than filtering all edges in the character string using a single threshold.
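
To make the difference concrete, here is a minimal pure-Python sketch of a local-mean threshold compared with a single global cutoff. The function name `local_threshold`, the `radius` and `offset` parameters, and the sample edge strengths are all made up for illustration; a real pipeline would more likely use an image library's adaptive threshold.

```python
def local_threshold(img, radius=2, offset=0.0):
    """Binarize a grayscale image (list of rows) against each pixel's local mean.

    A pixel becomes 1 when it exceeds the mean of its square neighborhood
    (clipped at the image borders) by more than `offset`, otherwise 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = [img[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            mean = sum(window) / len(window)
            out[y][x] = 1 if img[y][x] > mean + offset else 0
    return out


# Hypothetical edge strengths along one scanline: a strong stroke (200)
# and a weak stroke (40) in a region where everything is fainter.
row = [10, 10, 200, 10, 10, 2, 2, 2, 40, 2, 2, 2]

global_mask = [1 if v > 100 else 0 for v in row]         # one cutoff for all
local_mask = local_threshold([row], radius=2, offset=5)[0]

print(global_mask)  # the weak stroke at index 8 is lost
print(local_mask)   # both strokes survive
```

The window size and offset would need tuning against real card images; the point is only that each region is judged relative to its own surroundings.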

    If you can identify partial segments of characters, then you might use some directional morphology operations to help join segments. For example, if you have two nearly horizontal segments like the following, where 0 is the background and 1 is the foreground...

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 1 1 1 1 0 0 1 1 1 1 1 1 0 0 0
    0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0
    

    then you could perform a morphological "close" operation along the horizontal direction only to join those segments. The kernel could be something like

    x x x x x
    1 1 1 1 1
    x x x x x
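
A horizontal-only close can be sketched in pure Python as a dilation followed by an erosion, each using a 1x5 line as the structuring element. The helper names (`dilate_h`, `erode_h`, `close_h`) are hypothetical, and treating out-of-image pixels as "ignored" is an assumption about border handling.

```python
def dilate_h(img, radius):
    """Horizontal dilation: a pixel is 1 if any pixel within `radius` in its row is 1."""
    w = len(img[0])
    return [[max(row[max(0, x - radius):x + radius + 1]) for x in range(w)]
            for row in img]


def erode_h(img, radius):
    """Horizontal erosion: a pixel is 1 only if all in-bounds pixels within `radius` are 1."""
    w = len(img[0])
    return [[min(row[max(0, x - radius):x + radius + 1]) for x in range(w)]
            for row in img]


def close_h(img, radius=2):
    """Morphological close along rows only (kernel width = 2 * radius + 1)."""
    return erode_h(dilate_h(img, radius), radius)


# The example segments from above (0 = background, 1 = foreground).
segments = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
]

closed = close_h(segments, radius=2)
for row in closed:
    print(" ".join(str(v) for v in row))
```

With a width-5 kernel, any horizontal gap of up to four pixels is bridged, so the sparser third row is filled in as well. Nothing is joined vertically, because the kernel has no vertical extent.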
    

    There are more sophisticated methods to perform curve completion using Bezier fits or even Euler spirals (a.k.a. clothoids), but preprocessing to identify segments to be joined and postprocessing to eliminate poor joins can get very tricky.
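
As a rough illustration of the Bezier idea only, the sketch below bridges the gap between two segment endpoints with a single cubic Bezier curve. Placing the inner control points one third of the gap length along each segment's tangent is an assumption made for this example, not a prescribed method, and the function name `bezier_bridge` is hypothetical.

```python
import math


def bezier_bridge(p0, t0, p3, t3, samples=5):
    """Sample a cubic Bezier from p0 to p3.

    t0 and t3 are unit tangent directions of the two segments at their
    endpoints; the inner control points sit a third of the gap length
    along those tangents (an illustrative choice).
    """
    d = math.dist(p0, p3)
    p1 = (p0[0] + t0[0] * d / 3, p0[1] + t0[1] * d / 3)
    p2 = (p3[0] - t3[0] * d / 3, p3[1] - t3[1] * d / 3)
    points = []
    for i in range(samples + 1):
        s = i / samples
        # Bernstein weights of the cubic Bezier basis.
        b0 = (1 - s) ** 3
        b1 = 3 * (1 - s) ** 2 * s
        b2 = 3 * (1 - s) * s ** 2
        b3 = s ** 3
        points.append((
            b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
            b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1],
        ))
    return points


# Bridge a gap between a segment ending at (6, 1) and one starting at (9, 1),
# both running left to right (coordinates are made up for illustration).
bridge = bezier_bridge((6, 1), (1, 0), (9, 1), (1, 0), samples=6)
for x, y in bridge:
    print(round(x, 2), round(y, 2))
```

Deciding which endpoints to pair, and rejecting bridges that bend too sharply, is the tricky part mentioned above and is not addressed here.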
