Detect white characters on black background using Tesseract

Submitted by 徘徊边缘 on 2019-12-17 16:18:14

Question


I'm completely new to Tesseract OCR. This problem might be simple, but I can't seem to find the answer using Google.

Basically, I have an image that contains two parts: the top part has a black background with white text, and the bottom part has a white background with black text.

I ran Tesseract on the image. It correctly recognized all characters in the bottom part, but none in the top part. I am sure that the characters in the top part are very clear and should be easy for Tesseract to recognize. The only difference is the black background.

Is there a way to use Tesseract to recognize text on both black and white backgrounds at the same time?


Answer 1:


A paper by T. Kasar, J. Kumar, and A. G. Ramakrishnan, "Font and Background Color Independent Text Binarization", describes one solution to this problem. Jason Funk has written an implementation of the algorithm. I have had some success with the algorithm, and I think this type of solution is what you are looking for.

You might also find it helpful to review this recently asked question on background removal (OpenCV for OCR: How to compute thresholding levels for gray image OCR) and its answer. You may be able to separate regions of interest by background color and then hand each region to Tesseract for processing. Alternatively, after binarization you could invert the 8x8 pixel regions (described in the answer above) in the black-background portion of the image (or vice versa) to create a uniform background.
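The "invert the dark regions to get a uniform background" idea can be sketched roughly as follows. This is a minimal illustration, not the algorithm from the paper: it assumes the image splits into horizontal bands (as in the question), that the background is whatever makes up the majority of pixels in a band, and that Pillow and pytesseract are installed. The names `normalize_polarity` and `ocr_file`, the `band_height` of 32, and the `threshold` of 128 are all hypothetical choices made for this example.

```python
def normalize_polarity(rows, band_height=32, threshold=128):
    """Return a copy of a grayscale image (list of rows of 0-255 ints)
    in which every horizontal band with a dark background is inverted.

    Assumption: within a band, the background covers more than half
    of the pixels, so a mostly dark band is white-on-black text.
    """
    result = []
    for top in range(0, len(rows), band_height):
        band = rows[top:top + band_height]
        total = sum(len(row) for row in band)
        dark = sum(1 for row in band for p in row if p < threshold)
        invert = dark * 2 > total  # dark pixels are the majority
        for row in band:
            result.append([255 - p for p in row] if invert else list(row))
    return result


def ocr_file(path):
    """Load an image, give it a uniform light background, and run Tesseract."""
    # Imported here so normalize_polarity stays usable without these packages.
    from PIL import Image
    import pytesseract

    img = Image.open(path).convert("L")  # 8-bit grayscale
    width, height = img.size
    raw = img.tobytes()  # one byte per pixel, row by row
    rows = [list(raw[y * width:(y + 1) * width]) for y in range(height)]

    fixed = normalize_polarity(rows)
    data = bytes(p for row in fixed for p in row)
    uniform = Image.frombytes("L", (width, height), data)
    return pytesseract.image_to_string(uniform)
```

Calling `ocr_file("scan.png")` (a placeholder file name) would then pass Tesseract a single image with dark text on a light background. If the boundary between the two parts does not line up with a band edge, a smaller `band_height` or the 8x8 block approach mentioned above may work better.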

Finally, you may find useful information by searching for solutions to the number plate (license plate) recognition problem. Many plates have background images or lighting artifacts that can interfere with recognition. The more general problem is background removal.



Source: https://stackoverflow.com/questions/39002966/detect-white-characters-on-black-background-using-tesseract
