Tesseract improvements and image pre-processing steps

喜欢而已 submitted on 2019-12-25 03:24:40

Question


I am working with the Tesseract library, and below is the input for Tesseract:

In the initial implementation I used only the "MRZ" zone of the ID card, but the actual goal is to scan the entire document and extract all the text on the ID card.

I have gone through this document, and the first step it gives for improving Tesseract's output quality is that the image should be 300 dpi.

1) How do I convert an image captured by the camera in iOS to 300 dpi?

2) What contrast and brightness levels give the best Tesseract output?

3) Is there any other pre-processing step that I can apply to an image to get good accuracy?

4) What image resolution is recommended for better accuracy?

5) I have used "int tesseract::TESSDLL_API::MeanTextConf" to get the confidence score. Given a confidence score for each character, can I treat a recognized character as accurate when its score is above some percentage? If I am wrong, can you please explain the use of the "MeanTextConf" method?


Answer 1:


I wrote several generic OCR blog posts on image pre-processing and "how OCR works best" some time ago. Please find them here: http://www.ocr-it.com/user-scenario-process-digital-camera-pictures-and-ocr-to-extract-specific-numbers

In general, getting high enough resolution should be the first step. A low-resolution image simply does not carry enough information per letter to read characters reliably. Then I do adaptive binarization, where the image is converted to black & white using a threshold chosen so that backgrounds disappear and characters remain clear, without extra noise or holes in them. Then, optionally, you can segment the image into its various fields and process each field separately with specific settings, such as "digits only" for the number, "M|F" for the sex field, etc.
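To make the adaptive binarization step more concrete, here is a minimal sketch in plain Python. It is not the code I use in production and not what Tesseract does internally: it assumes a simple local-mean threshold, and the function name `adaptive_binarize` and the `window` and `offset` parameters are hypothetical, chosen only for illustration. The image is assumed to be a grayscale grid given as a list of rows of 0-255 integers.

```python
def adaptive_binarize(gray, window=15, offset=10):
    """Binarize a grayscale image with a local-mean threshold.

    gray   -- list of rows, each a list of ints in 0..255 (assumed input format)
    window -- side length of the square neighbourhood (hypothetical default)
    offset -- how much darker than the local mean a pixel must be to count
              as ink (hypothetical default)

    Returns a new grid where ink pixels are 0 and background pixels are 255.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0

    # Integral image with one extra row and column of zeros, so the sum of
    # any rectangle can be read with four lookups.
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    radius = window // 2
    result = []
    for y in range(height):
        # Clamp the neighbourhood to the image borders.
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        out_row = []
        for x in range(width):
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            area = (y1 - y0) * (x1 - x0)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            local_mean = total / area
            # A pixel is ink only if it is clearly darker than its surroundings.
            out_row.append(0 if gray[y][x] < local_mean - offset else 255)
        result.append(out_row)
    return result
```

Because each pixel is compared with its own neighbourhood rather than with one global value, a dark stroke can still be separated from the background on a card that is brighter on one side than the other, as often happens with camera pictures. The window should be noticeably larger than the stroke width, otherwise the inside of thick characters starts to look like background.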



Source: https://stackoverflow.com/questions/25668203/tesseract-improvements-and-image-pre-processing-steps
