Tesseract not picking up different colored text

你离开我真会死。 submitted on 2019-12-08 11:17:20

Question


I am trying to write a program that scrapes the text from a screenshot using Tesseract and Python. I have no trouble getting one piece of it, but some of the text is lighter colored and Tesseract does not pick it up. Below is an example of a picture I am using:

I am able to get the text at the top of the picture, but not the 3 options below it.

Here is the code I am using to grab the text:

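    import pytesseract

    # `screen` (the screenshot image) and `stop` (a collection of stop words)
    # are defined elsewhere in the program.
    # Tesseract variables are passed with -c; a bare NAME=VALUE argument
    # is treated as the name of a config file.
    result = pytesseract.image_to_string(
        screen, config="-c load_system_dawg=0 -c load_freq_dawg=0")

    print("below is the total value scraped by the tesseract")
    print(result)

    # Split on blank lines: the first block is the question, the rest are answers
    parts = result.split("\n\n")

    # Join the question into one line and keep its unique non-stop-word terms
    question = parts.pop(0).replace("\n", " ")
    q_terms = question.split(" ")
    q_terms = list(filter(lambda t: t not in stop, q_terms))
    q_terms = set(q_terms)

    # Flatten the remaining blocks into individual lines
    parts = "\n".join(parts)
    parts = parts.split("\n")

    # Every non-empty line is treated as one answer option
    answers = list(filter(lambda p: len(p) > 0, parts))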

When I have plain black text without a colored background, the answers array is populated with the 3 options below the question, but not in this case. Is there any way to fix this?


Answer 1:


You're missing a binarization (thresholding) step.

In your case you can simply apply a binary threshold to the grayscale image.

Here is the resulting image with threshold = 177:

You can learn more about thresholding in the documentation for the OpenCV Python library.
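
A minimal sketch of that step, assuming the screenshot is saved to disk and that NumPy, OpenCV and pytesseract are installed. The names binarize and ocr_with_threshold are hypothetical helpers, not part of either library. binarize reproduces in plain NumPy what cv2.threshold does with cv2.THRESH_BINARY, so the effect of the 177 cutoff is easy to inspect, while ocr_with_threshold chains the real OpenCV calls before handing the image to Tesseract.

```python
import numpy as np


def binarize(gray, threshold=177, invert=False):
    """Map pixels brighter than `threshold` to 255 and all others to 0.

    Mirrors cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY);
    invert=True mirrors cv2.THRESH_BINARY_INV.
    """
    gray = np.asarray(gray, dtype=np.uint8)
    bright = gray > threshold
    if invert:
        bright = ~bright
    return np.where(bright, 255, 0).astype(np.uint8)


def ocr_with_threshold(image_path, threshold=177):
    """Hypothetical helper: load, grayscale, threshold, then run Tesseract."""
    # Imported here so binarize() stays usable without OpenCV or Tesseract.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    return pytesseract.image_to_string(binary)
```

The value 177 comes from the answer and suits that particular screenshot; other images will likely need a different cutoff. If the light text ends up white on a black background and recognition is still poor, trying cv2.THRESH_BINARY_INV (dark text on a light background) may help.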



Source: https://stackoverflow.com/questions/48530331/tesseract-not-picking-up-different-colored-text
