image processing to improve tesseract OCR accuracy

鱼传尺愫 2020-11-22 14:41

I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might impr

13 answers
  •  感动是毒
    2020-11-22 15:21

    I am by no means an OCR expert, but this week I needed to extract text from a jpg.

    I started with a color (RGB) 445x747 pixel jpg. I immediately tried tesseract on this, and the program converted almost nothing. I then went into GIMP and did the following:

      •  image > mode > grayscale
      •  image > scale image > 1191x2000 pixels
      •  filters > enhance > unsharp mask, with radius = 6.8, amount = 2.69, threshold = 0

    I then saved the result as a new jpg at 100% quality.

    Tesseract was then able to extract all the text into a .txt file.
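
    The GIMP steps above could also be scripted rather than done by hand. What follows is only a minimal sketch using Pillow, which is an assumption here since the answer itself used GIMP. The function name `preprocess_for_ocr` is hypothetical, and the mapping of GIMP's amount = 2.69 to Pillow's `percent=269` is approximate, so the output will probably not match GIMP's pixel for pixel.

```python
from PIL import Image, ImageFilter


def preprocess_for_ocr(img, target_size=(1191, 2000)):
    """Grayscale, upscale, and sharpen a PIL image before running OCR.

    Hypothetical helper: it mirrors the manual GIMP steps, not GIMP's exact filters.
    """
    # image > mode > grayscale
    gray = img.convert("L")
    # image > scale image > 1191x2000 pixels
    enlarged = gray.resize(target_size, Image.LANCZOS)
    # filters > enhance > unsharp mask
    # Assumption: GIMP amount 2.69 is treated as roughly percent=269 in Pillow.
    sharpen = ImageFilter.UnsharpMask(radius=6.8, percent=269, threshold=0)
    return enlarged.filter(sharpen)
```

    With placeholder file names, `preprocess_for_ocr(Image.open("input.jpg")).save("output.jpg", quality=100)` would write the processed image, which can then be handed to tesseract as before.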

    GIMP is your friend.
