Text Extraction from Notebook

心不动则不痛 submitted on 2020-01-03 05:24:06

Question


I am trying to extract handwritten text from images. I use Python with OpenCV functions such as findContours. It was all going pretty well when I used images like this one:

It works fine because I have a plain background. But then I tested it with this image:

Because of the notebook's lines in the background, I am not able to extract only the text. Although the text is red, I convert all images to grayscale (and sometimes threshold them), so the text turns black just like the notebook lines. That way, the colour of the text does not matter. So my question is: could anyone give me advice or a possible solution for dealing with this kind of background so that I can extract the text? I really don't want to use the sliding window method. Thank you in advance.


Answer 1:


I decided to try again with the HoughLinesP function in OpenCV, which this time gave me a much more promising and satisfying result. Here's a snippet of the code I used to remove most of the lines:

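import cv2
import numpy as np

# thresh.png is the inverted threshold image (ink and rules white, paper black);
# read it as a single channel for Canny
img = cv2.imread('thresh.png', cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150, apertureSize=3)
minLineLength = 0
maxLineGap = 5
# Pass these by keyword: the 5th positional parameter of HoughLinesP is the
# output array `lines`, so positional values would be shifted by one slot
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 100,
                        minLineLength=minLineLength, maxLineGap=maxLineGap)

# HoughLinesP returns None when it finds no segments
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        # Paint each detected segment in the background colour (black) to erase it
        cv2.line(img, (int(x1), int(y1)), (int(x2), int(y2)), 0, 2)

cv2.imwrite('houghlines3.jpg', img)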

Additional info: thresh.png is the image in which I store the thresholded version of the initial picture. The code runs Canny edge detection, uses HoughLinesP to find straight line segments among those edges, and paints each segment black (because in my threshold, what is close to white becomes black and vice versa, so black is the background). That's how it clears the lines.

PS: Hope I helped somebody! Cheers!



Source: https://stackoverflow.com/questions/41362489/text-extraction-from-notebook
