Improve horizontal line detection in .pdf image with OpenCV

Submitted by 被刻印的时光 ゝ on 2019-12-07 11:17:15

Question


I have .pdf files that have been converted to .jpg images for this project. My goal is to identify the blanks (e.g. ____________) that you would generally find in a .pdf form, which indicate a space for the user to sign or fill out some kind of information. I have been using edge detection with the cv2.Canny() and cv2.HoughLinesP() functions.

This works fairly well, but there are quite a few false positives that come about from seemingly nowhere. When I look at the 'edges' file it shows a bunch of noise around the other words. I'm uncertain where this noise comes from.

Should I continue to tweak the parameters, or is there a better method to find the location of these blanks?


Answer 1:


Assuming that you're trying to find horizontal lines on a .pdf form, here's a simple approach:

  • Convert image to grayscale and apply Otsu's threshold
  • Construct special kernel to detect only horizontal lines
  • Perform morphological transformations
  • Find contours and draw onto image

Using this example image

Convert to grayscale and apply Otsu's threshold (inverted, so ink becomes white on a black background) to obtain a binary image

gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

Then we create a kernel with cv2.getStructuringElement() and perform morphological transformations to isolate horizontal lines

horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)

From here we could use cv2.HoughLinesP() to detect lines, but since we have already preprocessed the image and isolated the horizontal lines, we can simply find contours and draw the result

cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

for c in cnts:
    cv2.drawContours(image, [c], -1, (36,255,12), 3)

Full code

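import cv2

# Load the page image rendered from the .pdf
image = cv2.imread('2.png')
if image is None:
    raise FileNotFoundError("Could not read '2.png'")

# Grayscale, then inverted Otsu threshold so ink is white on black
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Morphological opening with a wide, 1 px tall kernel keeps only horizontal runs
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)

# findContours returns 2 values in OpenCV 2.x/4.x and 3 values in 3.x
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

# Draw each detected line onto the original image in green
for c in cnts:
    cv2.drawContours(image, [c], -1, (36, 255, 12), 3)

cv2.imshow('thresh', thresh)
cv2.imshow('detected_lines', detected_lines)
cv2.imshow('image', image)
cv2.waitKey()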


Source: https://stackoverflow.com/questions/57260893/improve-horizontal-line-detection-in-pdf-image-with-opencv
