Question
I have .pdf files that have been converted to .jpg images for this project. My goal is to identify the blanks (e.g. ____________) that you would generally find in a .pdf form, which indicate a space for the user to sign or fill out some kind of information. I have been using edge detection with the cv2.Canny() and cv2.HoughLinesP() functions.
This works fairly well, but there are quite a few false positives that come about from seemingly nowhere. When I look at the 'edges' file it shows a bunch of noise around the other words. I'm uncertain where this noise comes from.
Should I continue to tweak the parameters, or is there a better method to find the location of these blanks?
Answer 1:


Assuming that you're trying to find horizontal lines on a .pdf form, here's a simple approach:
- Convert image to grayscale and apply Otsu's threshold to obtain a binary image
- Construct special kernel to detect only horizontal lines
- Perform morphological transformations
- Find contours and draw onto image
Using this example image

Convert to grayscale and apply Otsu's threshold to obtain a binary image
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

Then we create a kernel with cv2.getStructuringElement() and perform morphological transformations to isolate horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)

From here we can use cv2.HoughLinesP() to detect lines, but since we have already preprocessed the image and isolated the horizontal lines, we can just find contours and draw the result
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# findContours returns 2 values in OpenCV 2.x/4.x and 3 values in 3.x
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(image, [c], -1, (36,255,12), 3)

Full code
import cv2

# Load the image, convert to grayscale, and binarize with Otsu's threshold
image = cv2.imread('2.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Keep only long horizontal runs of foreground pixels
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)

# Find contours of the detected lines and draw them on the original image
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(image, [c], -1, (36,255,12), 3)

# Display the intermediate and final results
cv2.imshow('thresh', thresh)
cv2.imshow('detected_lines', detected_lines)
cv2.imshow('image', image)
cv2.waitKey()
Source: https://stackoverflow.com/questions/57260893/improve-horizontal-line-detection-in-pdf-image-with-opencv