Detect number of rows and columns in table image with OpenCV

社会主义新天地 submitted on 2021-02-18 17:07:13

Question


How can we get the number of rows and columns of a table in an image using OpenCV?

This is the code I use to get the boxes in the table, and it works correctly:

contours, hierarchy = cv2.findContours(img_final_bin, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

def sort_contours(cnts, method="left-to-right"):
    # initialize the reverse flag and sort index
    reverse = False
    i = 0
    # handle if we need to sort in reverse
    if method == "right-to-left" or method == "bottom-to-top":
        reverse = True
    # handle if we are sorting against the y-coordinate rather than
    # the x-coordinate of the bounding box
    if method == "top-to-bottom" or method == "bottom-to-top":
        i = 1
    # construct the list of bounding boxes and sort them by the
    # chosen coordinate
    boundingBoxes = [cv2.boundingRect(c) for c in cnts]
    (cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
        key=lambda b: b[1][i], reverse=reverse))
    # return the list of sorted contours and bounding boxes
    return (cnts, boundingBoxes)

(contours, boundingBoxes) = sort_contours(contours, method="top-to-bottom")


Answer 1:


Here's a potential approach:

  1. Obtain binary image. Load the image, convert it to grayscale, apply a Gaussian blur, then Otsu's threshold.

  2. Remove text inside cells. Find contours, filter them by area with cv2.contourArea, and erase the small ones (the text) by filling them in with cv2.drawContours.

  3. Invert image. We invert the image so the cells are white and the background is black.

  4. Sort cells and sum rows/columns. We find contours again and sort them from top to bottom using imutils.contours.sort_contours. Next we iterate through the contours and compute each centroid to obtain its (cX, cY) coordinates. The idea is to compare the cY value of each cell against the current row, using an offset, to decide whether the cell starts a new row or belongs to the same row. A cell is in the same row if its cY value is within +/- the offset; if the difference is greater, the cell is in a new row. We build a model table as a list of rows: the length of the list gives the number of rows, and the length of any row gives the number of columns.
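The row-grouping idea in step 4 can be tried on plain coordinates, without any image. The following is a minimal sketch, not the answer's exact code: the function name `group_into_rows` and the sample centroids are made up for illustration, and the default `offset` of 10 pixels is the value the answer's code uses.

```python
def group_into_rows(centroids, offset=10):
    """Group (cX, cY) centroids into rows by vertical position.

    A centroid starts a new row when its cY differs from the cY of the
    first cell of the current row by more than `offset` pixels.
    """
    rows = []
    row_cY = None
    # Visit the cells from top to bottom, like the sorted contours
    for cX, cY in sorted(centroids, key=lambda p: p[1]):
        if row_cY is None or abs(cY - row_cY) > offset:
            rows.append([])
            row_cY = cY
        rows[-1].append((cX, cY))
    # Order the cells of each row from left to right
    return [sorted(row) for row in rows]


# Hypothetical centroids of a table with 2 rows and 3 columns
centroids = [(50, 21), (150, 19), (250, 20), (50, 62), (150, 60), (250, 61)]
rows = group_into_rows(centroids)
print("Rows:", len(rows))                         # Rows: 2
print("Columns:", max(len(r) for r in rows))      # Columns: 3
```

The offset should be smaller than the height of the shortest row; otherwise two adjacent rows are merged into one.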


Binary image

Removed text contours + inverted image

Here's a visualization of iterating through each cell to count the number of rows and columns

Result

Rows: 7
Columns: 4

Code

import numpy as np
from imutils import contours
import cv2

# Load image, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5,5), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Find contours and remove text inside cells
cnts = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 4000:
        cv2.drawContours(thresh, [c], -1, 0, -1)

# Invert image
invert = 255 - thresh
offset, old_cY = 10, None
table = []
visualize = cv2.cvtColor(invert, cv2.COLOR_GRAY2BGR)

# Find contours, sort from top-to-bottom and then sum up column/rows
cnts = cv2.findContours(invert, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
(cnts, _) = contours.sort_contours(cnts, method="top-to-bottom")
for c in cnts:
    # Find centroid (skip degenerate contours with zero area)
    M = cv2.moments(c)
    if M["m00"] == 0:
        continue
    cX = int(M["m10"] / M["m00"])
    cY = int(M["m01"] / M["m00"])

    # New row: first cell, or centroid more than `offset` away from the current row
    if old_cY is None or abs(cY - old_cY) > offset:
        old_cY = cY
        table.append([])

    # Add the cell to the current row
    table[-1].append(1)

    # Uncomment to visualize
    '''
    cv2.circle(visualize, (cX, cY), 10, (36, 255, 12), -1)
    cv2.imshow('visualize', visualize)
    cv2.waitKey(200)
    '''

print('Rows: {}'.format(len(table)))
print('Columns: {}'.format(len(table[0]) if table else 0))

cv2.imshow('invert', invert)
cv2.imshow('thresh', thresh)
cv2.waitKey()



Answer 2:


An easy solution would be to scan the image from left to right and check, at each x position, whether every pixel in that vertical line is black, which would indicate a column border. Then do the same from top to bottom: if every pixel in a horizontal line is black, a row border has been found.

One complication is the thickness of the lines: a border that is several pixels wide must be counted as a single row/column border, so consecutive black lines belong to the same border until a white one is found.

I could work out the code for this but I'm not at home right now, so maybe someone else can write the code and I will delete my answer later. I know this could be a comment but I don't have 50 reputation.
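A rough sketch of this scan might look like the following. It is only an illustration under several assumptions that are not in the answer: the input is a 2D binary image in which grid-line pixels are nonzero (as in the `thresh` image from Answer 1, where the lines are white rather than black), the table has an outer border so that the number of cells is the number of borders minus one, and a line counts as a border when at least a `min_fill` fraction of its pixels are set, which tolerates small gaps. The names `count_line_runs`, `count_rows_and_columns` and `min_fill` are hypothetical.

```python
def count_line_runs(scan_lines, min_fill=0.9):
    """Count runs of consecutive scan lines that are mostly line pixels.

    A thick border spans several adjacent scan lines; the whole run is
    counted once, which handles the line-width complication.
    """
    runs, inside = 0, False
    for pixels in scan_lines:
        pixels = list(pixels)
        filled = bool(pixels) and sum(1 for p in pixels if p) >= min_fill * len(pixels)
        if filled and not inside:
            runs += 1
        inside = filled
    return runs


def count_rows_and_columns(binary, min_fill=0.9):
    """Return (rows, columns) for a 2D binary image of a bordered grid.

    `binary` is a list of pixel rows (a 2D NumPy array also works),
    with nonzero values where a grid line is.
    """
    horizontal_borders = count_line_runs(binary, min_fill)      # scan top to bottom
    vertical_borders = count_line_runs(zip(*binary), min_fill)  # scan left to right
    # Assumption: n cells lie between n + 1 borders
    return max(horizontal_borders - 1, 0), max(vertical_borders - 1, 0)
```

Text inside the cells rarely fills a whole scan line, so it is ignored by the `min_fill` check, but tables without ruled borders cannot be counted this way.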




Answer 3:


Another approach is to first verify whether the image actually contains a table. The Hough line transform can be used for that; once the table is confirmed, you can apply the approach explained in the answers above.
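One possible way to do such a check is sketched below. The answer does not say how the detected lines should be judged, so the rule here (at least `min_lines` near-horizontal and `min_lines` near-vertical segments) is an assumption. The helper names and all parameter values (`threshold=100`, `minLineLength=50`, `maxLineGap=5`, `angle_tol_deg=5.0`) are illustrative rather than tuned. `cv2.HoughLinesP` is OpenCV's probabilistic Hough transform and expects an 8-bit binary image with the lines in white.

```python
import math


def classify_segments(segments, angle_tol_deg=5.0):
    """Count near-horizontal and near-vertical (x1, y1, x2, y2) segments."""
    horizontal = vertical = 0
    for x1, y1, x2, y2 in segments:
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180
        if angle <= angle_tol_deg or angle >= 180 - angle_tol_deg:
            horizontal += 1
        elif abs(angle - 90) <= angle_tol_deg:
            vertical += 1
    return horizontal, vertical


def looks_like_table(binary, min_lines=2):
    """Heuristic check: does the binary image contain a grid of lines?"""
    # Imported here so classify_segments can be used without OpenCV
    import cv2

    lines = cv2.HoughLinesP(binary, 1, math.pi / 180, threshold=100,
                            minLineLength=50, maxLineGap=5)
    # HoughLinesP returns None when no segment is found
    segments = [] if lines is None else [tuple(l[0]) for l in lines]
    horizontal, vertical = classify_segments(segments)
    return horizontal >= min_lines and vertical >= min_lines
```

With the `thresh` image from Answer 1, the call would be `looks_like_table(thresh)`; the row and column counting only needs to run when it returns True.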



Source: https://stackoverflow.com/questions/60396925/detect-number-of-rows-and-columns-in-table-image-with-opencv
