Python: Find Amount of Handwriting in Video

走了就别回头了 2021-02-03 13:38

Do you know of an algorithm that can see that there is handwriting on an image? I am not interested in knowing what the handwriting says, but only that there is

4 answers
  •  我在风中等你
    2021-02-03 14:20

    You could build a template before detection and then subtract it from the current frame of the video. One way to build such a template is to iterate through every pixel of each frame and check whether its value at that coordinate is higher (closer to white) than the value stored for it in a list; if it is, the stored value is replaced.
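To make the idea concrete, here is a minimal sketch on three tiny made-up 2x2 grayscale frames (the pixel values are illustrative, not taken from the video). Keeping the per-pixel maximum discards anything dark that covers a pixel only part of the time, such as a moving hand or pen, and keeps the brightest value seen at each pixel.

```python
import numpy as np

# Three hypothetical 2x2 grayscale frames (0 = black, 255 = white).
frames = [
    np.array([[250, 40], [245, 250]], dtype=np.uint8),  # dark object over the top-right pixel
    np.array([[250, 248], [30, 250]], dtype=np.uint8),  # dark object over the bottom-left pixel
    np.array([[249, 247], [246, 20]], dtype=np.uint8),  # dark object over the bottom-right pixel
]

# Start from the first frame and keep the brighter value at every pixel.
template = frames[0].copy()
for frame in frames[1:]:
    template = np.maximum(template, frame)

print(template.tolist())  # [[250, 248], [246, 250]]
```

Every dark value has disappeared from the result because each pixel was uncovered in at least one frame.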

    Here is an example of such a template, built from your video by iterating through its first two seconds:

    Once you have that, detecting the text is simple. You can use the cv2.absdiff() function to compute the difference between the template and the frame. Here is an example:

    Once you have this image, it is straightforward to search for writing (threshold + contour search or something similar).

    Here is some example code:

    import numpy as np
    import cv2
    
    cap = cv2.VideoCapture('0_0.mp4')  # read video
    
    bgr = cap.read()[1]  # get first frame
    frame = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # transform to grayscale
    template = frame.copy()  # make a copy of the grayscale
    
    h, w = frame.shape[:2]  # height, width
    
    matrix = []  # a list for [y, x] coordinates
    # fill matrix with all coordinates of the image (height x width)
    for j in range(h):
        for i in range(w):
            matrix.append([j, i])
    
    fps = cap.get(cv2.CAP_PROP_FPS)  # frames per second of the video
    seconds = 2  # how many seconds of the video to scan when building the template
    k = seconds * fps  # number of frames in that many seconds
    i = 0  # frame counter
    lowest = []  # despite its name, stores the highest value seen at each pixel - this builds the template
    
    # store the value of the first frame - just so you can compare it in the next step
    for j in matrix:
        y = j[0]
        x = j[1]
        lowest.append(template[y, x])
    
    # loop through the number of frames calculated before
    while(i < k):
        ret, bgr = cap.read()  # bgr image
        if not ret:  # stop if the video ends before k frames are read
            break
        frame = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # transform to grayscale
        # iterate through every pixel (pixels are located in the matrix)
        for l, j in enumerate(matrix):
            y = j[0]  # y coordinate
            x = j[1]  # x coordinate
            temp = lowest[l]  # highest value stored so far for this pixel
            cur = frame[y, x]  # value of pixel in the current frame
            if cur > temp:  # if the current frame has a higher value, update the "lowest" list
                lowest[l] = cur
        i += 1  # increment the iterator
    
        # just for visualization
        cv2.imshow('frame', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    
    i = 0  # new iterator to index into the "lowest" list
    template = np.ones((h, w), dtype=np.uint8)*255  # new empty white image
    # iterate through the matrix and set each pixel of the new white image to the
    # corresponding value in the "lowest" list
    for j in matrix:
        template[j[0], j[1]] = lowest[i]
        i += 1
    
    # just for visualization - template
    cv2.imwrite("template.png", template)
    cv2.imshow("template", template)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    
    counter = 0  # number of contours in the previous frame: if the number of contours
    # drops sharply, a new template is probably needed
    mean_compare = 0  # reference color for a simple check of whether a contour is
    # the same color as the others
    # the loop below works on the difference between each video frame and the created template
    while(cap.isOpened()):
        ret, bgr = cap.read()  # bgr image
        if not ret:  # stop when the video ends or a frame cannot be read
            break
        frame = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # grayscale
        img = cv2.absdiff(template, frame)  # resulting difference
        thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]  # thresholded image
        kernel = np.ones((5, 5), dtype=np.uint8)  # simple kernel
        thresh = cv2.dilate(thresh, kernel, iterations=1)  # dilate thresholded image
        # [-2] picks the contour list in both OpenCV 3.x (3 return values) and 4.x (2 return values)
        cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]  # contour search
        if len(cnts) < counter*0.5 and counter > 50:  # check if new template is in order
            # search for new template again
            break
        else:
            counter = len(cnts) # update counter
            for cnt in cnts:  # iterate through contours
                size = cv2.contourArea(cnt)  # size of contours - to filter out noise
                if 20 < size < 30000:  # noise criterion
                    mask = np.zeros(frame.shape, np.uint8)  # empty mask - needed for color compare
                    cv2.drawContours(mask, [cnt], -1, 255, -1)  # draw contour on mask
                    mean = cv2.mean(bgr, mask=mask)  # the mean color of the contour
    
                    if not mean_compare:  # first will set the template color
                        mean_compare = mean
                    else:
                        k1 = 0.85  # coefficient: how much smaller each channel value of the bgr image may be
                        k2 = 1.15  # coefficient: how much bigger each channel value of the bgr image may be
                        # condition
                        b = bool(mean_compare[0] * k1 < mean[0] < mean_compare[0] * k2)
                        g = bool(mean_compare[1] * k1 < mean[1] < mean_compare[1] * k2)
                        r = bool(mean_compare[2] * k1 < mean[2] < mean_compare[2] * k2)
                        if b and g and r:
                            cv2.drawContours(bgr, [cnt], -1, (0, 255, 0), 2)  # draw on bgr image
    
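        # just for visualization
        cv2.imshow('img', bgr)
        key = cv2.waitKey(1) & 0xFF  # poll the keyboard once per frame
        if key == ord('s'):  # save the difference image, named after the current frame position
            cv2.imwrite(str(int(cap.get(cv2.CAP_PROP_POS_FRAMES)))+".png", img)
        if key == ord('q'):
            break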
    
    # release the video object and destroy window
    cap.release()
    cv2.destroyAllWindows()
    

    One possible result with a simple size and color filter:

    NOTE: This template search is very slow because of the nested per-pixel Python loops, and it can probably be optimized, although that takes a little more math knowledge than I have. You will also need to check whether the template changes within the same video; I'm guessing that shouldn't be too difficult.
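One way the per-pixel loops might be avoided is to let NumPy take the maximum over whole frames at once. The sketch below assumes the frames have already been converted to grayscale; build_template is a hypothetical helper name, not part of OpenCV or of the script above.

```python
import numpy as np

def build_template(gray_frames):
    """Hypothetical helper: per-pixel maximum over an iterable of grayscale frames."""
    template = None
    for frame in gray_frames:
        if template is None:
            template = frame.copy()  # the first frame initializes the template
        else:
            np.maximum(template, frame, out=template)  # in-place running maximum
    return template

# Small self-check on random stand-in frames.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, size=(4, 6), dtype=np.uint8) for _ in range(5)]
template = build_template(frames)
assert np.array_equal(template, np.stack(frames).max(axis=0))
```

In the script above, the frames passed in would be the grayscale conversions of the first seconds * fps frames read from cap.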

    A simpler way to make it a bit faster is to resize the frames to, say, 20% and run the same template search on them. After that, resize the template back to the original size and dilate it. The result will not be as clean, but it gives a mask of where the text and lines of the template are. Then simply draw it over the frame.
