Do you know of an algorithm that can detect that there is handwriting on an image? I am not interested in knowing what the handwriting says, only that some is present.
You could build a template of the background before detection, which you could then subtract from the current frame of the video. One way to make such a template is to iterate through every pixel of the frame and check whether it has a higher (whiter) value at that coordinate than the value stored in the list.
Here is an example of such a template, built from the first two seconds of your video:
Once you have that, it is simple to detect the text. You can use the cv2.absdiff()
function to compute the difference between the template and the frame. Here is an example:
Once you have this image, it is trivial to search for writing (threshold + contour search or something similar).
Here is the example code:
import numpy as np
import cv2
cap = cv2.VideoCapture('0_0.mp4') # read video
bgr = cap.read()[1] # get first frame
frame = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY) # transform to grayscale
template = frame.copy() # make a copy of the grayscale
h, w = frame.shape[:2] # height, width
matrix = [] # a list for [y, x] coordinates
# fill matrix with all coordinates of the image (height x width)
for j in range(h):
    for i in range(w):
        matrix.append([j, i])
fps = cap.get(cv2.CAP_PROP_FPS) # frames per second of the video
seconds = 2 # how many seconds of the video to use for building the template
k = seconds * fps # calculate how many frames of the video are in that many seconds
i = 0 # an iterator to count the frames
highest = [] # list that will store the highest value of each pixel across the frames - that will build our template
# store the value of the first frame - just so you can compare it in the next step
for j in matrix:
    y = j[0]
    x = j[1]
    highest.append(template[y, x])
# loop through the number of frames calculated before
while i < k:
    ret, bgr = cap.read() # read the next frame
    if not ret: # end of the video
        break
    frame = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY) # transform to grayscale
    # iterate through every pixel (pixels are located in the matrix)
    for l, j in enumerate(matrix):
        y = j[0] # y coordinate
        x = j[1] # x coordinate
        temp = highest[l] # highest value of this pixel seen so far
        cur = frame[y, x] # value of the pixel in the current frame
        if cur > temp: # if the current frame has a higher value, update the "highest" list
            highest[l] = cur
    i += 1 # increment the iterator
    # just for visualization
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
i = 0 # new iterator to increment the position in the "highest" list
template = np.ones((h, w), dtype=np.uint8)*255 # new empty white image
# iterate through the matrix and set each pixel of the new white image to the
# corresponding value in the "highest" list
for j in matrix:
    template[j[0], j[1]] = highest[i]
    i += 1
# just for visualization - template
cv2.imwrite("template.png", template)
cv2.imshow("template", template)
cv2.waitKey(0)
cv2.destroyAllWindows()
counter = 0 # counter of contours: logically, if the number of contours
# were to rapidly decrease, that would mean a new template is in order
mean_compare = 0 # needed for a simple color check - whether a contour is
# the same color as the others
# the main loop: take the difference between each frame of the video and the created template
frame_count = 0 # frame index, used for the filename when saving difference images
while cap.isOpened():
    ret, bgr = cap.read() # read the next frame
    if not ret: # end of the video
        break
    frame_count += 1
    frame = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY) # grayscale
    img = cv2.absdiff(template, frame) # the resulting difference
    thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1] # thresholded image
    kernel = np.ones((5, 5), dtype=np.uint8) # simple kernel
    thresh = cv2.dilate(thresh, kernel, iterations=1) # dilate thresholded image
    cnts, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) # contour search
    if len(cnts) < counter*0.5 and counter > 50: # check if a new template is in order
        # search for a new template again
        break
    else:
        counter = len(cnts) # update counter
        for cnt in cnts: # iterate through contours
            size = cv2.contourArea(cnt) # size of contour - to filter out noise
            if 20 < size < 30000: # noise criterion
                mask = np.zeros(frame.shape, np.uint8) # empty mask - needed for the color comparison
                cv2.drawContours(mask, [cnt], -1, 255, -1) # draw contour on mask
                mean = cv2.mean(bgr, mask=mask) # the mean color of the contour
                if not mean_compare: # the first contour sets the reference color
                    mean_compare = mean
                else:
                    k1 = 0.85 # coefficient for how much smaller each channel's value may be
                    k2 = 1.15 # coefficient for how much bigger each channel's value may be
                    # condition
                    b = bool(mean_compare[0] * k1 < mean[0] < mean_compare[0] * k2)
                    g = bool(mean_compare[1] * k1 < mean[1] < mean_compare[1] * k2)
                    r = bool(mean_compare[2] * k1 < mean[2] < mean_compare[2] * k2)
                    if b and g and r:
                        cv2.drawContours(bgr, [cnt], -1, (0, 255, 0), 2) # draw on the BGR image
    # just for visualization
    cv2.imshow('img', bgr)
    if cv2.waitKey(1) & 0xFF == ord('s'):
        cv2.imwrite(str(frame_count)+".png", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
# release the video object and destroy window
cap.release()
cv2.destroyAllWindows()
One possible result with a simple size and color filter:
NOTE: This template search is very slow because of the nested loops and can probably be optimized quite a bit - you may need a little more math knowledge than I have. You will also need to add a check for the template changing within the same video - I'm guessing that shouldn't be too difficult.
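For example, the per-pixel maximum can be computed with NumPy's np.maximum() instead of the nested Python loops. A minimal sketch of that idea, reusing the same video file and two-second window as above:

import numpy as np
import cv2

cap = cv2.VideoCapture('0_0.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)
k = int(2 * fps) # number of frames in the first two seconds

template = cv2.cvtColor(cap.read()[1], cv2.COLOR_BGR2GRAY) # start from the first frame
for _ in range(k):
    ret, bgr = cap.read()
    if not ret: # end of the video
        break
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    template = np.maximum(template, gray) # per-pixel maximum, no Python loops

cv2.imwrite("template.png", template)

Since the maximum is taken over whole arrays at once, this should be much faster than visiting every pixel in Python.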
A simpler way to make it a bit faster is to resize the frames to, let's say, 20% and run the same template search. Afterwards, resize the result back to the original size and dilate it. The result will not be as nice, but it will give a mask of where the text and lines of the template are. Then simply draw it over the frame.
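A rough sketch of that idea, assuming the same 20% scale; the 5x5 dilation kernel is an arbitrary choice you would tune:

import numpy as np
import cv2

cap = cv2.VideoCapture('0_0.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)
k = int(2 * fps)

bgr = cap.read()[1]
h, w = bgr.shape[:2]
scale = 0.2 # work at 20% of the original size
small_template = cv2.resize(cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY), None, fx=scale, fy=scale)
for _ in range(k):
    ret, bgr = cap.read()
    if not ret: # end of the video
        break
    small = cv2.resize(cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY), None, fx=scale, fy=scale)
    small_template = np.maximum(small_template, small) # same max-template search, 25x fewer pixels

# resize back to the original size and dilate, so the mask still covers
# the text and lines despite the detail lost by downscaling
template = cv2.resize(small_template, (w, h))
kernel = np.ones((5, 5), dtype=np.uint8) # assumed kernel size - adjust to taste
template = cv2.dilate(template, kernel, iterations=1)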