Question
I have a video of a slideshow in which the presenter handwrites notes onto the slides.
I would like to create a program that detects whether a slide is being filled in (with handwritten notes, for example) or whether a new slide has appeared.
One method I considered is OCR, but it is not suitable here, since the only text that changes is either handwritten or mathematical notation.
What I have done so far: I go through the video and always compare the previous frame with the current frame. I extract the bounding-box coordinates of all elements that were added relative to the previous frame, and I store the highest y-coordinate. The highest y-coordinate belongs to the element furthest down the image (as seen from the top). In theory, this should give me an indication of whether the slide is filling up.
In practice, I cannot really make use of this data.
The video in question can be downloaded here: http://www.filedropper.com/00_6
Here is my code:
from skimage.measure import compare_ssim
import cv2
import numpy as np
# Packages for live plot visualisation
import pyqtgraph as pg
from pyqtgraph.Qt import QtGui, QtCore
from tqdm import tqdm
def get_y_corrd_of_lowest_added_element(prev_frame, frame):
    """
    Given two images, detect the bounding boxes of all elements that
    differ between them and return the y coordinate of the lowest
    added element (as seen from the top of the image).

    Parameters
    ----------
    prev_frame : numpy array
        Original image.
    frame : numpy array
        New image, based on the original image.

    Returns
    -------
    int
        Lowest y coordinate of the elements that were added.
    """
    # Compute SSIM between the two images
    (score, diff) = compare_ssim(prev_frame, frame, full=True)
    # The diff image contains the actual image differences between the two
    # images and is represented as a floating-point data type in the range
    # [0, 1], so we must convert the array to 8-bit unsigned integers in the
    # range [0, 255] before we can use it with OpenCV
    diff = (diff * 255).astype("uint8")
    # Threshold the difference image, then find contours to obtain the
    # regions of the two input images that differ
    thresh = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
    contours = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = contours[0] if len(contours) == 2 else contours[1]
    # Collect the y coordinates of the bounding boxes of all elements that
    # were added to the frame compared with the previous frame
    y_list = [0]
    for c in contours:
        area = cv2.contourArea(c)
        if area > 40:
            x, y, w, h = cv2.boundingRect(c)
            # Append the y coordinate to the list
            y_list.append(y)
    y_list.sort()
    return y_list[-1]
def transform(frame):
    # Convert to greyscale
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Downscale to half size
    small = cv2.resize(frame, (0, 0), fx=0.5, fy=0.5)
    return small
vidcap = cv2.VideoCapture("ADD PATH TO VIDEO HERE")
success, prev_frame = vidcap.read()
prev_frame = transform(prev_frame)

# For real-time plotting
# Source: http://www.pyqtgraph.org/downloads/0.10.0/pyqtgraph-0.10.0-deb/pyqtgraph-0.10.0/examples/PlotSpeedTest.py
app = QtGui.QApplication([])
win = pg.GraphicsWindow()
win.resize(800, 800)
p = win.addPlot()
p.setTitle('Lowest Y')
plot = p.plot([])

# Store the lowest y coordinate of the added elements for each frame
y_lowest_list = []
while success:
    success, frame = vidcap.read()
    if not success:
        # End of video: stop before transform() receives None
        break
    # Convert to greyscale and downscale
    frame = transform(frame)
    # Show the frame
    cv2.imshow("frame", frame)
    cv2.waitKey(1)
    # Extract the lowest y coordinate of the newly added elements
    y = get_y_corrd_of_lowest_added_element(prev_frame, frame)
    y_lowest_list.append(y)
    # The current frame becomes the previous frame for the next iteration
    prev_frame = frame
    # Update the real-time plot
    plot.setData(y_lowest_list)
    app.processEvents()
# Close the real-time plot
win.close()
Does anyone have an idea?
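(For illustration, here is a hedged sketch of how the y-coordinate series collected above might be post-processed, under the assumption that writing makes the series grow slowly while a new slide makes it fall back toward the top of the image. detect_new_slides and drop_thresh are hypothetical names, not part of the code above.)

import numpy as np

def detect_new_slides(y_lowest_list, drop_thresh=100):
    """Return frame indices where the lowest-y series drops sharply.

    Under the assumption above, a sharp drop suggests a new slide.
    drop_thresh is in pixels and would need tuning per video.
    """
    y = np.asarray(y_lowest_list)
    drops = np.diff(y)
    return np.where(drops < -drop_thresh)[0] + 1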
Answer 1:
You can try this code; see the comments:
import cv2
import numpy as np

def get_bg_and_ink_level(frame):
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Nearly-white pixels belong to the slide background
    background = cv2.threshold(frame[:, :, 2], 245, 255, cv2.THRESH_BINARY)[1]
    background_level = cv2.mean(background)  # for future use, if you need to select frames without hands
    # HSV range of the blue ink
    ink_color_low = (117, 60, 150)
    ink_color_high = (130, 207, 225)
    only_ink = cv2.inRange(frame, ink_color_low, ink_color_high)
    ink_level = cv2.mean(only_ink)
    return background_level[0], ink_level[0]

vidcap = cv2.VideoCapture('0_0.mp4')
success, frame = vidcap.read()
bg = []
ink = []
i = 0
while success:
    lv = get_bg_and_ink_level(frame)
    bg.append(lv[0])
    ink.append(lv[1])
    success, frame = vidcap.read()

# Search for frames where the blue ink is removed from the picture,
# i.e. where the mean ink level drops sharply from one frame to the next.
d_ink = np.diff(ink)
d_ink[-1] = -2.0  # force the last frame to count as a slide change
idx = np.where(d_ink < -1.0)

# Save the frame just before each detected slide change
for i in idx[0]:
    vidcap.set(cv2.CAP_PROP_POS_FRAMES, i)
    flag, frame = vidcap.read()
    out_name = 'frame' + str(i) + '.jpg'
    cv2.imwrite(out_name, frame)
Result at frame 15708:
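(Side note: the HSV ink range above is specific to this video's blue pen. One hedged way to find such a range for a different pen is to sample pixel values interactively; 'sample.jpg' below is a placeholder for any frame grabbed from the video, not a file from the answer.)

import cv2

img = cv2.imread('sample.jpg')  # placeholder: any frame from the video
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

def show_hsv(event, x, y, flags, param):
    # Print the HSV value of the clicked pixel to help tune
    # ink_color_low / ink_color_high
    if event == cv2.EVENT_LBUTTONDOWN:
        print('HSV at ({}, {}):'.format(x, y), hsv[y, x])

cv2.namedWindow('frame')
cv2.setMouseCallback('frame', show_hsv)
cv2.imshow('frame', img)
cv2.waitKey(0)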
Answer 2:
As a first pass at the problem, I'd probably want to just count the number of pixels that are different between the two images. It has several desirable properties:
- It's an actual distance metric.
- It's dirt-cheap computationally.
- Slides with more handwriting are farther from the original than slides with little writing (e.g. if you progressively added more writing and wanted to order those).
- If there's even a moderate amount of content on the slides, you'll plausibly (though not necessarily) find that any two unrelated slides are farther from each other than two slides that are the same except for the handwriting (especially with thin writing like this).
It's not a perfect solution of course -- e.g., if you acquire the slides by taking photos then almost every slide will differ at every pixel. Take a moment to think about it with respect to your use case and data collection methods.
It's pretty common for images in Python to be represented as NumPy arrays. Supposing that's the case for you as well, the following example computes the metric in question (or could readily be modified to give you similarity rather than distance):
import numpy as np

def dist(a, b):
    # Supposes some sort of pixel representation like BGR or HSL with
    # shape (w, h, channels) or (h, w, channels)
    return np.sum(np.sum(a != b, axis=-1) != 0)
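(As a usage sketch building on dist above: applied to consecutive frames of the video, small nonzero distances suggest handwriting being added, while large spikes suggest a slide change. The video path 'slides.mp4' is a placeholder.)

import cv2
import numpy as np

vidcap = cv2.VideoCapture('slides.mp4')  # placeholder path
ok, prev = vidcap.read()
dists = []
while ok:
    ok, cur = vidcap.read()
    if not ok:
        break
    dists.append(dist(prev, cur))  # dist() as defined above
    prev = cur
# Frames just after the largest jumps are candidate new slides.
print(np.argsort(dists)[-5:] + 1)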
Source: https://stackoverflow.com/questions/63624333/quantify-how-much-a-slide-has-been-filled-with-handwriting