Question
I would like to train an object detection model with TensorFlow on a new object category.
As the source for the ground truth I have multiple video files that contain the object (only part of each frame contains it).
How should I ground-truth the video? Should I extract it frame by frame and label every frame, even though consecutive frames will be quite similar? What would be best practice for such a task?
Open source tools are preferred.
Answer 1:
It usually works as you described, at least for iteration zero:
- collect required examples (video)
- extract valuable frames from the video (a manual or partially automated process; see the sketch after this list)
- use OpenCV (or any other tool) to extract the required details (bounding boxes, accurate masks)
- assemble a training set
- train a model
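A minimal sketch of step 2: frames can be sampled with OpenCV, and keeping only every Nth frame avoids labelling long runs of nearly identical images. The file paths and the stride below are placeholders, not values from the question.

```python
# Sample frames from a video with OpenCV, keeping every Nth frame.
import os
import cv2

def extract_frames(video_path, out_dir, every_n=30):
    """Save every `every_n`-th frame of the video as a JPEG; return the count saved."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video or read error
            break
        if index % every_n == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Example usage (hypothetical paths):
# extract_frames("my_object.mp4", "frames/", every_n=30)
```

Tuning `every_n` (or filtering frames by a simple difference metric) is how you trade annotation effort against training-set diversity.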
Here is an example of a training set produced by the approach described above (see it in action).
For iteration one, you might use the iteration-zero model to significantly improve steps 2 and 3 and grow the training set even more (a rough sketch follows).
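One way to realize that iteration-one idea is to let the iteration-zero model pre-label freshly extracted frames, so a human only corrects the proposed boxes instead of drawing them from scratch. The sketch below assumes a detector exported as a SavedModel with the TensorFlow 2 Object Detection API (whose outputs include `detection_boxes` and `detection_scores`); the model path, frame directory, and score threshold are hypothetical.

```python
# Pre-label new frames with an existing detector and keep confident boxes for review.
import glob
import cv2
import numpy as np
import tensorflow as tf

model = tf.saved_model.load("exported_model/saved_model")  # hypothetical path

def propose_boxes(image_path, score_threshold=0.5):
    """Return candidate boxes (ymin, xmin, ymax, xmax, normalized) above the threshold."""
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    inputs = tf.convert_to_tensor(image[np.newaxis, ...], dtype=tf.uint8)
    outputs = model(inputs)
    boxes = outputs["detection_boxes"][0].numpy()
    scores = outputs["detection_scores"][0].numpy()
    return boxes[scores >= score_threshold]

for path in glob.glob("frames/*.jpg"):
    print(path, propose_boxes(path))
```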
I'm trying to solve pretty much the same problem, because it is hard to produce a training set that yields accurate segmentation (again, here it is in action, along with other examples).
Basically, start with a semi-manual approach and try to evolve.
Source: https://stackoverflow.com/questions/58910721/best-practise-for-video-ground-truthing