Best practice for video ground truthing?

左心房为你撑大大i submitted on 2020-02-06 06:24:08

Question


I would like to train an object detection model with a deep learning framework (TensorFlow) on a new object category.

As the source for ground truthing, I have multiple video files that contain the object (it occupies only part of each frame).

How should I ground truth the video? Should I extract it frame by frame and label every frame, even though consecutive frames will be quite similar? Or what would be best practice for such a task?

Open source tools are preferred.


Answer 1:


It usually works as you described, at least for iteration zero:

  1. collect required examples (video)
  2. extract valuable frames from the video (manual or partially automated process)
  3. use OpenCV (or any other tool) to extract required details (bounding box, accurate mask)
  4. assemble a training set
  5. train a model
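
As a rough illustration of step 2, the sketch below keeps a frame only when it differs enough from the last frame that was kept, so near-identical frames are not labeled twice. The function names, the mean-absolute-difference measure, and the threshold of 10.0 are assumptions made for this example, not something the answer prescribes. The pure-Python difference is slow on full-size frames; a vectorized measure can be passed in through `diff` instead.

```python
from typing import Callable, Iterable, Iterator, Sequence, Tuple

# A grayscale frame: rows of pixel intensities (nested lists or a 2-D array).
Frame = Sequence[Sequence[float]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two same-sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            # Convert to float first so unsigned 8-bit pixels cannot wrap around.
            total += abs(float(pa) - float(pb))
            count += 1
    return total / count if count else 0.0


def select_distinct_frames(
    frames: Iterable[Frame],
    threshold: float = 10.0,  # assumed value; tune it per video
    diff: Callable[[Frame, Frame], float] = mean_abs_diff,
) -> Iterator[Tuple[int, Frame]]:
    """Yield (index, frame) for frames that differ from the last kept frame."""
    last_kept = None
    for index, frame in enumerate(frames):
        if last_kept is None or diff(last_kept, frame) > threshold:
            last_kept = frame
            yield index, frame


def read_gray_frames(path: str) -> Iterator[Frame]:
    """Read a video with OpenCV and yield its frames converted to grayscale."""
    import cv2  # imported lazily; only this helper needs OpenCV

    capture = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of stream or read error
                break
            yield cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    finally:
        capture.release()
```

A possible use is `for index, frame in select_distinct_frames(read_gray_frames("clip.mp4")): ...`, where `clip.mp4` is a placeholder path. The kept frames are then the ones passed on to labeling in step 3.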

Here is an example of a training set, produced by the approach described above (see it in action)

For iteration one, you can use the iteration-zero model to improve steps 2 and 3 significantly, which lets you grow the training set further.
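
One way to read this, sketched below under stated assumptions: run the iteration-zero model over new frames, accept its boxes as draft labels when every detection is confident, and send the remaining frames to manual labeling. The `detect` callable, the `(x, y, width, height)` box format, and the 0.9 cutoff are hypothetical placeholders for this example, not part of TensorFlow or of the answer.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

FrameT = TypeVar("FrameT")
Box = Tuple[int, int, int, int]   # (x, y, width, height), assumed format
Detection = Tuple[Box, float]     # a box plus a confidence in [0, 1]


def triage_frames(
    frames: Iterable[FrameT],
    detect: Callable[[FrameT], List[Detection]],  # hypothetical model wrapper
    accept_at: float = 0.9,                       # assumed confidence cutoff
) -> Tuple[List[Tuple[int, List[Box]]], List[int]]:
    """Split frames into auto-labeled ones and ones that need manual review."""
    auto_labeled: List[Tuple[int, List[Box]]] = []
    needs_review: List[int] = []
    for index, frame in enumerate(frames):
        detections = detect(frame)
        confident = [box for box, score in detections if score >= accept_at]
        # Accept a frame only if it has detections and all of them are confident.
        if detections and len(confident) == len(detections):
            auto_labeled.append((index, confident))
        else:
            needs_review.append(index)
    return auto_labeled, needs_review
```

Frames with no detections are deliberately routed to review as well, since an early model may simply be missing the object. Auto-accepted labels are still worth spot-checking before they go into the next training set.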

I'm trying to solve pretty much the same problem, because it is hard to produce a training set to get accurate segmentation:

(again, here it is in action and other examples)

Basically, start with a semi-manual approach and try to evolve.



Source: https://stackoverflow.com/questions/58910721/best-practise-for-video-ground-truthing
