Best practice for video ground truthing?
Question: I would like to train a deep learning framework (TensorFlow) for object detection with a new object category. As the source for ground truthing, I have multiple video files that contain the object (only part of each image contains the object). How should I ground truth the video? Should I extract the video frame by frame and label every frame, even though consecutive frames will be quite similar? Or what would be best practice for such a task? Open-source tools are preferred.

Answer 1: It usually works as you