TensorFlow - Read video frames from TFRecords file


TL;DR: my question is how to load compressed video frames from a TFRecords file.

I am setting up a data pipeline for training deep learning mode

2 answers
  • 2020-12-15 11:12

    Since you're using very similar dependencies, I suggest taking a look at the following Python package, as it addresses your exact problem setting:

    pip install video2tfrecord
    

    or refer to https://github.com/ferreirafabio/video2tfrecord. It should also be adaptable enough to use tf.data.Dataset.

    Disclaimer: I am one of the authors of the package.

  • 2020-12-15 11:29

    Encoding each frame as a separate feature makes it difficult to select frames dynamically, because the signature of tf.parse_example() (and tf.parse_single_example()) requires that the set of parsed feature names be fixed at graph construction time. However, you could try encoding the frames as a single feature that contains a list of JPEG-encoded strings:

    import cv2
    import tensorflow as tf  # TF 1.x API (tf.python_io lives under tf.compat.v1 in TF 2.x)
    
    # The two single-value helpers below are assumed to match those in the question.
    def _int64_feature(value):
        """Wrapper for inserting a single int64 feature into Example proto."""
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
    
    def _bytes_feature(value):
        """Wrapper for inserting a single bytes feature into Example proto."""
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
    
    def _bytes_list_feature(values):
        """Wrapper for inserting a list of bytes values into Example proto."""
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))
    
    with tf.python_io.TFRecordWriter(output_file) as writer:
    
      # Read and resize all video frames, np.uint8 of size [N,H,W,3]
      frames = ... 
    
      features = {}
      features['num_frames']  = _int64_feature(frames.shape[0])
      features['height']      = _int64_feature(frames.shape[1])
      features['width']       = _int64_feature(frames.shape[2])
      features['channels']    = _int64_feature(frames.shape[3])
      features['class_label'] = _int64_feature(example['class_id'])
      features['class_text']  = _bytes_feature(tf.compat.as_bytes(example['class_label']))
      features['filename']    = _bytes_feature(tf.compat.as_bytes(example['video_id']))
    
      # Compress each frame as JPEG and store them all as a list of byte strings in 'frames'
      encoded_frames = [tf.compat.as_bytes(cv2.imencode(".jpg", frame)[1].tobytes())
                        for frame in frames]
      features['frames'] = _bytes_list_feature(encoded_frames)
    
      tfrecord_example = tf.train.Example(features=tf.train.Features(feature=features))
      writer.write(tfrecord_example.SerializeToString())
    

    Once you have done this, it will be possible to slice the frames feature dynamically, using a modified version of your parsing code:

    def decode(serialized_example, sess):
      # `sess` is kept from the original signature but is not used here.
      # SEQ_NUM_FRAMES (frames per sampled clip) is assumed to be defined elsewhere.
    
      # Prepare feature list; read encoded JPG images as bytes
      features = dict()
      features["class_label"] = tf.FixedLenFeature((), tf.int64)
      features["frames"] = tf.VarLenFeature(tf.string)
      features["num_frames"] = tf.FixedLenFeature((), tf.int64)
    
      # Parse into tensors
      parsed_features = tf.parse_single_example(serialized_example, features)
    
      # Randomly sample offset from the valid range. `maxval` is exclusive,
      # so add 1 to make the last full window reachable.
      random_offset = tf.random_uniform(
          shape=(), minval=0,
          maxval=parsed_features["num_frames"] - SEQ_NUM_FRAMES + 1, dtype=tf.int64)
    
      offsets = tf.range(random_offset, random_offset + SEQ_NUM_FRAMES)
    
      # Decode the encoded JPG images. The output dtype (uint8) differs from
      # the input dtype (int64), so it must be passed to tf.map_fn explicitly.
      images = tf.map_fn(lambda i: tf.image.decode_jpeg(parsed_features["frames"].values[i]),
                         offsets, dtype=tf.uint8)
    
      label  = tf.cast(parsed_features["class_label"], tf.int64)
    
      return images, label
    

    (Note that I haven't been able to run your code, so there may be some small errors, but hopefully it is enough to get you started.)
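
    The random-offset step is the easiest part to get subtly wrong, so here is a minimal pure-Python sketch of just that logic, with no TensorFlow involved. The names `sample_window`, `num_frames`, `seq_len` and `rng` are hypothetical and only for illustration. The point it shows: when the upper bound of the random draw is exclusive, it has to be `num_frames - seq_len + 1`, otherwise the last full window can never be chosen and a clip with exactly `seq_len` frames has an empty range.

```python
import random


def sample_window(num_frames, seq_len, rng=random):
    """Return indices of `seq_len` consecutive frames, start chosen uniformly.

    Illustrative helper only; it mirrors the offset sampling in the
    parsing code but is not part of any library.
    """
    if seq_len > num_frames:
        raise ValueError("clip has fewer frames than the requested window")
    # Valid start offsets are 0 .. num_frames - seq_len (inclusive), so the
    # exclusive upper bound passed to randrange is num_frames - seq_len + 1.
    offset = rng.randrange(0, num_frames - seq_len + 1)
    return list(range(offset, offset + seq_len))


if __name__ == "__main__":
    # A 10-frame clip with a 4-frame window: the start offset is in 0..6.
    print(sample_window(10, 4))
```

    If a clip can be shorter than the window, you would need to decide separately whether to skip it, pad it, or loop it; the sketch simply raises an error in that case.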
