What are the advantages of using tf.train.SequenceExample over tf.train.Example for variable length features?


Recently I read this guide on undocumented features in TensorFlow, as I needed to pass variable length sequences as input. However, I found the protocol for tf.train.SequenceExample somewhat confusing, and I'm wondering what advantages it offers over tf.train.Example for variable length features.

2 Answers
  • 2020-12-31 17:57

    Here are the definitions of the Example and SequenceExample protocol buffers, and all the protos they may contain:

    message BytesList { repeated bytes value = 1; }
    message FloatList { repeated float value = 1 [packed = true]; }
    message Int64List { repeated int64 value = 1 [packed = true]; }
    message Feature {
        oneof kind {
            BytesList bytes_list = 1;
            FloatList float_list = 2;
            Int64List int64_list = 3;
        }
    };
    message Features { map<string, Feature> feature = 1; };
    message Example { Features features = 1; };
    
    message FeatureList { repeated Feature feature = 1; };
    message FeatureLists { map<string, FeatureList> feature_list = 1; };
    message SequenceExample {
      Features context = 1;
      FeatureLists feature_lists = 2;
    };
    

    An Example contains a Features, which contains a mapping from feature name to Feature, which in turn contains either a bytes list, a float list, or an int64 list.
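
    For instance, here is a minimal sketch of building an Example in Python (the feature name "text" is just an illustration, not something the format requires):

    from tensorflow.train import BytesList, Feature, Features, Example
    
    # A sketch: one Example holding a single bytes feature named "text".
    ex = Example(features=Features(feature={
        "text": Feature(bytes_list=BytesList(value=[b"some raw bytes"]))
    }))
    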

    A SequenceExample also contains a Features (its context field), but in addition it contains a FeatureLists, which contains a mapping from list name to FeatureList, which contains a list of Feature objects. So it can do everything an Example can do, and more. But do you really need that extra functionality? What does it do?

    Since each Feature contains a list of values, a FeatureList is a list of lists. And that's the key: if you need lists of lists of values, then you need SequenceExample.

    For example, if you handle text, you can represent it as one big string:

    from tensorflow.train import BytesList
    
    BytesList(value=[b"This is the first sentence. And here's another."])
    

    Or you could represent it as a list of word and punctuation tokens:

    BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b".", b"And", b"here",
                     b"'s", b"another", b"."])
    

    Or you could represent each sentence separately. That's where you would need a list of lists:

    from tensorflow.train import BytesList, Feature, FeatureList
    
    s1 = BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b"."])
    s2 = BytesList(value=[b"And", b"here", b"'s", b"another", b"."])
    fl = FeatureList(feature=[Feature(bytes_list=s1), Feature(bytes_list=s2)])
    

    Then create the SequenceExample:

    from tensorflow.train import SequenceExample, FeatureLists
    
    seq = SequenceExample(feature_lists=FeatureLists(feature_list={
        "sentences": fl
    }))
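
    The context field (the Features shown in the proto above) is left empty here. If you also want per-example metadata next to the sentences, a sketch might look like this (the "title" key is just an illustration):

    from tensorflow.train import Features
    
    # A sketch: the same feature lists as above, plus a context Feature
    # (the "title" key is arbitrary, chosen for illustration).
    seq_with_context = SequenceExample(
        context=Features(feature={
            "title": Feature(bytes_list=BytesList(value=[b"Two sentences"]))
        }),
        feature_lists=FeatureLists(feature_list={"sentences": fl}))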
    

    And you can serialize it and perhaps save it to a TFRecord file.

    data = seq.SerializeToString()
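
    For instance, one way to write it out (a sketch; the file name "sentences.tfrecord" is arbitrary):

    import tensorflow as tf
    
    # A sketch: write the serialized SequenceExample to a TFRecord file.
    with tf.io.TFRecordWriter("sentences.tfrecord") as writer:
        writer.write(data)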
    

    Later, when you read the data, you can parse it using tf.io.parse_single_sequence_example().
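
    For instance, a minimal parsing sketch (using VarLenFeature here is an assumption that fits the varying number of tokens per sentence; tf.io.RaggedFeature is another option in recent TensorFlow versions):

    import tensorflow as tf
    
    # A sketch: parse the serialized bytes back into tensors.
    # "sentences" matches the feature list name used above; VarLenFeature
    # yields a 2-D SparseTensor of shape [num_sentences, max_tokens].
    context, sequences = tf.io.parse_single_sequence_example(
        data,
        sequence_features={"sentences": tf.io.VarLenFeature(tf.string)})
    
    tokens = tf.sparse.to_dense(sequences["sentences"], default_value=b"")
    # tokens[i, j] is the j-th token of the i-th sentence (padded with b"").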

  • 2020-12-31 18:12

    The link you provided lists some benefits. You can see how parse_single_sequence_example is used here: https://github.com/tensorflow/magenta/blob/master/magenta/common/sequence_example_lib.py

    If you managed to get the data into your model with Example, it should be fine. SequenceExample just gives a little more structure to your data and some utilities for working with it.
