What are the advantages of using tf.train.SequenceExample over tf.train.Example for variable length features?


Recently I read this guide on undocumented features in TensorFlow, as I needed to pass variable length sequences as input. However, I found the protocol for tf.train.SequenceExample somewhat confusing, and I'm wondering what advantages it offers over tf.train.Example for variable length features.

2 Answers
  • 2020-12-31 17:57

    Here are the definitions of the Example and SequenceExample protocol buffers, and all the protos they may contain:

    message BytesList { repeated bytes value = 1; }
    message FloatList { repeated float value = 1 [packed = true]; }
    message Int64List { repeated int64 value = 1 [packed = true]; }
    message Feature {
        oneof kind {
            BytesList bytes_list = 1;
            FloatList float_list = 2;
            Int64List int64_list = 3;
        }
    };
    message Features { map<string, Feature> feature = 1; };
    message Example { Features features = 1; };
    
    message FeatureList { repeated Feature feature = 1; };
    message FeatureLists { map<string, FeatureList> feature_list = 1; };
    message SequenceExample {
      Features context = 1;
      FeatureLists feature_lists = 2;
    };
    

    An Example contains a Features, which contains a mapping from feature name to Feature, which in turn contains either a bytes list, a float list, or an int64 list.
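
    For instance, here is a minimal sketch of building an Example in Python (the feature name "text" is just an illustration, not something the format requires):

    from tensorflow.train import BytesList, Feature, Features, Example
    
    # A sketch: one Example holding a single bytes feature named "text".
    ex = Example(features=Features(feature={
        "text": Feature(bytes_list=BytesList(value=[b"some raw bytes"]))
    }))
    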

    A SequenceExample also contains a Features (its context field), but in addition it contains a FeatureLists, which contains a mapping from list name to FeatureList, which contains a list of Feature objects. So it can do everything an Example can do, and more. But do you really need that extra functionality? What does it do?

    Since each Feature contains a list of values, a FeatureList is a list of lists. And that's the key: if you need lists of lists of values, then you need SequenceExample.

    For example, if you handle text, you can represent it as one big string:

    from tensorflow.train import BytesList
    
    BytesList(value=[b"This is the first sentence. And here's another."])
    

    Or you could represent it as a list of word and punctuation tokens:

    BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b".", b"And", b"here",
                     b"'s", b"another", b"."])
    

    Or you could represent each sentence separately. That's where you would need a list of lists:

    from tensorflow.train import BytesList, Feature, FeatureList
    
    s1 = BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b"."])
    s2 = BytesList(value=[b"And", b"here", b"'s", b"another", b"."])
    fl = FeatureList(feature=[Feature(bytes_list=s1), Feature(bytes_list=s2)])
    

    Then create the SequenceExample:

    from tensorflow.train import SequenceExample, FeatureLists
    
    seq = SequenceExample(feature_lists=FeatureLists(feature_list={
        "sentences": fl
    }))
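
    The context field (the Features shown in the proto above) is left empty here. If you also want per-example metadata next to the sentences, a sketch might look like this (the "title" key is just an illustration):

    from tensorflow.train import Features
    
    # A sketch: the same feature lists as above, plus a context Feature
    # (the "title" key is arbitrary, chosen for illustration).
    seq_with_context = SequenceExample(
        context=Features(feature={
            "title": Feature(bytes_list=BytesList(value=[b"Two sentences"]))
        }),
        feature_lists=FeatureLists(feature_list={"sentences": fl}))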
    

    And you can serialize it and perhaps save it to a TFRecord file.

    data = seq.SerializeToString()
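
    For instance, one way to write it out (a sketch; the file name "sentences.tfrecord" is arbitrary):

    import tensorflow as tf
    
    # A sketch: write the serialized SequenceExample to a TFRecord file.
    with tf.io.TFRecordWriter("sentences.tfrecord") as writer:
        writer.write(data)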
    

    Later, when you read the data, you can parse it using tf.io.parse_single_sequence_example().
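
    For instance, a minimal parsing sketch (using VarLenFeature here is an assumption that fits the varying number of tokens per sentence; tf.io.RaggedFeature is another option in recent TensorFlow versions):

    import tensorflow as tf
    
    # A sketch: parse the serialized bytes back into tensors.
    # "sentences" matches the feature list name used above; VarLenFeature
    # yields a 2-D SparseTensor of shape [num_sentences, max_tokens].
    context, sequences = tf.io.parse_single_sequence_example(
        data,
        sequence_features={"sentences": tf.io.VarLenFeature(tf.string)})
    
    tokens = tf.sparse.to_dense(sequences["sentences"], default_value=b"")
    # tokens[i, j] is the j-th token of the i-th sentence (padded with b"").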

  • 2020-12-31 18:12

    The link you provided lists some benefits. You can see how parse_single_sequence_example is used here: https://github.com/tensorflow/magenta/blob/master/magenta/common/sequence_example_lib.py

    If you managed to get the data into your model with Example, it should be fine. SequenceExample just gives a little more structure to your data and some utilities for working with it.
