How should I use sql_last_value in logstash?

终归单人心 2020-12-16 05:46

I'm quite unclear on what sql_last_value does when I give my statement as such:

statement => "SELECT * from mytable where id > :sql_last_value"


        
3 Answers
  •  情歌与酒
    2020-12-16 06:15

    In simple words, sql_last_value lets you persist a value from your last SQL run, as its name suggests.

    This value is especially useful when you schedule your query. Why? Because you can build your SQL statement's condition on the value stored in sql_last_value, and so avoid re-retrieving rows that were already ingested by your Logstash input, picking up only rows created or updated after the last pipeline execution.

    Things to keep in mind when using sql_last_value

    • By default, this variable stores the timestamp of the last run. This is useful when you need to ingest data based on columns like creation_date, last_update, etc.
    • You can instead make sql_last_value track a specific table column's value. This is useful when you ingest data keyed by an auto-incrementing id. For that, specify use_column_value => true and tracking_column => "column_name_to_track".
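    For the default timestamp-based behaviour, a minimal sketch (the creation_date column name here is only an assumption for illustration; any timestamp column works):

    input {
        jdbc {
            # ...
            schedule => "* * * * *"
            # by default :sql_last_value holds the timestamp of the last run
            statement => "SELECT * from mytable where creation_date > :sql_last_value"
        }
    }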

    The following example stores the last ingested mytable row's id in :sql_last_value, so that the next execution ingests only the rows that were not ingested previously, i.e. the rows whose id is greater than the last ingested id.

    input {
        jdbc {
            # ...
            schedule => "* * * * *"
            statement => "SELECT * from mytable where id > :sql_last_value"
            use_column_value => true
            tracking_column => "id"
        }
    }
    
    

    Extremely important!

    When you use multiple jdbc inputs in your pipeline, each input block will overwrite the sql_last_value stored by the others, because by default they all share the same metadata file. To avoid that behaviour, set the last_run_metadata_path => "/path/to/sql_last_value/of_your_pipeline.yml" option on each input, so that each one stores its own value in a different file.
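    A sketch of two inputs keeping separate state (the table names and file paths here are assumptions for illustration):

    input {
        jdbc {
            # ...
            statement => "SELECT * from table_a where id > :sql_last_value"
            use_column_value => true
            tracking_column => "id"
            # each input persists its own last value in its own file
            last_run_metadata_path => "/usr/share/logstash/.table_a_last_run"
        }
        jdbc {
            # ...
            statement => "SELECT * from table_b where id > :sql_last_value"
            use_column_value => true
            tracking_column => "id"
            last_run_metadata_path => "/usr/share/logstash/.table_b_last_run"
        }
    }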
