How should I use sql_last_value in logstash?

终归单人心 2020-12-16 05:46

I'm quite unclear on what sql_last_value does when I give my statement as such:

statement => "SELECT * from mytable where id > :sql_last_value"


        
3 Answers
  •  情歌与酒
    2020-12-16 06:15

    In simple words, sql_last_value lets you persist a value from your last SQL run, as its name suggests.

    This value is especially useful when you schedule your query. Why? Because you can build your SQL statement's condition on the value stored in sql_last_value, and so avoid re-retrieving rows that were already ingested by your Logstash input, picking up only rows created or updated after the last pipeline execution.

    Things to keep in mind when using sql_last_value

    • By default, this variable stores the timestamp of the last run. This is useful when you need to ingest data based on columns like creation_date, last_update, etc.
    • You can instead make sql_last_value track a specific table column's value. This is useful when you ingest data keyed by an auto-incrementing id. For that, specify use_column_value => true and tracking_column => "column_name_to_track".
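    For the default timestamp-based behaviour, a minimal sketch (the creation_date column name here is only an assumption for illustration; any timestamp column works):

    input {
        jdbc {
            # ...
            schedule => "* * * * *"
            # by default :sql_last_value holds the timestamp of the last run
            statement => "SELECT * from mytable where creation_date > :sql_last_value"
        }
    }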

    The following example stores the last ingested mytable row's id in :sql_last_value, so that the next execution ingests only the rows that were not ingested previously, i.e. the rows whose id is greater than the last ingested id.

    input {
        jdbc {
            # ...
            schedule => "* * * * *"
            statement => "SELECT * from mytable where id > :sql_last_value"
            use_column_value => true
            tracking_column => "id"
        }
    }
    
    

    Extremely important!

    When you use multiple jdbc inputs in your pipeline, each input block will overwrite the sql_last_value stored by the others, because by default they all share the same metadata file. To avoid that behaviour, set the last_run_metadata_path => "/path/to/sql_last_value/of_your_pipeline.yml" option on each input, so that each one stores its own value in a different file.
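    A sketch of two inputs keeping separate state (the table names and file paths here are assumptions for illustration):

    input {
        jdbc {
            # ...
            statement => "SELECT * from table_a where id > :sql_last_value"
            use_column_value => true
            tracking_column => "id"
            # each input persists its own last value in its own file
            last_run_metadata_path => "/usr/share/logstash/.table_a_last_run"
        }
        jdbc {
            # ...
            statement => "SELECT * from table_b where id > :sql_last_value"
            use_column_value => true
            tracking_column => "id"
            last_run_metadata_path => "/usr/share/logstash/.table_b_last_run"
        }
    }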
