I'm quite unclear about what sql_last_value does when I write my statement like this:
statement => "SELECT * from mytable where id > :sql_last
Simply put, sql_last_value lets you persist a value from your last SQL run, as its name suggests.
This value is especially useful when you schedule your query. But why?
Because you can build your SQL condition on the value stored in sql_last_value and avoid retrieving rows that your Logstash input has already ingested. Each run then fetches only the rows that were added or updated after the last pipeline execution.
Things to keep in mind when using sql_last_value
By default, sql_last_value stores the timestamp of the last run. This is useful when you need to ingest data based on columns like creation_date, last_update, etc.
You can also define the value of sql_last_value by tracking a specific table column's value. This is useful when you need to ingest data based on an auto-increment column. For that, you need to specify use_column_value => true and tracking_column => "column_name_to_track".
The following example stores the id of the last mytable row in :sql_last_value, so that the next execution ingests only the rows that were not ingested previously, that is, the rows whose id is greater than the last ingested id.
input {
  jdbc {
    # ...
    schedule => "* * * * *"
    statement => "SELECT * from mytable where id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
  }
}
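For comparison, here is a minimal sketch of the default mode, where use_column_value is left unset and :sql_last_value holds the time of the previous run. It assumes mytable has a last_update timestamp column; that column name is only an illustration, so adjust it to your own schema.

```
input {
  jdbc {
    # ... (connection settings omitted, as in the example above)
    schedule => "* * * * *"
    # use_column_value defaults to false, so :sql_last_value is the
    # timestamp of the previous query execution
    statement => "SELECT * FROM mytable WHERE last_update > :sql_last_value"
  }
}
```

On the very first run there is no stored value yet, so the plugin starts from 1 January 1970 and the query effectively selects every row.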
When you use multiple jdbc inputs in your pipeline, they all persist sql_last_value to the same default file, so each input block overwrites the value stored by the previous one. To avoid that behaviour, set the last_run_metadata_path => "/path/to/sql_last_value/of_your_pipeline.yml" option on each input, so that each one stores its own value in a different file.
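As a rough sketch of that setup, the config below declares two jdbc inputs, each with its own metadata file. The second table (othertable) and both file paths are hypothetical placeholders, not values from a real installation.

```
input {
  jdbc {
    # ...
    schedule => "* * * * *"
    statement => "SELECT * from mytable where id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
    # hypothetical path: holds the last id seen for mytable only
    last_run_metadata_path => "/path/to/sql_last_value/of_mytable.yml"
  }
  jdbc {
    # ...
    schedule => "* * * * *"
    # othertable is a hypothetical second table
    statement => "SELECT * from othertable where id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
    # hypothetical path: holds the last id seen for othertable only
    last_run_metadata_path => "/path/to/sql_last_value/of_othertable.yml"
  }
}
```

The user running Logstash needs write permission on both locations, otherwise the value cannot be saved between runs.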