Question
With the new logstash jdbc connector here:
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html How do subsequent logstash runs effect whats already indexed into ElasticSearch? Does it create new documents in the ES index, or does it update the docs that match the row that have already been indexes? The use case I'm try to tackle is to index rows with timestamps into elastic search, but the table continually gets updated i would like to only index new data, or if I have to read the table again, only add new documents for new rows.
Any suggestions? Or more documentation around the logstash jdbc plugin?
Answer 1:
What I would do is include the timestamp of the last time the plugin ran (i.e. sql_last_start) in the query statement, so that it only loads the newly updated records.
For instance, your jdbc input plugin would look like this:
input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "mysql"
    schedule => "* * * * *"
    statement => "SELECT * FROM mytable WHERE timestamp > :sql_last_start"
  }
}
Make sure to replace timestamp with the name of the field containing the last-updated date, and mytable with the real name of your table.
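As for the second part of the question (avoiding duplicate documents when a row is read again), a common pattern is to set a deterministic document_id in the elasticsearch output, so a re-read row overwrites its existing document instead of creating a new one. A minimal sketch, assuming your table has a primary-key column named id and Elasticsearch runs on localhost:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mytable"
    # Use the row's primary key as the document ID so that
    # re-indexing the same row updates the existing document
    # rather than creating a duplicate.
    document_id => "%{id}"
  }
}

Without document_id, Elasticsearch generates a new ID for every event, so re-reading rows would create duplicate documents.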
Source: https://stackoverflow.com/questions/31995648/logstash-jdbc-connector-time-based-data