Question
With the new logstash jdbc connector here:
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html How do subsequent logstash runs effect whats already indexed into ElasticSearch? Does it create new documents in the ES index, or does it update the docs that match the row that have already been indexes? The use case I'm try to tackle is to index rows with timestamps into elastic search, but the table continually gets updated i would like to only index new data, or if I have to read the table again, only add new documents for new rows.
Any suggestions? Or more documentation around the logstash jdbc plugin?
Answer 1:
What I would do is include the timestamp of the last time the plugin ran (i.e. sql_last_start) in the query statement, so that it only loads the newly updated records.
For instance, your jdbc input plugin would look like this:
input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "mysql"
    schedule => "* * * * *"
    statement => "SELECT * FROM mytable WHERE timestamp > :sql_last_start"
  }
}
Make sure to replace timestamp with the name of the field containing the last-updated date, and mytable with the real name of your table.
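As for the second part of the question (avoiding duplicate documents when a row is read again), a common pattern is to set a deterministic document_id in the elasticsearch output, so a re-read row overwrites its existing document instead of creating a new one. A minimal sketch, assuming your table has a primary-key column named id and Elasticsearch runs on localhost:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mytable"
    # Use the row's primary key as the document ID so that
    # re-indexing the same row updates the existing document
    # rather than creating a duplicate.
    document_id => "%{id}"
  }
}

Without document_id, Elasticsearch generates a new ID for every event, so re-reading rows would create duplicate documents.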
Source: https://stackoverflow.com/questions/31995648/logstash-jdbc-connector-time-based-data