I'm using the JDBC plugin for ElasticSearch to index data from my MySQL database. It picks up new and changed records, but does not delete records that have been removed from MySQL.
Since this question was asked, the parameters have changed considerably: versioning and digesting have been deprecated, and poll has been replaced by schedule, which takes a cron expression specifying how often to rerun the river (the example below is scheduled to run every 5 minutes):
curl -XPUT 'localhost:9200/_river/account_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "test_user",
        "password" : "test_pass",
        "sql" : "SELECT `account`.`id` as `_id`, `account`.`id`, `account`.`reference`, `account`.`company_name`, `account`.`also_known_as` from `account` WHERE NOT `account`.`deleted`",
        "strategy" : "simple",
        "schedule" : "0 0/5 * * * ?",
        "autocommit" : true,
        "index" : "headphones",
        "type" : "Account"
    }
}'
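For reference, the schedule value above has six fields with seconds first, which looks like the Quartz cron layout. Assuming that layout holds, a small helper (the function name is made up for illustration) can label each field:

```shell
#!/bin/sh
# Label the six fields of a cron expression whose first field is seconds.
# The Quartz-style field order is an assumption based on the shape of the expression.
describe_schedule() {
  # read splits on whitespace and does not expand the * characters
  read -r sec min hour dom mon dow <<EOF
$1
EOF
  printf 'second=%s minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$sec" "$min" "$hour" "$dom" "$mon" "$dow"
}

describe_schedule "0 0/5 * * * ?"
```

Under that reading, the minute field 0/5 means "every 5 minutes, starting at minute 0".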
As for the main question, the answer I got from the developer is this (https://github.com/jprante/elasticsearch-river-jdbc/issues/213):
Deletion of rows is no longer detected.
I tried housekeeping with versioning, but this did not work well together with incremental updates and adding rows.
A good method would be windowed indexing. Each timeframe (maybe once per day or per week) a new index is created for the river, and added to an alias. Old indices are to be dropped after a while. This maintenance is similar to logstash indexing, but it is outside the scope of a river.
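The windowed approach described above could be sketched roughly like this. It is only an illustration and not part of the river: the alias name headphones_search, the dated index names, and the NEW_WINDOW / OLD_WINDOW variables are all assumptions.

```shell
#!/bin/sh
# Sketch of windowed indexing: each window gets its own index, searches go through an alias.
# ES_URL, ALIAS, and the dated naming scheme are assumptions for illustration only.
ES_URL="localhost:9200"
ALIAS="headphones_search"

# Name of the index for one window, e.g. "headphones" plus "2014.06.01".
window_index() {
  printf '%s-%s' "$1" "$2"
}

# Body for POST /_aliases that adds one index to the alias.
alias_add_body() {
  printf '{"actions":[{"add":{"index":"%s","alias":"%s"}}]}' "$1" "$ALIAS"
}

# Dry run by default: nothing is sent unless RUN=1 and both window labels are given.
if [ "${RUN:-0}" = "1" ] && [ -n "${NEW_WINDOW:-}" ] && [ -n "${OLD_WINDOW:-}" ]; then
  new_index=$(window_index headphones "$NEW_WINDOW")
  old_index=$(window_index headphones "$OLD_WINDOW")
  # Add the freshly built index to the alias (the river must already have filled it).
  curl -XPOST "$ES_URL/_aliases" -d "$(alias_add_body "$new_index")"
  # Drop the expired window; deleting an index also removes it from the alias.
  curl -XDELETE "$ES_URL/$old_index"
fi
```

The river's index setting would have to point at the new window's index each time, which is the maintenance the developer describes as being outside the scope of a river.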
The method I am currently using, while I research aliasing, is to recreate the index and river nightly and schedule the river to run every few hours. This ensures that new data is indexed the same day and that deletions are reflected within 24 hours.
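A minimal sketch of that nightly rebuild, assuming the river definition shown earlier has been saved to a file (river.json is a made-up name) and that the script is triggered by something like a system cron job:

```shell
#!/bin/sh
# Sketch of a nightly rebuild: drop the river and its index, then register the river again.
# RIVER_JSON (a file holding the river definition) is an assumption for illustration.
ES_URL="localhost:9200"
RIVER="account_river"
INDEX="headphones"
RIVER_JSON="river.json"

# Print the three requests in the order they should run.
rebuild_commands() {
  echo "curl -XDELETE $ES_URL/_river/$RIVER/"
  echo "curl -XDELETE $ES_URL/$INDEX"
  echo "curl -XPUT $ES_URL/_river/$RIVER/_meta -d @$RIVER_JSON"
}

# Dry run by default; set RUN=1 to actually send the requests.
if [ "${RUN:-0}" = "1" ]; then
  rebuild_commands | sh
else
  rebuild_commands
fi
```

Because the index is dropped and refilled from the current SELECT, rows deleted in MySQL disappear from the index after the next rebuild. The index is empty or incomplete while the river refills it, which is one reason to look at aliasing instead.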
I am still relatively new to ElasticSearch and have been using the JDBC river for my project. If I understood correctly, which is not necessarily the case, this is how it works:
If you want housekeeping to run, versioning needs to be set to true, which in turn implies that digesting should be set to true as well.
Given that, your river should look like this:
curl -XPUT 'localhost:9200/_river/account_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "test_user",
        "password" : "test_pass",
        "sql" : "SELECT `account`.`id` as `_id`, `account`.`id`, `account`.`reference`, `account`.`company_name`, `account`.`also_known_as` from `account` WHERE NOT `account`.`deleted`",
        "strategy" : "simple",
        "poll" : "5s",
        "autocommit" : true,
        "index" : {
            "index" : "headphones",
            "type" : "Account",
            "versioning" : true,
            "digesting" : true
        }
    }
}'
Note that versioning and digesting should be part of the index definition, not the jdbc definition.