Migrating 3 million records from Oracle to Elasticsearch using Logstash


Question


We are trying to migrate around 3 million records from Oracle to Elasticsearch using Logstash. As part of our Logstash script we apply two jdbc_streaming filters: one to load related nested objects, and another to run a hierarchical query that loads data into a second nested object in the index.

We are able to index 0.4 million records in 24 hours; those 0.4 million records occupy around 300 MB. We tried multiple approaches to move the data from Oracle into Elasticsearch more quickly, but none achieved the desired throughput. Please find below the approaches we tried (sketched after this list):

1. In the Logstash script, we used the jdbc_fetch_size, jdbc_page_size, jdbc_paging_enabled, and clean_run parameters, and set pipeline workers to 20 and pipeline batch size to 125 in the logstash.yml file.
2. On the Elasticsearch side, we set the number of replicas to 0 and the refresh interval to -1, tried increasing the value of the indices.memory.index_buffer_size parameter, and increased the number of watcher queues in the elasticsearch.yml file.
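For reference, a minimal sketch of the settings described above, assuming a local node; the host, index name, and the buffer-size value are placeholders rather than our actual values:

# logstash.yml (approach 1): pipeline tuning
pipeline.workers: 20
pipeline.batch.size: 125

# elasticsearch.yml (approach 2, node-level): larger indexing buffer
indices.memory.index_buffer_size: 20%

# index settings (approach 2): disable replicas and periodic refreshes
# for the duration of the bulk load
curl -X PUT "localhost:9200/<index_name>/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "number_of_replicas": 0,
    "refresh_interval": "-1"
  }
}'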

We have essentially googled around and followed various suggestions from this site and others, but nothing has worked so far.

We are using a single-node Elasticsearch setup, and neither the database nor the Elasticsearch node is on the machine from which we run the Logstash script.

Please find below the Logstash config file:

input {
  jdbc {
    jdbc_driver_library => "LIB"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "connection url"
    jdbc_user => "user"
    jdbc_password => "pwd"
    statement => "select * from "
  }
}
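For completeness, this is roughly how the paging options from approach 1 were applied to the jdbc input; the fetch and page sizes shown are illustrative, not our exact values:

input {
  jdbc {
    jdbc_driver_library => "LIB"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "connection url"
    jdbc_user => "user"
    jdbc_password => "pwd"
    # hint to the JDBC driver how many rows to pull per round trip
    jdbc_fetch_size => 10000
    # let Logstash wrap the statement in paged queries
    jdbc_paging_enabled => true
    jdbc_page_size => 25000
    # ignore any previously persisted tracking state
    clean_run => true
    statement => "select * from "  # same (redacted) statement as above
  }
}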

filter {
  jdbc_streaming {
    jdbc_driver_library => "LIB"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "connection url"
    jdbc_user => "user"
    jdbc_password => "pwd"
    statement => "select claimnumber, claimtype, is_active from claim where policynumber = :policynumber"
    parameters => { "policynumber" => "policynumber" }
    target => "nested node"
  }
}

filter {
  jdbc_streaming {
    jdbc_driver_library => "LIB"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "connection url"
    jdbc_user => "user"
    jdbc_password => "pwd"
    statement => "select listagg(column name, '/') within group (order by column name) from
                  where LEVEL > 1
                  start with =:
                  connect by prior = "
    parameters => { "p1" => "p1" }
    target => "nested node1"
  }
}
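The identifiers in the statement above were redacted when posting. Purely for illustration, a complete Oracle hierarchical query of that shape might look like the following, where claim_hierarchy, child_code, parent_code, and :p1 are hypothetical names:

select listagg(child_code, '/') within group (order by child_code) as path
from claim_hierarchy                        -- hypothetical table
where LEVEL > 1                             -- skip the root row of the traversal
start with child_code = :p1                 -- bind variable supplied via parameters
connect by prior child_code = parent_code   -- walk down the hierarchy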
output {
  elasticsearch {
    hosts => [""]
    index => "<index_name>"
    document_id => "%{doc_id}"
  }
  # emit events to the console as JSON for debugging
  stdout { codec => json }
}

Can you please help us identify the bottlenecks and suggest how to increase indexing performance?

Thank You

Source: https://stackoverflow.com/questions/62957239/migrating-3-million-records-from-oracle-to-elastic-search-using-logstash
