Logstash input jdbc is duplicating results

时光取名叫无心 2020-12-29 15:57

I'm using the Logstash jdbc input plugin to read from two (or more) databases and send the data to Elasticsearch, and Kibana 4 to visualize that data.

This is my log

2 answers
  •  轮回少年
    2020-12-29 16:37

    By default, the jdbc input executes the configured SQL statement in full on every scheduled run. In your case the statement selects everything in test_table, so each run sends all rows to Elasticsearch again. To load only the rows added since the last time the jdbc input ran, filter on the predefined sql_last_start parameter in your SQL query (later versions of the plugin name this parameter sql_last_value).

    input {
      jdbc {
        type => "A"
        jdbc_driver_library => "C:\DEV\elasticsearch-1.7.1\plugins\elasticsearch-jdbc-1.7.1.0\lib\jtds-1.3.1.jar"
        jdbc_driver_class => "Java::net.sourceforge.jtds.jdbc.Driver"
        jdbc_connection_string => "jdbc:jtds:sqlserver://dev_data_base_server:1433/dbApp1;domain=CORPDOMAIN;useNTLMv2=true"
        jdbc_user => "user"
        jdbc_password => "pass"
        schedule => "5 * * * *"
        statement => "SELECT id, date, content, status from test_table WHERE date > :sql_last_start"
      }

      jdbc {
        type => "B"
        jdbc_driver_library => "C:\DEV\elasticsearch-1.7.1\plugins\elasticsearch-jdbc-1.7.1.0\lib\jtds-1.3.1.jar"
        jdbc_driver_class => "Java::net.sourceforge.jtds.jdbc.Driver"
        jdbc_connection_string => "jdbc:jtds:sqlserver://dev_data_base_server:1433/dbApp2;domain=CORPDOMAIN;useNTLMv2=true"
        jdbc_user => "user"
        jdbc_password => "pass"
        schedule => "5 * * * *"
        statement => "SELECT id, date, content, status from test_table WHERE date > :sql_last_start"
      }
    }
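
    To make the effect of the WHERE clause concrete, here is a minimal Python sketch of the same idea against an in-memory SQLite table. It is not how the plugin is implemented; the table contents, the timestamps and the fetch_new_rows helper are made up for illustration, and only the column names and the :sql_last_start placeholder come from the config above.

```python
import sqlite3


def fetch_new_rows(conn, last_run):
    """Return only the rows whose date is later than the previous run."""
    cur = conn.execute(
        "SELECT id, date, content, status FROM test_table "
        "WHERE date > :sql_last_start ORDER BY id",
        {"sql_last_start": last_run},
    )
    return cur.fetchall()


# Hypothetical sample data standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table "
    "(id INTEGER PRIMARY KEY, date TEXT, content TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?, ?)",
    [
        (1, "2020-12-29 10:00:00", "first", "ok"),
        (2, "2020-12-29 11:00:00", "second", "ok"),
    ],
)

# First run: nothing has been read yet, so start from the epoch.
last_run = "1970-01-01 00:00:00"
first_batch = fetch_new_rows(conn, last_run)

# Assume the first run happened at noon; remember that timestamp.
last_run = "2020-12-29 12:00:00"

# A new row arrives before the second scheduled run.
conn.execute(
    "INSERT INTO test_table VALUES (3, '2020-12-29 13:00:00', 'third', 'ok')"
)
second_batch = fetch_new_rows(conn, last_run)

print([row[0] for row in first_batch])   # [1, 2]
print([row[0] for row in second_batch])  # [3]
```

    Without the date filter, the second run would return all three rows and rows 1 and 2 would be sent a second time.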
    

    Also, if the same record does happen to be loaded twice from your database and you don't want duplicates created in Elasticsearch, you can use the record ID as the document ID in your elasticsearch output. That way a repeated record updates the existing document instead of creating a new one.

    output {
    
        if [type] == "A" {
            elasticsearch {
                host => "localhost"
                protocol => http
                index => "logstash-servera-%{+YYYY.MM.dd}"
                document_id => "%{id}"       # same id as in the DB
            }    
        }
        if [type] == "B" {
            elasticsearch {
                host => "localhost"
                protocol => http
                index => "logstash-serverb-%{+YYYY.MM.dd}"
                document_id => "%{id}"       # same id as in the DB
            }    
        }
    
        stdout { codec => rubydebug }
    }
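
    The reason a fixed document ID prevents duplicates can be shown with a toy model in Python, where a plain dict stands in for an index. This is only a sketch of the idea, not the Elasticsearch client; index_document and the "auto-N" ids are hypothetical.

```python
def index_document(index, doc, doc_id=None):
    """Store doc under doc_id; with no id given, make up a fresh one."""
    if doc_id is None:
        # Hypothetical stand-in for an auto-generated document id.
        doc_id = f"auto-{len(index)}"
    index[doc_id] = doc
    return doc_id


# The same database row, read twice (the second time with a new status).
record = {"id": 42, "content": "hello", "status": "new"}
updated = {"id": 42, "content": "hello", "status": "done"}

# Without an explicit id, every event becomes a separate document.
auto_ids = {}
index_document(auto_ids, record)
index_document(auto_ids, updated)

# With the DB id as document id, the second event overwrites the first.
by_db_id = {}
index_document(by_db_id, record, doc_id=str(record["id"]))
index_document(by_db_id, updated, doc_id=str(updated["id"]))

print(len(auto_ids))                             # 2
print(len(by_db_id), by_db_id["42"]["status"])   # 1 done
```

    Note that both outputs above write to a date-based index name, so this only deduplicates records that land in the same index.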
    
