Kafka connect possible to use custom query with bulk mode?

Submitted by 我怕爱的太早我们不能终老 on 2019-12-24 17:08:14

Question


I'm trying to send a record for every row that is 7 days old. This is the configuration I was working with, but it doesn't work, even though the query produces records on the DB server.

{
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": 1,
    "mode": "bulk",
    "connection.url": "jdbc:mysql://mysql:3300/test_db?user=root&password=password",
    "query": "SELECT * FROM test_table WHERE DATEDIFF(CURDATE(), test_table.modified) = 7;",
    "topic.prefix": "test-jdbc-",
    "poll.interval.ms": 10000
}

Answer 1:


The JDBC source connector imports data from a relational database into an Apache Kafka topic using a JDBC driver. Data is loaded periodically, either incrementally (based on a timestamp or incrementing column) or as a bulk load. Regardless of the mode, when you first create a JDBC connector it loads all existing data into the topic; in the incremental modes, subsequent polls load only new or modified rows based on the tracked column.

Bulk: This mode is unfiltered and therefore not incremental at all. It loads all rows from the table (or query) on each iteration. This can be useful if you want to periodically dump an entire table where entries are eventually deleted and the downstream system can safely handle duplicates. This means you cannot load the last 7 days incrementally using bulk mode.

Timestamp Column: In this mode, a single column containing a modification timestamp is used to track the last time data was processed and to query only for rows that have been modified since that time. This lets you load data incrementally. Note how it behaves on the first run: it loads all data in the table, because to the JDBC connector every row is new; after that it loads only new or modified rows.
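If incremental loading of recently modified rows is what you actually want, a timestamp-mode configuration would look roughly like this. This is a sketch based on the question's setup: the `modified` column and connection details are taken from the question, and `timestamp.column.name` is the key setting that enables the incremental tracking described above.

```json
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "tasks.max": 1,
  "mode": "timestamp",
  "timestamp.column.name": "modified",
  "connection.url": "jdbc:mysql://mysql:3300/test_db?user=root&password=password",
  "table.whitelist": "test_table",
  "topic.prefix": "test-jdbc-",
  "poll.interval.ms": 10000
}
```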

Now, as per your requirement, you seem to be trying to load all matching data at the interval configured by "poll.interval.ms": 10000. Your connector settings look correct per the documentation, but the query may be the problem: the JDBC connector can wrap the configured query as a subquery or append its own clauses to it, which can fail when the query already ends with a WHERE clause or a trailing semicolon. Try a query like the one below.

"query": "select * from (select * from test_table where  modified > now() - interval '7' day) o",

Try the settings below:

{
  "name": "application_name",
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "key.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "connection.url": "jdbc:mysql://mysql:3300/test_db",
  "connection.user": "root",
  "connection.password": "password",
  "connection.attempts": "1",
  "mode": "bulk",
  "validate.non.null": false,
  "query": "select * from (select * from test_table where modified > now() - interval '7' day) o",
  "table.types": "TABLE",
  "topic.prefix": "test-jdbc-",
  "poll.interval.ms": 10000,
  "schema.ignore": true,
  "key.converter.schemas.enable": "false",
  "value.converter.schemas.enable": "false"
}


Source: https://stackoverflow.com/questions/58456759/kafka-connect-possible-to-use-custom-query-with-bulk-mode
