apache-kafka-connect

Kafka Connect: is it possible to use a custom query with bulk mode?

Submitted by 我怕爱的太早我们不能终老 on 2019-12-24 17:08:14
Question: I'm trying to send a record for every row that is 7 days old. This is the configuration I was working on, but it doesn't work even though the query produces records on the DB server:

{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "tasks.max": 1,
  "mode": "bulk",
  "connection.url": "jdbc:mysql://mysql:3300/test_db?user=root&password=password",
  "query": "SELECT * FROM test_table WHERE DATEDIFF(CURDATE(), test_table.modified) = 7;",
  "topic.prefix": "test-jdbc-",
  "poll.interval.ms": …
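
Below is a minimal sketch of how such a bulk-mode configuration is typically completed. The poll interval value and the removal of the trailing semicolon are illustrative assumptions, not details from the original post:

{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "tasks.max": 1,
  "mode": "bulk",
  "connection.url": "jdbc:mysql://mysql:3300/test_db?user=root&password=password",
  "query": "SELECT * FROM test_table WHERE DATEDIFF(CURDATE(), test_table.modified) = 7",
  "topic.prefix": "test-jdbc-",
  "poll.interval.ms": 86400000
}

In bulk mode the whole result set is re-published on every poll, so poll.interval.ms controls how often the 7-day snapshot is resent (86400000 ms would be once a day). Dropping the trailing semicolon avoids syntax errors in case the connector wraps the query in a subselect.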

Kafka source connector poll method called by multiple threads

Submitted by 百般思念 on 2019-12-24 06:34:14
Question: I have written a file source connector and tried starting it in standalone as well as distributed mode. I notice that sometimes the poll method is called by multiple threads. I understand that if we create a task, it will be assigned to a worker, which assigns a single thread, and the poll method is called only by that thread. But in my case it behaves differently. Can somebody please explain under what circumstances a source task is executed by multiple threads concurrently? This is creating …
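
In Kafka Connect, each task config returned from Connector.taskConfigs() is started as its own SourceTask and driven by its own worker thread, so returning more than one task config is one common way to end up with poll() running on several threads at once. The sketch below (class names are hypothetical, not from the original post) pins the connector to a single task config so only one thread calls poll():

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SingleTaskFileSourceConnector extends SourceConnector {
    private Map<String, String> props;

    @Override public String version() { return "1.0"; }
    @Override public void start(Map<String, String> props) { this.props = props; }
    @Override public Class<? extends Task> taskClass() { return SingleThreadFileSourceTask.class; }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // Hand out exactly one task config regardless of maxTasks,
        // so the framework starts exactly one SourceTask (one polling thread).
        return Collections.singletonList(props);
    }

    @Override public void stop() { }
    @Override public ConfigDef config() { return new ConfigDef(); }

    // Minimal task stub so the sketch is self-contained; a real task would read the file here.
    public static class SingleThreadFileSourceTask extends SourceTask {
        @Override public String version() { return "1.0"; }
        @Override public void start(Map<String, String> props) { }
        @Override public List<SourceRecord> poll() throws InterruptedException {
            Thread.sleep(1000);             // back off so the worker thread is not busy-looping
            return Collections.emptyList();
        }
        @Override public void stop() { }
    }
}

If multiple threads are still observed with a single task, a likely explanation is a rebalance or task restart: the old task instance may still be finishing a poll() while its replacement starts, so shared state inside the task should not assume single-threaded access across restarts.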

How to process a Kafka KStream and write to a database directly instead of sending it to another topic

Submitted by 三世轮回 on 2019-12-24 01:55:09
Question: I don't want to write the processed KStream to another topic; I want to write the enriched KStream directly to a database. How should I proceed? Answer 1: You can implement a custom Processor that opens a DB connection and apply it via KStream#process(). Cf. https://docs.confluent.io/current/streams/developer-guide.html#applying-processors-and-transformers-processor-api-integration Note, you will need to do sync writes into your DB to guard against data loss. Thus, not writing back to a topic has multiple …
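
A minimal sketch of the approach the answer describes, using the Kafka Streams Processor API (2.x-era interface); the JDBC URL, table, and column names are illustrative assumptions:

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch of a processor that writes each enriched record straight to a database.
class DbWriterProcessor implements Processor<String, String> {
    private Connection connection;

    @Override
    public void init(ProcessorContext context) {
        try {
            // Hypothetical connection string; replace with your own.
            connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "user", "password");
        } catch (Exception e) {
            throw new RuntimeException("Could not open DB connection", e);
        }
    }

    @Override
    public void process(String key, String value) {
        try (PreparedStatement stmt =
                 connection.prepareStatement("INSERT INTO enriched_events (event_key, payload) VALUES (?, ?)")) {
            stmt.setString(1, key);
            stmt.setString(2, value);
            stmt.executeUpdate();   // synchronous write, so a failure surfaces here instead of being silently lost
        } catch (Exception e) {
            throw new RuntimeException("DB write failed", e);
        }
    }

    @Override
    public void close() {
        try { connection.close(); } catch (Exception ignored) { }
    }
}

// Usage: enrichedStream.process(DbWriterProcessor::new);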

Kafka Connector for Azure Blob Storage

Submitted by 孤街浪徒 on 2019-12-24 01:18:32
Question: I need to store the messages pushed to Kafka in deep storage. We are using Azure cloud services, so I suppose Azure Blob Storage could be a good option. I want to use Kafka Connect's sink connector API to push data to Azure Blob. The Kafka documentation mostly suggests HDFS for exporting data, but in that case I would need a Linux VM running Hadoop, which I guess would be costly. My question is: is Azure Blob Storage an appropriate choice for storing JSON objects, and is building a custom sink connector a …
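
On the "building a custom sink connector" side, the work largely amounts to implementing SinkTask. A minimal sketch, with the actual Azure Blob client left as a stand-in interface (the Azure SDK calls themselves are not shown, and the blob-naming scheme is just an illustrative choice):

import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

import java.util.Collection;
import java.util.Map;

// Stand-in for whatever Azure Blob client you wire in.
interface BlobWriter {
    void append(String blobName, String content);
}

public class BlobSinkTask extends SinkTask {
    private BlobWriter blobWriter;

    @Override public String version() { return "1.0"; }

    @Override
    public void start(Map<String, String> props) {
        // Construct the real Azure Blob client from the connector config here.
        this.blobWriter = (blobName, content) -> { /* call the Azure SDK in a real implementation */ };
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {
            // One blob path per topic-partition is just an illustrative layout choice.
            String blobName = record.topic() + "/" + record.kafkaPartition() + ".json";
            blobWriter.append(blobName, String.valueOf(record.value()));
        }
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
        // Make buffered data durable before Connect commits consumer offsets.
    }

    @Override public void stop() { }
}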

Kafka connect: The configuration XXX was supplied but isn't a known config in AdminClientConfig

Submitted by 狂风中的少年 on 2019-12-24 00:34:22
Question: When starting Kafka Connect, I saw lots of warnings:

10:33:56.706 [DistributedHerder] WARN org.apache.kafka.clients.admin.AdminClientConfig - The configuration 'config.storage.topic' was supplied but isn't a known config.
10:33:56.707 [DistributedHerder] WARN org.apache.kafka.clients.admin.AdminClientConfig - The configuration 'group.id' was supplied but isn't a known config.
10:33:56.708 [DistributedHerder] WARN org.apache.kafka.clients.admin.AdminClientConfig - The configuration 'status…
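
These warnings are commonly reported as harmless: the Connect worker hands its full configuration to the AdminClient it creates internally, and AdminClientConfig logs every key it does not itself define. The keys named in the warnings are legitimate distributed-worker settings, for example (values illustrative):

# connect-distributed.properties (illustrative values)
bootstrap.servers=localhost:9092
group.id=connect-cluster
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter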

Kafka Connect: How to fetch nested fields from Struct

Submitted by 梦想与她 on 2019-12-22 14:40:55
Question: I am using Kafka Connect to implement a Kafka-Elasticsearch connector. The producer sends a complex JSON to a Kafka topic, and my connector code uses it to persist to Elasticsearch. The connector gets the data in the form of a Struct (https://kafka.apache.org/0100/javadoc/org/apache/kafka/connect/data/Struct.html). I am able to get the field values of the struct at the top level of the JSON, but not from nested JSONs.

{
  "after": {
    "test.test.employee.Value": {
      "id": 5671111,
      "name": {
        "string": …
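
A sketch of walking the nested Structs with the accessors from the linked Javadoc; the field names follow the JSON fragment above, while the exact schema types (int32 vs int64, the optional wrapper around the string) are assumptions:

import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.sink.SinkRecord;

// Sketch: navigating the nested Structs that mirror the posted JSON.
public class StructNavigationExample {
    public static void handle(SinkRecord record) {
        Struct value = (Struct) record.value();

        Struct after = value.getStruct("after");
        Struct employee = after.getStruct("test.test.employee.Value");

        Object id = employee.get("id");             // use get() if unsure whether the schema is int32 or int64
        Struct name = employee.getStruct("name");   // wrapper struct around the string, as suggested by the posted JSON
        String nameValue = name == null ? null : name.getString("string");

        System.out.println("id=" + id + ", name=" + nameValue);
    }
}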

What are the key differences between existing approaches to mirroring Kafka topics?

Submitted by 冷暖自知 on 2019-12-22 10:38:33
Question: Kafka MirrorMaker is a basic approach to mirroring Kafka topics from source to target brokers. Unfortunately, it isn't configurable enough for my requirements. My requirements are very simple:

- the solution should be a JVM application
- if the destination topic doesn't exist, it creates it
- the solution should be able to add prefixes/suffixes to destination topic names
- it should reload and apply configurations on the fly if they're changed

According to this answer there are several alternative …

Kafka-Connect vs Filebeat & Logstash

Submitted by 我们两清 on 2019-12-22 04:43:13
Question: I'm looking to consume from Kafka and save data into Hadoop and Elasticsearch. I've currently seen two ways of doing this: using Filebeat to consume from Kafka and send it to ES, and using the Kafka Connect framework, which has Kafka-Connect-HDFS and Kafka-Connect-Elasticsearch modules. I'm not sure which one to use to send streaming data. Though I think that if I at some point want to take data from Kafka and place it into Cassandra, I can use a Kafka Connect module for that, but no such feature …
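
For the Kafka Connect route the question weighs, each destination boils down to one sink connector configuration. A sketch of an Elasticsearch sink connector config (connector name, topic, and URL are illustrative):

{
  "name": "es-sink-example",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "logs",
    "connection.url": "http://localhost:9200",
    "type.name": "_doc",
    "key.ignore": "true",
    "schema.ignore": "true"
  }
}

An analogous config using io.confluent.connect.hdfs.HdfsSinkConnector covers the Hadoop side, so both destinations can be fed from the same Connect cluster.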

Confluent Kafka Connect Elasticsearch document ID creation

Submitted by 假如想象 on 2019-12-22 01:32:06
Question: I am using Confluent to connect my DB and ES and am getting this exception:

org.apache.kafka.connect.errors.DataException: STRUCT is not supported as the document id.
    at io.confluent.connect.elasticsearch.DataConverter.convertKey(DataConverter.java:75)
    at io.confluent.connect.elasticsearch.DataConverter.convertRecord(DataConverter.java:84)
    at io.confluent.connect.elasticsearch.ElasticsearchWriter.write(ElasticsearchWriter.java:210)
    at io.confluent.connect.elasticsearch.ElasticsearchSinkTask.put…
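
The error says the record key is a Struct, which the Elasticsearch connector cannot use directly as a document id. Two commonly suggested workarounds are sketched below; the field name "id" is an illustrative assumption about the key schema:

# Option 1: ignore the key and let the connector build document ids from topic+partition+offset
"key.ignore": "true"

# Option 2: extract a single primitive field from the Struct key with an SMT
"transforms": "extractKeyId",
"transforms.extractKeyId.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractKeyId.field": "id"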