apache-kafka-connect

Kafka Connect: is it possible to use a custom query with bulk mode?

Submitted by 我怕爱的太早我们不能终老 on 2019-12-24 17:08:14
Question: I'm trying to send a record for every row that is 7 days old. This is the configuration I was working on, but it doesn't work even though the query produces records on the DB server:

{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "tasks.max": 1,
  "mode": "bulk",
  "connection.url": "jdbc:mysql://mysql:3300/test_db?user=root&password=password",
  "query": "SELECT * FROM test_table WHERE DATEDIFF(CURDATE(), test_table.modified) = 7;",
  "topic.prefix": "test-jdbc-",
  "poll.interval.ms": …
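
Below is a minimal sketch of how such a bulk-mode configuration is typically completed. The poll interval value and the removal of the trailing semicolon are illustrative assumptions, not details from the original post:

{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "tasks.max": 1,
  "mode": "bulk",
  "connection.url": "jdbc:mysql://mysql:3300/test_db?user=root&password=password",
  "query": "SELECT * FROM test_table WHERE DATEDIFF(CURDATE(), test_table.modified) = 7",
  "topic.prefix": "test-jdbc-",
  "poll.interval.ms": 86400000
}

In bulk mode the whole result set is re-published on every poll, so poll.interval.ms controls how often the 7-day snapshot is resent (86400000 ms would be once a day). Dropping the trailing semicolon avoids syntax errors in case the connector wraps the query in a subselect.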

Kafka source connector poll method called by multiple threads

Submitted by 百般思念 on 2019-12-24 06:34:14
Question: I have written a file source connector and tried starting it in standalone as well as distributed mode. I notice that sometimes the poll method is called by multiple threads. I understand that if we create a task, it will be assigned to a worker, which assigns a single thread, and the poll method is called only by that thread. But in my case it behaves differently. Can somebody please explain under what circumstances a source task is executed by multiple threads concurrently? This is creating …
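
In Kafka Connect, each task config returned from Connector.taskConfigs() is started as its own SourceTask and driven by its own worker thread, so returning more than one task config is one common way to end up with poll() running on several threads at once. The sketch below (class names are hypothetical, not from the original post) pins the connector to a single task config so only one thread calls poll():

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SingleTaskFileSourceConnector extends SourceConnector {
    private Map<String, String> props;

    @Override public String version() { return "1.0"; }
    @Override public void start(Map<String, String> props) { this.props = props; }
    @Override public Class<? extends Task> taskClass() { return SingleThreadFileSourceTask.class; }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // Hand out exactly one task config regardless of maxTasks,
        // so the framework starts exactly one SourceTask (one polling thread).
        return Collections.singletonList(props);
    }

    @Override public void stop() { }
    @Override public ConfigDef config() { return new ConfigDef(); }

    // Minimal task stub so the sketch is self-contained; a real task would read the file here.
    public static class SingleThreadFileSourceTask extends SourceTask {
        @Override public String version() { return "1.0"; }
        @Override public void start(Map<String, String> props) { }
        @Override public List<SourceRecord> poll() throws InterruptedException {
            Thread.sleep(1000);             // back off so the worker thread is not busy-looping
            return Collections.emptyList();
        }
        @Override public void stop() { }
    }
}

If multiple threads are still observed with a single task, a likely explanation is a rebalance or task restart: the old task instance may still be finishing a poll() while its replacement starts, so shared state inside the task should not assume single-threaded access across restarts.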

How to process a Kafka KStream and write to a database directly instead of sending it to another topic

Submitted by 三世轮回 on 2019-12-24 01:55:09
Question: I don't want to write the processed KStream to another topic; I want to write the enriched KStream directly to a database. How should I proceed? Answer 1: You can implement a custom Processor that opens a DB connection and apply it via KStream#process(). Cf. https://docs.confluent.io/current/streams/developer-guide.html#applying-processors-and-transformers-processor-api-integration Note, you will need to do sync writes into your DB to guard against data loss. Thus, not writing back to a topic has multiple …
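
A minimal sketch of the approach the answer describes, using the Kafka Streams Processor API (2.x-era interface); the JDBC URL, table, and column names are illustrative assumptions:

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch of a processor that writes each enriched record straight to a database.
class DbWriterProcessor implements Processor<String, String> {
    private Connection connection;

    @Override
    public void init(ProcessorContext context) {
        try {
            // Hypothetical connection string; replace with your own.
            connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "user", "password");
        } catch (Exception e) {
            throw new RuntimeException("Could not open DB connection", e);
        }
    }

    @Override
    public void process(String key, String value) {
        try (PreparedStatement stmt =
                 connection.prepareStatement("INSERT INTO enriched_events (event_key, payload) VALUES (?, ?)")) {
            stmt.setString(1, key);
            stmt.setString(2, value);
            stmt.executeUpdate();   // synchronous write, so a failure surfaces here instead of being silently lost
        } catch (Exception e) {
            throw new RuntimeException("DB write failed", e);
        }
    }

    @Override
    public void close() {
        try { connection.close(); } catch (Exception ignored) { }
    }
}

// Usage: enrichedStream.process(DbWriterProcessor::new);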

Kafka Connector for Azure Blob Storage

Submitted by 孤街浪徒 on 2019-12-24 01:18:32
Question: I need to store the messages pushed to Kafka in deep storage. We are using Azure cloud services, so I suppose Azure Blob Storage could be a good option. I want to use Kafka Connect's sink connector API to push data to Azure Blob. The Kafka documentation mostly suggests HDFS for exporting data, but in that case I would need a Linux VM running Hadoop, which I guess would be costly. My question is: is Azure Blob Storage an appropriate choice for storing JSON objects, and is building a custom sink connector a …
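
On the "building a custom sink connector" side, the work largely amounts to implementing SinkTask. A minimal sketch, with the actual Azure Blob client left as a stand-in interface (the Azure SDK calls themselves are not shown, and the blob-naming scheme is just an illustrative choice):

import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

import java.util.Collection;
import java.util.Map;

// Stand-in for whatever Azure Blob client you wire in.
interface BlobWriter {
    void append(String blobName, String content);
}

public class BlobSinkTask extends SinkTask {
    private BlobWriter blobWriter;

    @Override public String version() { return "1.0"; }

    @Override
    public void start(Map<String, String> props) {
        // Construct the real Azure Blob client from the connector config here.
        this.blobWriter = (blobName, content) -> { /* call the Azure SDK in a real implementation */ };
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {
            // One blob path per topic-partition is just an illustrative layout choice.
            String blobName = record.topic() + "/" + record.kafkaPartition() + ".json";
            blobWriter.append(blobName, String.valueOf(record.value()));
        }
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
        // Make buffered data durable before Connect commits consumer offsets.
    }

    @Override public void stop() { }
}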

Kafka connect: The configuration XXX was supplied but isn't a known config in AdminClientConfig

Submitted by 狂风中的少年 on 2019-12-24 00:34:22
Question: When starting Kafka Connect, I saw lots of warnings:

10:33:56.706 [DistributedHerder] WARN org.apache.kafka.clients.admin.AdminClientConfig - The configuration 'config.storage.topic' was supplied but isn't a known config.
10:33:56.707 [DistributedHerder] WARN org.apache.kafka.clients.admin.AdminClientConfig - The configuration 'group.id' was supplied but isn't a known config.
10:33:56.708 [DistributedHerder] WARN org.apache.kafka.clients.admin.AdminClientConfig - The configuration 'status…
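
These warnings are commonly reported as harmless: the Connect worker hands its full configuration to the AdminClient it creates internally, and AdminClientConfig logs every key it does not itself define. The keys named in the warnings are legitimate distributed-worker settings, for example (values illustrative):

# connect-distributed.properties (illustrative values)
bootstrap.servers=localhost:9092
group.id=connect-cluster
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter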

Kafka Connect: How to fetch nested fields from Struct

Submitted by 梦想与她 on 2019-12-22 14:40:55
Question: I am using Kafka Connect to implement a Kafka-Elasticsearch connector. The producer sends a complex JSON to a Kafka topic, and my connector code uses it to persist to Elasticsearch. The connector gets the data in the form of a Struct (https://kafka.apache.org/0100/javadoc/org/apache/kafka/connect/data/Struct.html). I am able to get the field values of the struct at the top level of the JSON, but not from nested JSONs.

{
  "after": {
    "test.test.employee.Value": {
      "id": 5671111,
      "name": {
        "string": …
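
A sketch of walking the nested Structs with the accessors from the linked Javadoc; the field names follow the JSON fragment above, while the exact schema types (int32 vs int64, the optional wrapper around the string) are assumptions:

import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.sink.SinkRecord;

// Sketch: navigating the nested Structs that mirror the posted JSON.
public class StructNavigationExample {
    public static void handle(SinkRecord record) {
        Struct value = (Struct) record.value();

        Struct after = value.getStruct("after");
        Struct employee = after.getStruct("test.test.employee.Value");

        Object id = employee.get("id");             // use get() if unsure whether the schema is int32 or int64
        Struct name = employee.getStruct("name");   // wrapper struct around the string, as suggested by the posted JSON
        String nameValue = name == null ? null : name.getString("string");

        System.out.println("id=" + id + ", name=" + nameValue);
    }
}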

What are the key differences between existing approaches to mirroring Kafka topics?

Submitted by 冷暖自知 on 2019-12-22 10:38:33
Question: Kafka MirrorMaker is a basic approach to mirroring Kafka topics from source to target brokers. Unfortunately, it isn't configurable enough for my requirements. My requirements are very simple:

- the solution should be a JVM application
- if the destination topic doesn't exist, it creates it
- the solution should be able to add prefixes/suffixes to destination topic names
- it should reload and apply configurations on the fly if they're changed

According to this answer there are several alternative …

Kafka-Connect vs Filebeat & Logstash

Submitted by 我们两清 on 2019-12-22 04:43:13
Question: I'm looking to consume from Kafka and save data into Hadoop and Elasticsearch. I've currently seen two ways of doing this: using Filebeat to consume from Kafka and send it to ES, and using the Kafka Connect framework, which has Kafka-Connect-HDFS and Kafka-Connect-Elasticsearch modules. I'm not sure which one to use to send streaming data. Though I think that if I at some point want to take data from Kafka and place it into Cassandra, I can use a Kafka Connect module for that, but no such feature …
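
For the Kafka Connect route the question weighs, each destination boils down to one sink connector configuration. A sketch of an Elasticsearch sink connector config (connector name, topic, and URL are illustrative):

{
  "name": "es-sink-example",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "logs",
    "connection.url": "http://localhost:9200",
    "type.name": "_doc",
    "key.ignore": "true",
    "schema.ignore": "true"
  }
}

An analogous config using io.confluent.connect.hdfs.HdfsSinkConnector covers the Hadoop side, so both destinations can be fed from the same Connect cluster.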

Confluent Kafka Connect Elasticsearch document ID creation

Submitted by 假如想象 on 2019-12-22 01:32:06
Question: I am using Confluent to connect my DB and ES and am getting this exception:

org.apache.kafka.connect.errors.DataException: STRUCT is not supported as the document id.
    at io.confluent.connect.elasticsearch.DataConverter.convertKey(DataConverter.java:75)
    at io.confluent.connect.elasticsearch.DataConverter.convertRecord(DataConverter.java:84)
    at io.confluent.connect.elasticsearch.ElasticsearchWriter.write(ElasticsearchWriter.java:210)
    at io.confluent.connect.elasticsearch.ElasticsearchSinkTask.put…
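
The error says the record key is a Struct, which the Elasticsearch connector cannot use directly as a document id. Two commonly suggested workarounds are sketched below; the field name "id" is an illustrative assumption about the key schema:

# Option 1: ignore the key and let the connector build document ids from topic+partition+offset
"key.ignore": "true"

# Option 2: extract a single primitive field from the Struct key with an SMT
"transforms": "extractKeyId",
"transforms.extractKeyId.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractKeyId.field": "id"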