apache-kafka-connect

Flush size when using kafka-connect-transform-archive with HdfsSinkConnector

不打扰是莪最后的温柔 Submitted on 2019-12-11 08:59:41

Question: I have data in a Kafka topic which I want to preserve on my data lake. Before worrying about the keys, I was able to save the Avro values in files on the data lake using HdfsSinkConnector. The number of message values in each file was determined by the "flush.size" property of the HdfsSinkConnector. All good. Next I wanted to preserve the keys as well. To do this I used the kafka-connect-transform-archive which wraps the String key and Avro value into a new Avro schema. This works great ...
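A minimal sketch of the setup being described, with illustrative values for the connector name, topic, and HDFS URL; the transform class comes from the third-party kafka-connect-transform-archive project, and flush.size controls how many records land in each output file:

    {
      "name": "hdfs-sink-with-keys",
      "config": {
        "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
        "tasks.max": "1",
        "topics": "my-topic",
        "hdfs.url": "hdfs://namenode:8020",
        "flush.size": "3",
        "format.class": "io.confluent.connect.hdfs.avro.AvroFormat",
        "transforms": "archive",
        "transforms.archive.type": "com.github.jcustenborder.kafka.connect.archive.Archive"
      }
    }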

Is it possible for a Kafka Connect connector to be configured to skip rows with specific values?

六眼飞鱼酱① Submitted on 2019-12-11 07:38:36

Question: I have a sink connector that writes rows to a MySQL database. I'd like to skip rows which have a "source": "whatever" key: value pair. Is this possible? Source: https://stackoverflow.com/questions/57978804/is-it-possible-for-a-kafka-connect-connector-to-be-configured-to-skip-rows-with
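The stock Apache Kafka Filter SMT only ships with topic-name, header, and tombstone predicates, so dropping records by a field value usually means reaching for Confluent's Filter transform or writing a custom SMT. A hedged sketch using Confluent's Filter on the record value (property names as documented for that SMT; the JSONPath condition mirrors the "source": "whatever" pair from the question and should be treated as an assumption to verify):

    "transforms": "skipRows",
    "transforms.skipRows.type": "io.confluent.connect.transforms.Filter$Value",
    "transforms.skipRows.filter.condition": "$[?(@.source == 'whatever')]",
    "transforms.skipRows.filter.type": "exclude",
    "transforms.skipRows.missing.or.null.behavior": "include"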

Where does the Confluent S3 sink put the key?

会有一股神秘感。 Submitted on 2019-12-11 07:35:41

Question: I set up a Confluent S3 sink connector; it stores .avro files in S3. I dumped those files and found that they are just the message itself. I don't know where I can find the message key, any idea? The config is like:

    {
      "name": "s3-sink-test",
      "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "tasks.max": "1",
        "topics": "book",
        "s3.region": "eu-central-1",
        "s3.bucket.name": "kafka",
        "s3.part.size": "5242880",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
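By default the S3 sink serializes only the record value, which is why the dumped .avro files contain just the messages. One commonly suggested workaround is the same archive transform mentioned in the first entry above, which wraps key and value into a single Avro record before the sink writes it; adding it here is my assumption, not part of the poster's config. Newer releases of the S3 sink connector also document a store.kafka.keys option for writing keys to separate files. Sketch of the extra lines in the "config" block:

    "transforms": "archive",
    "transforms.archive.type": "com.github.jcustenborder.kafka.connect.archive.Archive"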

Kafka MongoDB sink connector not starting

做~自己de王妃 Submitted on 2019-12-11 07:26:13

Question: I've installed confluent_3.3.0 and started ZooKeeper, Schema Registry and a Kafka broker. I have also downloaded the MongoDB connector from this link. Description: I'm running the sink connector using the following command:

    ./bin/connect-standalone etc/kafka/connect-standalone.properties /home/username/mongo-connect-test/kafka-connect-mongodb/quickstart-couchbase-sink.properties

Problem: I'm getting the following error:

    ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone
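With Confluent 3.3.0 an immediate startup failure like this is often the worker failing to load the connector class from where it was downloaded, rather than a problem with the sink properties themselves. A hedged sketch of the worker-level setting that usually addresses that, assuming (hypothetically) the connector jars live under the download directory used in the command above:

    # in etc/kafka/connect-standalone.properties -- path is illustrative, point it at the directory holding the connector jars
    plugin.path=/home/username/mongo-connect-test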

Run Kafka Connect distributed mode on many nodes

本秂侑毒 Submitted on 2019-12-11 07:26:02

Question: I'm resiliency testing a Kafka connector and I'd like to kill off a worker while it's running, thus killing the connector instance. The easiest way is probably going to be to force distributed mode to run over more than one node, then just kill the worker process on that node (right?). How can I make Kafka Connect spawn workers on more than just the node it's started on? Is this something which is defined in the worker config? Answer 1: Yes, handling failures and automatically restarting workload is
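To flesh that out a little (my sketch, not the original answer's wording): distributed mode does not spawn workers on other machines for you; you start a worker process on each node yourself, and workers that point at the same Kafka cluster and share the same group.id and internal topics join one Connect cluster that rebalances connectors and tasks when a worker dies. A minimal sketch of the shared worker config (group id and topic names are illustrative):

    # connect-distributed.properties, identical on every node
    bootstrap.servers=kafka01:9092,kafka02:9092,kafka03:9092
    group.id=connect-cluster
    config.storage.topic=connect-configs
    offset.storage.topic=connect-offsets
    status.storage.topic=connect-status

    # started separately on each node
    ./bin/connect-distributed etc/kafka/connect-distributed.properties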

Kafka Connect failing to read from Kafka topics over SSL

时间秒杀一切 Submitted on 2019-12-11 07:21:25

Question: Running Kafka Connect in our Docker Swarm, with the following compose file:

    cp-kafka-connect-node:
      image: confluentinc/cp-kafka-connect:5.1.0
      ports:
        - 28085:28085
      secrets:
        - kafka.truststore.jks
        - source: kafka-connect-aws-credentials
          target: /root/.aws/credentials
      environment:
        CONNECT_BOOTSTRAP_SERVERS: kafka01:9093,kafka02:9093,kafka03:9093
        CONNECT_LOG4J_ROOT_LEVEL: TRACE
        CONNECT_REST_PORT: 28085
        CONNECT_GROUP_ID: cp-kafka-connect
        CONNECT_CONFIG_STORAGE_TOPIC: dev_cp-kafka-connect-config
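One thing worth checking here (a hedged suggestion, not part of the original post): with the cp-kafka-connect image, the worker-level SSL settings cover the worker's own connections, but the consumers that sink connectors use read the consumer.-prefixed properties, which in this image are supplied through CONNECT_CONSUMER_* variables. A sketch of the extra environment entries, assuming the truststore secret is mounted at the default /run/secrets path and the password is provided separately:

        CONNECT_SECURITY_PROTOCOL: SSL
        CONNECT_SSL_TRUSTSTORE_LOCATION: /run/secrets/kafka.truststore.jks
        CONNECT_SSL_TRUSTSTORE_PASSWORD: <truststore password>
        CONNECT_CONSUMER_SECURITY_PROTOCOL: SSL
        CONNECT_CONSUMER_SSL_TRUSTSTORE_LOCATION: /run/secrets/kafka.truststore.jks
        CONNECT_CONSUMER_SSL_TRUSTSTORE_PASSWORD: <truststore password>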

Kafka JDBC source connector timestamp mode failing for sqlite3

夙愿已清 Submitted on 2019-12-11 06:34:29

Question: I tried to set up a database with two tables in SQLite. One of my tables has a timestamp column. I am trying to implement timestamp mode to capture incremental changes in the DB. Kafka Connect is failing with the below error:

    ERROR Failed to get current time from DB using Sqlite and query 'SELECT CURRENT_TIMESTAMP' (io.confluent.connect.jdbc.dialect.SqliteDatabaseDialect:471)
    java.sql.SQLException: Error parsing time stamp
    Caused by: java.text.ParseException: Unparseable date: "2019-02
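For reference, timestamp mode on the JDBC source is configured roughly like this (database path, column name, and topic prefix are illustrative); the error above is thrown when the Sqlite dialect runs SELECT CURRENT_TIMESTAMP and cannot parse the value SQLite returns, which points at a timestamp-format mismatch in the dialect rather than at these settings themselves:

    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:sqlite:test.db",
    "mode": "timestamp",
    "timestamp.column.name": "updated_at",
    "topic.prefix": "sqlite-"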

Kafka Connect - File Source Connector error

人盡茶涼 Submitted on 2019-12-11 06:14:06

Question: I am playing with Confluent Platform/Kafka Connect and similar things and I wanted to run a few examples. I followed the quickstart from here. It means:
1. Install Confluent Platform (v3.2.1)
2. Run ZooKeeper, Kafka Broker and Schema Registry
3. Run the example for reading file data (with Kafka Connect)
I ran this command (number 3):

    [root@sandbox confluent-3.2.1]# ./bin/connect-standalone ./etc/schema-registry/connect-avro-standalone.properties ./etc/kafka/connect-file-source.properties

but got this result:
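For context, the stock ./etc/kafka/connect-file-source.properties used in step 3 normally looks like this (contents as shipped with the quickstart; worth checking against the local copy, since the file path is resolved relative to the working directory):

    name=local-file-source
    connector.class=FileStreamSource
    tasks.max=1
    file=test.txt
    topic=connect-test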

Kafka Connect sink tasks ignore tolerance limits

淺唱寂寞╮ Submitted on 2019-12-11 00:34:36

Question: I am trying to ignore bad messages in a sink connector with the errors.tolerance: all option. Full connector configuration:

    {
      "name": "crm_data-sink_pandora",
      "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": 6,
        "topics": "crm_account_detail,crm_account_on_competitors,crm_event,crm_event_participation",
        "connection.url": "jdbc:postgresql://dburl/service?prepareThreshold=0",
        "connection.user": "pandora.app",
        "connection.password": "*******",
        "dialect.name":
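Worth noting (a hedged aside, not the original answer): errors.tolerance only covers failures in the converter and transformation stages; exceptions thrown inside the sink task's put(), such as JDBC write errors, are not skipped by it. A sketch of the error-handling block such a config typically carries, with an illustrative dead-letter topic name:

    "errors.tolerance": "all",
    "errors.log.enable": true,
    "errors.log.include.messages": true,
    "errors.deadletterqueue.topic.name": "crm_data_dlq",
    "errors.deadletterqueue.context.headers.enable": true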

Specify version when deploying to Nexus from Maven

那年仲夏 Submitted on 2019-12-10 22:25:47

Question: I've forked Confluent's Kafka Connect HDFS writer and now I'd like to deploy a version of this jar to my local Nexus. Running mvn clean deploy works like a charm and deploys the jar: https://[nexus]/repository/releases/io/confluent/kafka-connect-hdfs/5.0.0/kafka-connect-hdfs-5.0.0.jar So far so good, but to make a distinction between the Confluent versions and my own deployment I'd like to change the version of the build to something like 5.0.0-1 or so (preferably the tag name when pushed, but that's
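One common way to get such a version (sketched here, not necessarily what the eventual answer recommends) is to rewrite the POM version before deploying, for example with the versions-maven-plugin; wiring in the pushed tag name would be an extra scripting step on top of this:

    mvn versions:set -DnewVersion=5.0.0-1
    mvn clean deploy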