apache-kafka-connect

Flush size when using kafka-connect-transform-archive with HdfsSinkConnector

不打扰是莪最后的温柔 Submitted on 2019-12-11 08:59:41

Question: I have data in a Kafka topic which I want to preserve on my data lake. Before worrying about the keys, I was able to save the Avro values in files on the data lake using HdfsSinkConnector. The number of message values in each file was determined by the "flush.size" property of the HdfsSinkConnector. All good. Next I wanted to preserve the keys as well. To do this I used the kafka-connect-transform-archive which wraps the String key and Avro value into a new Avro schema. This works great ...
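A minimal sketch of the setup being described, with illustrative values for the connector name, topic, and HDFS URL; the transform class comes from the third-party kafka-connect-transform-archive project, and flush.size controls how many records land in each output file:

    {
      "name": "hdfs-sink-with-keys",
      "config": {
        "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
        "tasks.max": "1",
        "topics": "my-topic",
        "hdfs.url": "hdfs://namenode:8020",
        "flush.size": "3",
        "format.class": "io.confluent.connect.hdfs.avro.AvroFormat",
        "transforms": "archive",
        "transforms.archive.type": "com.github.jcustenborder.kafka.connect.archive.Archive"
      }
    }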

Is it possible for a Kafka Connect connector to be configured to skip rows with specific values?

六眼飞鱼酱① Submitted on 2019-12-11 07:38:36

Question: I have a sink connector that writes rows to a MySQL database. I'd like to skip rows which have a "source": "whatever" key: value pair. Is this possible? Source: https://stackoverflow.com/questions/57978804/is-it-possible-for-a-kafka-connect-connector-to-be-configured-to-skip-rows-with
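The stock Apache Kafka Filter SMT only ships with topic-name, header, and tombstone predicates, so dropping records by a field value usually means reaching for Confluent's Filter transform or writing a custom SMT. A hedged sketch using Confluent's Filter on the record value (property names as documented for that SMT; the JSONPath condition mirrors the "source": "whatever" pair from the question and should be treated as an assumption to verify):

    "transforms": "skipRows",
    "transforms.skipRows.type": "io.confluent.connect.transforms.Filter$Value",
    "transforms.skipRows.filter.condition": "$[?(@.source == 'whatever')]",
    "transforms.skipRows.filter.type": "exclude",
    "transforms.skipRows.missing.or.null.behavior": "include"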

Where does the Confluent S3 sink put the key?

会有一股神秘感。 Submitted on 2019-12-11 07:35:41

Question: I set up a Confluent S3 sink connector; it stores .avro files in S3. I dumped those files and found that they are just the message itself. I don't know where I can find the message key, any idea? The config is like:

    {
      "name": "s3-sink-test",
      "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "tasks.max": "1",
        "topics": "book",
        "s3.region": "eu-central-1",
        "s3.bucket.name": "kafka",
        "s3.part.size": "5242880",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
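By default the S3 sink serializes only the record value, which is why the dumped .avro files contain just the messages. One commonly suggested workaround is the same archive transform mentioned in the first entry above, which wraps key and value into a single Avro record before the sink writes it; adding it here is my assumption, not part of the poster's config. Newer releases of the S3 sink connector also document a store.kafka.keys option for writing keys to separate files. Sketch of the extra lines in the "config" block:

    "transforms": "archive",
    "transforms.archive.type": "com.github.jcustenborder.kafka.connect.archive.Archive"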

Kafka MongoDB sink connector not starting

做~自己de王妃 Submitted on 2019-12-11 07:26:13

Question: I've installed confluent_3.3.0 and started ZooKeeper, Schema Registry and a Kafka broker. I have also downloaded the MongoDB connector from this link. Description: I'm running the sink connector using the following command:

    ./bin/connect-standalone etc/kafka/connect-standalone.properties /home/username/mongo-connect-test/kafka-connect-mongodb/quickstart-couchbase-sink.properties

Problem: I'm getting the following error:

    ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone
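With Confluent 3.3.0 an immediate startup failure like this is often the worker failing to load the connector class from where it was downloaded, rather than a problem with the sink properties themselves. A hedged sketch of the worker-level setting that usually addresses that, assuming (hypothetically) the connector jars live under the download directory used in the command above:

    # in etc/kafka/connect-standalone.properties -- path is illustrative, point it at the directory holding the connector jars
    plugin.path=/home/username/mongo-connect-test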

Run Kafka Connect distributed mode on many nodes

本秂侑毒 Submitted on 2019-12-11 07:26:02

Question: I'm resiliency testing a Kafka connector and I'd like to kill off a worker while it's running, thus killing the connector instance. The easiest way is probably going to be to force distributed mode to run over more than one node, then just kill the worker process on that node (right?). How can I make Kafka Connect spawn workers on more than just the node it's started on? Is this something which is defined in the worker config? Answer 1: Yes, handling failures and automatically restarting workload is
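To flesh that out a little (my sketch, not the original answer's wording): distributed mode does not spawn workers on other machines for you; you start a worker process on each node yourself, and workers that point at the same Kafka cluster and share the same group.id and internal topics join one Connect cluster that rebalances connectors and tasks when a worker dies. A minimal sketch of the shared worker config (group id and topic names are illustrative):

    # connect-distributed.properties, identical on every node
    bootstrap.servers=kafka01:9092,kafka02:9092,kafka03:9092
    group.id=connect-cluster
    config.storage.topic=connect-configs
    offset.storage.topic=connect-offsets
    status.storage.topic=connect-status

    # started separately on each node
    ./bin/connect-distributed etc/kafka/connect-distributed.properties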

Kafka Connect failing to read from Kafka topics over SSL

时间秒杀一切 Submitted on 2019-12-11 07:21:25

Question: Running Kafka Connect in our Docker Swarm, with the following compose file:

    cp-kafka-connect-node:
      image: confluentinc/cp-kafka-connect:5.1.0
      ports:
        - 28085:28085
      secrets:
        - kafka.truststore.jks
        - source: kafka-connect-aws-credentials
          target: /root/.aws/credentials
      environment:
        CONNECT_BOOTSTRAP_SERVERS: kafka01:9093,kafka02:9093,kafka03:9093
        CONNECT_LOG4J_ROOT_LEVEL: TRACE
        CONNECT_REST_PORT: 28085
        CONNECT_GROUP_ID: cp-kafka-connect
        CONNECT_CONFIG_STORAGE_TOPIC: dev_cp-kafka-connect-config
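One thing worth checking here (a hedged suggestion, not part of the original post): with the cp-kafka-connect image, the worker-level SSL settings cover the worker's own connections, but the consumers that sink connectors use read the consumer.-prefixed properties, which in this image are supplied through CONNECT_CONSUMER_* variables. A sketch of the extra environment entries, assuming the truststore secret is mounted at the default /run/secrets path and the password is provided separately:

        CONNECT_SECURITY_PROTOCOL: SSL
        CONNECT_SSL_TRUSTSTORE_LOCATION: /run/secrets/kafka.truststore.jks
        CONNECT_SSL_TRUSTSTORE_PASSWORD: <truststore password>
        CONNECT_CONSUMER_SECURITY_PROTOCOL: SSL
        CONNECT_CONSUMER_SSL_TRUSTSTORE_LOCATION: /run/secrets/kafka.truststore.jks
        CONNECT_CONSUMER_SSL_TRUSTSTORE_PASSWORD: <truststore password>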

Kafka JDBC source connector timestamp mode failing for sqlite3

夙愿已清 Submitted on 2019-12-11 06:34:29

Question: I tried to set up a database with two tables in SQLite. One of my tables has a timestamp column. I am trying to implement timestamp mode to capture incremental changes in the DB. Kafka Connect is failing with the below error:

    ERROR Failed to get current time from DB using Sqlite and query 'SELECT CURRENT_TIMESTAMP' (io.confluent.connect.jdbc.dialect.SqliteDatabaseDialect:471)
    java.sql.SQLException: Error parsing time stamp
    Caused by: java.text.ParseException: Unparseable date: "2019-02
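For reference, timestamp mode on the JDBC source is configured roughly like this (database path, column name, and topic prefix are illustrative); the error above is thrown when the Sqlite dialect runs SELECT CURRENT_TIMESTAMP and cannot parse the value SQLite returns, which points at a timestamp-format mismatch in the dialect rather than at these settings themselves:

    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:sqlite:test.db",
    "mode": "timestamp",
    "timestamp.column.name": "updated_at",
    "topic.prefix": "sqlite-"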

Kafka Connect - File Source Connector error

人盡茶涼 Submitted on 2019-12-11 06:14:06

Question: I am playing with Confluent Platform/Kafka Connect and similar things and I wanted to run a few examples. I followed the quickstart from here. It means:
1. Install Confluent Platform (v3.2.1)
2. Run ZooKeeper, Kafka Broker and Schema Registry
3. Run the example for reading file data (with Kafka Connect)
I ran this command (number 3):

    [root@sandbox confluent-3.2.1]# ./bin/connect-standalone ./etc/schema-registry/connect-avro-standalone.properties ./etc/kafka/connect-file-source.properties

but got this result:
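For context, the stock ./etc/kafka/connect-file-source.properties used in step 3 normally looks like this (contents as shipped with the quickstart; worth checking against the local copy, since the file path is resolved relative to the working directory):

    name=local-file-source
    connector.class=FileStreamSource
    tasks.max=1
    file=test.txt
    topic=connect-test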

Kafka Connect sink tasks ignore tolerance limits

淺唱寂寞╮ Submitted on 2019-12-11 00:34:36

Question: I am trying to ignore bad messages in a sink connector with the errors.tolerance: all option. Full connector configuration:

    {
      "name": "crm_data-sink_pandora",
      "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": 6,
        "topics": "crm_account_detail,crm_account_on_competitors,crm_event,crm_event_participation",
        "connection.url": "jdbc:postgresql://dburl/service?prepareThreshold=0",
        "connection.user": "pandora.app",
        "connection.password": "*******",
        "dialect.name":
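Worth noting (a hedged aside, not the original answer): errors.tolerance only covers failures in the converter and transformation stages; exceptions thrown inside the sink task's put(), such as JDBC write errors, are not skipped by it. A sketch of the error-handling block such a config typically carries, with an illustrative dead-letter topic name:

    "errors.tolerance": "all",
    "errors.log.enable": true,
    "errors.log.include.messages": true,
    "errors.deadletterqueue.topic.name": "crm_data_dlq",
    "errors.deadletterqueue.context.headers.enable": true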

Specify version when deploying to Nexus from Maven

那年仲夏 Submitted on 2019-12-10 22:25:47

Question: I've forked Confluent's Kafka Connect HDFS writer and now I'd like to deploy a version of this jar to my local Nexus. Running mvn clean deploy works like a charm and deploys the jar: https://[nexus]/repository/releases/io/confluent/kafka-connect-hdfs/5.0.0/kafka-connect-hdfs-5.0.0.jar So far so good, but to make a distinction between the Confluent versions and my own deployment I'd like to change the version of the build to something like 5.0.0-1 or so (preferably the tag name when pushed, but that's
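One common way to get such a version (sketched here, not necessarily what the eventual answer recommends) is to rewrite the POM version before deploying, for example with the versions-maven-plugin; wiring in the pushed tag name would be an extra scripting step on top of this:

    mvn versions:set -DnewVersion=5.0.0-1
    mvn clean deploy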