apache-kafka-connect

Parquet Output From Kafka Connect to S3

非 Y 不嫁゛ submitted on 2019-12-10 15:12:49
Question: I see that Kafka Connect can write to S3 in Avro or JSON formats, but there is no Parquet support. How hard would this be to add?

Answer 1: The Qubole connector supports writing out Parquet: https://github.com/qubole/streamx

Answer 2: Try Secor: https://github.com/pinterest/secor It works with AWS S3, Google Cloud Storage, Azure Blob Storage, etc. Note that whichever solution you choose should offer key features such as exactly-once delivery of each message, load distribution, fault tolerance, monitoring, …
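Not part of the original answers, but for reference: more recent versions of the Confluent S3 sink connector (kafka-connect-storage-cloud) added a Parquet output format. A minimal sketch, assuming that connector is installed; the connector name, topic, and bucket are placeholders:

    name=s3-parquet-sink
    connector.class=io.confluent.connect.s3.S3SinkConnector
    topics=my-topic
    s3.bucket.name=my-bucket
    s3.region=us-east-1
    storage.class=io.confluent.connect.s3.storage.S3Storage
    format.class=io.confluent.connect.s3.format.parquet.ParquetFormat
    flush.size=1000

Parquet is a schema'd format, so records must arrive through a schema-aware converter such as Avro.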

Kafka Connect - How to delete a connector

ぃ、小莉子 submitted on 2019-12-10 14:17:47
Question: I created a cassandra-sink connector, and afterwards made some changes in the connector.properties file. After stopping the worker and starting it again, when I now add the connector using:

    java -jar kafka-connect-cli-1.0.6-all.jar create cassandra-sink-orders < cassandra-sink-distributed-orders.properties

I get the following error:

    Error: the Kafka Connect API returned: Connector cassandra-sink-orders already exists (409)

How can I remove the existing connector?

Answer 1: You can use the Kafka Connect …
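The answer is cut off above, but it points at the Kafka Connect REST API, which exposes a DELETE endpoint for exactly this case. A minimal sketch, assuming the worker's REST API is on the default localhost:8083:

    curl -X DELETE http://localhost:8083/connectors/cassandra-sink-orders

After the 204 response, re-running the create command should succeed.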

Using a custom converter with Kafka Connect?

天涯浪子 submitted on 2019-12-10 11:58:36
Question: I'm trying to use a custom converter with Kafka Connect and I cannot seem to get it right. I'm hoping someone has experience with this and could help me figure it out!

Initial situation: my custom converter's class path is custom.CustomStringConverter. To avoid any mistakes, the custom converter is currently just a copy/paste of the pre-existing StringConverter (of course, this will change once I get it working): https://github.com/apache/kafka/blob/trunk/connect/api/src/main/java/org …
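The question is cut off above, but for orientation, this is where a custom converter gets wired in. A minimal sketch of the worker properties, assuming the converter jar is on the worker's classpath (or, on newer Connect versions, in a directory listed under plugin.path):

    key.converter=custom.CustomStringConverter
    value.converter=custom.CustomStringConverter

A converter can also be set per connector by putting the same keys in the connector's own configuration instead of the worker's.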

How to change the “kafka connect” component port?

眉间皱痕 submitted on 2019-12-10 10:39:44
Question: On port 8083 I am running InfluxDB, whose GUI I can reach at http://localhost:8083. Now to Kafka. I am following the setup at https://kafka.apache.org/quickstart, starting the ZooKeeper that lives in /opt/zookeeper-3.4.10 with the command:

    bin/zkServer.sh start

So ZooKeeper is started. Next I start Kafka under the /opt/kafka_2.11-1.1.0 folder with:

    bin/kafka-server-start.sh config/server.properties

Then I create a topic named "test" with a single partition and only one replica …
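The question is cut off above, but the title asks how to move the Kafka Connect REST port, which defaults to 8083 and therefore collides with InfluxDB here. A minimal sketch of the change in the worker properties file (connect-standalone.properties or connect-distributed.properties); port 8084 is just an example:

    rest.port=8084

Newer Connect versions also accept the listeners form, e.g. listeners=http://0.0.0.0:8084.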

Kafka Connect (Single Message Transform) row filtering

左心房为你撑大大i submitted on 2019-12-09 01:31:42
Question: I read about the Kafka Connect transformations introduced in Kafka 0.10.2.1: https://kafka.apache.org/documentation/#connect_transforms I noticed that all of the transformations are column-based. I have a use case where I need value-based filtering. For example, consider the following dataset describing a group of people:

    {"firstName": "FirstName1", "lastName": "LastName1", "age": 30}
    {"firstName": "FirstName2", "lastName": "LastName2", "age": 30}
    {"firstName": "FirstName3", "lastName": …
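The question is cut off above, but since the built-in transformations of that era could not drop records based on a field's value, the usual route is a custom single message transform. A minimal sketch, assuming structured records with an int32 "age" field; the package, class name, and threshold are hypothetical:

    package custom;

    import java.util.Map;
    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.connect.connector.ConnectRecord;
    import org.apache.kafka.connect.data.Struct;
    import org.apache.kafka.connect.transforms.Transformation;

    public class FilterByAge<R extends ConnectRecord<R>> implements Transformation<R> {

        // Returning null from apply() tells Connect to drop the record.
        @Override
        public R apply(R record) {
            if (record.value() instanceof Struct) {
                Struct value = (Struct) record.value();
                Integer age = value.getInt32("age");
                if (age != null && age < 30) {
                    return null;  // filter out anyone under 30
                }
            }
            return record;
        }

        @Override
        public ConfigDef config() {
            return new ConfigDef();
        }

        @Override
        public void configure(Map<String, ?> configs) {}

        @Override
        public void close() {}
    }

It would then be enabled in the connector configuration with transforms=ageFilter and transforms.ageFilter.type=custom.FilterByAge.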

Kafka Connect not outputting JSON

心已入冬 submitted on 2019-12-08 11:48:36
Question: I am using the JDBC Kafka connector to read data from a database into Kafka. That works, but it always outputs data in Avro format even though I've specified that it should use JSON. I know it is doing this because when I consume messages from that topic in Python, I see the schema at the top of each message. I run the connector like this:

    /usr/bin/connect-standalone /etc/schema-registry/connect-json-standalone.properties /etc/kafka-connect-jdbc/view.properties

The content of the connect-json …
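The question is cut off above, but the symptom described (a schema block at the top of each message) is usually not Avro at all: it is what JsonConverter produces when schemas.enable is left on. A minimal sketch of the relevant worker properties, assuming plain schemaless JSON is the goal:

    key.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter=org.apache.kafka.connect.json.JsonConverter
    key.converter.schemas.enable=false
    value.converter.schemas.enable=false

With schemas.enable=false, each message is just the payload, without the embedded schema/payload envelope.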

Kafka JDBC Connect query causes ORA-00933: SQL command not properly ended

允我心安 submitted on 2019-12-08 07:37:19
Question: I have this Oracle SQL query:

    SELECT * FROM (SELECT SO_ORDER_KEY, QUEUE_TYPE, SYS_NO, DENSE_RANK() OVER (PARTITION BY SO_ORDER_KEY ORDER BY SYS_NO DESC) ORDER_RANK FROM TSY940) WHERE ORDER_RANK=1;

When run in SQL Developer, it returns the desired result. For some reason, when I use this query in the kafka-connect-jdbc properties I get:

    ERROR Failed to run query for table TimestampIncrementingTableQuerier{name='null', query='SELECT * FROM (SELECT SO_ORDER_KEY,QUEUE_TYPE,SYS_NO,DENSE_RANK() …
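The log line is cut off above, but the TimestampIncrementingTableQuerier in it is the usual hint: in incremental modes, the JDBC source connector appends its own WHERE clause to the configured query, so a trailing semicolon yields invalid SQL and Oracle rejects it with ORA-00933. A minimal sketch of the fix, with the semicolon removed; the mode and column choice below simply mirror the question and are not from the original thread:

    query=SELECT * FROM (SELECT SO_ORDER_KEY, QUEUE_TYPE, SYS_NO, DENSE_RANK() OVER (PARTITION BY SO_ORDER_KEY ORDER BY SYS_NO DESC) ORDER_RANK FROM TSY940) WHERE ORDER_RANK=1
    mode=incrementing
    incrementing.column.name=SYS_NO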

How to fetch Kafka source connector schema based on connector name

*爱你&永不变心* submitted on 2019-12-08 06:51:11
Question: I am using the Confluent JDBC Kafka connector to publish messages into a topic. The source connector sends data to the topic along with a schema on each poll. I want to retrieve this schema. Is that possible, and how? Can anyone advise me? My intention is to create a KSQL stream or table based on the schema built by the Kafka connector on each poll.

Answer 1: The best way to do this is to use Avro, in which the schema is stored separately and automatically used by Kafka Connect and KSQL. You can use Avro by configuring …
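The answer is cut off above, but the configuration it is heading toward is the Avro converter plus Schema Registry, after which the schema can be fetched by subject name. A minimal sketch, assuming Schema Registry runs at localhost:8081 and the topic is hypothetically called orders:

    value.converter=io.confluent.connect.avro.AvroConverter
    value.converter.schema.registry.url=http://localhost:8081

    curl -s http://localhost:8081/subjects/orders-value/versions/latest

The curl call returns the latest registered schema for the topic's value, which KSQL also picks up automatically when creating a stream or table.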

How to sink kafka topic to oracle using kafka connect

柔情痞子 submitted on 2019-12-07 15:54:27
I have a Kafka topic with data. The following is the config file I am using to sink that data to Oracle.

Sink.properties:

    name=ora_sink_task
    connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
    tasks.max=1
    topics=person
    connection.url=jdbc:oracle:thin:@127.0.0.1:1521/XE
    connection.user=kafka
    connection.password=kafka
    auto.create=true
    insert.mode=upsert
    pk.mode=record_value
    pk.fields=id

I am getting the following response in the logs:

    [2017-06-06 21:09:33,557] DEBUG Scavenging sessions at 1496504373557 (org.eclipse.jetty.server.session:347)
    [2017-06-06 21:10:03,557] DEBUG Scavenging sessions at 1496504403557 …
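No answer is included above, but two observations may help in reading the log: the "Scavenging sessions" lines come from the worker's embedded Jetty server and are harmless idle housekeeping, so the task appears to be running without processing any records. A first diagnostic step is to ask the REST API for the connector's state; a minimal sketch, assuming the worker's default port:

    curl -s http://localhost:8083/connectors/ora_sink_task/status

Note also that the JDBC sink requires schema'd records (for example Avro, or JsonConverter with schemas.enable=true) to map fields to table columns, so a schemaless topic is a common cause of a silently idle or failed task.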

Kafka Streams with lookup data on HDFS

情到浓时终转凉″ submitted on 2019-12-07 02:37:22
Question: I'm writing an application with Kafka Streams (v0.10.0.1) and would like to enrich the records I'm processing with lookup data. This data (a timestamped file) is written into an HDFS directory on a daily basis (or 2-3 times a day). How can I load it into the Kafka Streams application and join it to the actual KStream? What would be the best practice for rereading the data from HDFS when a new file arrives there? Or would it be better to switch to Kafka Connect and write the RDBMS table content to a …
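The question is cut off above, but the direction it suggests, piping the lookup data into a Kafka topic via Kafka Connect and joining against it, maps onto a GlobalKTable join in later Streams releases (GlobalKTable arrived in 0.10.2, after the 0.10.0.1 mentioned here, and the builder API below is from 1.0+). A minimal sketch, with hypothetical topic names:

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.GlobalKTable;
    import org.apache.kafka.streams.kstream.KStream;

    public class EnrichmentTopology {
        public static StreamsBuilder build() {
            StreamsBuilder builder = new StreamsBuilder();

            // Compacted topic fed from HDFS/RDBMS via Kafka Connect.
            GlobalKTable<String, String> lookup = builder.globalTable("lookup-data");

            // The stream of records to enrich.
            KStream<String, String> events = builder.stream("events");

            events.join(lookup,
                        (eventKey, eventValue) -> eventKey,  // derive the lookup key
                        (eventValue, lookupValue) -> eventValue + "|" + lookupValue)
                  .to("enriched-events");

            return builder;
        }
    }

Because a GlobalKTable is fully replicated to every application instance, new lookup files picked up by Connect flow into the topic and become visible to the join without any manual rereading.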