kafka-consumer-api

Is it possible to obtain specific message offset in Kafka+SparkStreaming?

匆匆过客 submitted on 2019-12-07 02:53:19
Question: I'm trying to obtain and store the offset for a specific message in Kafka by using a Spark Direct Stream. Looking at the Spark documentation, it is simple to obtain the offset range for each partition, but what I need is to store the start offset for each message of a topic after a full scan of the queue. Answer 1: Yes, you can use the MessageAndMetadata version of createDirectStream, which allows you to access the message metadata. You can find an example here which returns a DStream of Tuple3. val ssc = new
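The answer above refers to the older 0.8 direct-stream API (MessageAndMetadata). As an illustration only, here is a minimal Java sketch using the spark-streaming-kafka-0-10 integration, where each ConsumerRecord already exposes its topic, partition, and offset; the broker address, topic name, and group id are placeholder assumptions, not values from the question:

```java
import java.util.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class PerMessageOffsets {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("per-message-offsets").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "offset-demo");                // assumed group id
        kafkaParams.put("auto.offset.reset", "earliest");

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("my-topic"), kafkaParams)); // assumed topic

        // Each record carries its own topic/partition/offset, so it can be stored per message.
        stream.foreachRDD(rdd -> rdd.foreach(record ->
                System.out.println(record.topic() + "-" + record.partition() + "@" + record.offset())));

        jssc.start();
        jssc.awaitTermination();
    }
}
```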

Is it possible to transfer files using Kafka?

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-07 01:40:59
Question: I have thousands of files generated each day which I want to stream using Kafka. When I try to read a file, each line is taken as a separate message. I would like to know how I can make each file's content a single message in a Kafka topic and, on the consumer side, how to write each message from the Kafka topic to a separate file. Answer 1: You can write your own serializer/deserializer for handling files. For example, producer props: props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, org.apache.kafka
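A minimal sketch of the producer side, assuming the built-in ByteArraySerializer and a placeholder broker and topic name: each file is read fully and sent as one record, with the file name as the key.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class FileProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        // Large files also need max.request.size (and the broker's message.max.bytes) raised.
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, String.valueOf(10 * 1024 * 1024));

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            Path file = Paths.get(args[0]);
            byte[] content = Files.readAllBytes(file);          // whole file = one message
            producer.send(new ProducerRecord<>("files-topic",   // assumed topic name
                    file.getFileName().toString(), content));   // key = file name
            producer.flush();
        }
    }
}
```

The consumer side would be the mirror image: configure ByteArrayDeserializer for the value and write record.value() to a file named after record.key().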

How to connect Kafka with Elasticsearch?

拥有回忆 submitted on 2019-12-07 01:06:04
Question: I am new to Kafka. I use Kafka to collect NetFlow data through Logstash (that part works), and I want to send the data from Kafka to Elasticsearch, but there are some problems. My question is: how can I connect Kafka with Elasticsearch? NetFlow-to-Kafka Logstash config: input{ udp{ host => "120.127.XXX.XX" port => 5556 codec => netflow } } filter{ } output { kafka { bootstrap_servers => "localhost:9092" topic_id => "test" } stdout{codec=> rubydebug} } Kafka-to-Elasticsearch Logstash config: input { kafka { } }
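The second pipeline is left empty above. As a rough sketch only, assuming Logstash 5.x or newer (older kafka input plugins used zk_connect/topic_id instead of bootstrap_servers/topics), a Kafka-to-Elasticsearch pipeline could look like the following; hostnames, topic, index name, and the json codec are assumptions that depend on how the events were written to Kafka:

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["test"]
    codec => "json"      # assumed: events were serialized as JSON
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "netflow-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}
```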

How to pass topics dynamically to a Kafka listener?

岁酱吖の submitted on 2019-12-06 16:22:27
For the past couple of days I've been trying out ways to pass topics dynamically to a Kafka listener rather than wiring them in through keys from a Java DSL. Has anyone done this before, or can anyone shed some light on the best way to achieve it? You cannot "dynamically pass topics to a Kafka listener"; you have to programmatically create a listener container instead. Here is a working solution: // Start brokers without using the "@KafkaListener" annotation Map<String, Object> consumerProps = consumerProps("my-srv1:9092", "my-group", "false"); DefaultKafkaConsumerFactory<String, String> cf = new
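The snippet is cut off above; here is a minimal sketch of how such a container can be built programmatically with Spring Kafka, assuming a recent Spring Kafka version (in older releases ContainerProperties lives in org.springframework.kafka.listener.config). The listener body and topic names are placeholders:

```java
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.listener.KafkaMessageListenerContainer;
import org.springframework.kafka.listener.MessageListener;

public class DynamicListenerFactory {

    // Builds and starts a listener container for topics that are only known at runtime.
    public static KafkaMessageListenerContainer<String, String> createContainer(
            Map<String, Object> consumerProps, String... topics) {

        DefaultKafkaConsumerFactory<String, String> cf =
                new DefaultKafkaConsumerFactory<>(consumerProps);

        ContainerProperties containerProps = new ContainerProperties(topics);
        containerProps.setMessageListener(
                (MessageListener<String, String>) (ConsumerRecord<String, String> record) ->
                        System.out.println("received: " + record.value()));

        KafkaMessageListenerContainer<String, String> container =
                new KafkaMessageListenerContainer<>(cf, containerProps);
        container.start();
        return container;
    }
}
```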

How to pause a Kafka consumer?

我怕爱的太早我们不能终老 submitted on 2019-12-06 13:53:36
Question: I am using a Kafka producer-consumer model in my framework. The records consumed at the consumer end are later indexed into Elasticsearch. I have a use case where, if ES is down, I have to pause the Kafka consumer until ES is back up; once it is up, I need to resume the consumer and continue consuming from where I last left off. I don't think this can be achieved with @KafkaListener. Can anyone please give me a solution for this? I figured out that I need to write my own
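A minimal sketch of such a hand-rolled poll loop using the plain KafkaConsumer pause()/resume() API, assuming kafka-clients 2.0+ for poll(Duration); the broker, topic, health check, and indexing call are hypothetical placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PausingConsumerLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("group.id", "es-indexer");              // assumed group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events")); // assumed topic
            boolean paused = false;
            while (true) {
                // assignment() is only populated after the first poll(); pausing an
                // empty set is a harmless no-op on the very first iteration.
                if (!isElasticsearchUp()) {                    // hypothetical health check
                    if (!paused) {
                        consumer.pause(consumer.assignment());
                        paused = true;
                    }
                } else if (paused) {
                    consumer.resume(consumer.assignment());
                    paused = false;
                }
                // poll() must keep being called even while paused so the consumer stays
                // in the group; paused partitions simply return no records.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    indexIntoElasticsearch(record.value());    // hypothetical indexing call
                }
                if (!records.isEmpty()) {
                    consumer.commitSync();
                }
            }
        }
    }

    private static boolean isElasticsearchUp() { return true; } // placeholder
    private static void indexIntoElasticsearch(String doc) { }  // placeholder
}
```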

Can I run a Kafka Streams application on the same machine as a Kafka broker?

℡╲_俬逩灬. submitted on 2019-12-06 13:22:23
I have a Kafka Streams application which takes data from a few topics, joins the data, and puts it in another topic. Kafka configuration: 5 Kafka brokers; Kafka topics with 15 partitions and a replication factor of 3. Note: I am running the Kafka Streams application on the same machines where my Kafka brokers are running. A few million records are consumed/produced every hour. Whenever I take any Kafka broker down, the group goes into rebalancing, the rebalance takes approximately 30 minutes or sometimes even longer, and it often kills many of the Kafka Streams processes. It is technically possible to run
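Co-locating Streams instances with brokers is possible as long as CPU, memory, disk, and network are sized for both workloads, but it does not by itself fix long rebalances. As a hedged sketch only, these are Streams settings commonly used to soften the impact of an instance or broker disappearing; the values are assumptions to be tuned, not recommendations taken from the answer above:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsResilienceConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "join-app");          // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed brokers
        // Keep a warm copy of each state store on another instance so a failed
        // instance's tasks can be taken over without a long state restore.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        // Give the group a little more time before a slow member triggers a rebalance.
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG), 30000);
        // Multiple stream threads per instance spread the work of the 15 partitions.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 3);
        return props;
    }
}
```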

Bucket records based on time (kafka-hdfs-connector)

荒凉一梦 submitted on 2019-12-06 12:56:27
I am trying to copy data from Kafka into Hive tables using the kafka-hdfs-connector provided by the Confluent platform. While I am able to do this successfully, I was wondering how to bucket the incoming data based on a time interval. For example, I would like a new partition to be created every 5 minutes. I tried io.confluent.connect.hdfs.partitioner.TimeBasedPartitioner with partition.duration.ms, but I think I am doing it the wrong way: I see only one partition in the Hive table, with all the data going into that particular partition. Something like this: hive> show partitions test; OK partition year
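For reference, here is a sketch of the partitioner-related connector properties, assuming the Confluent HDFS connector's TimeBasedPartitioner; the values are illustrative, and this partitioner generally needs path.format, locale, and timezone set alongside partition.duration.ms for the 5-minute directories to appear:

```
partitioner.class=io.confluent.connect.hdfs.partitioner.TimeBasedPartitioner
partition.duration.ms=300000
path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH/'minute'=mm
locale=en
timezone=UTC
```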

Apache Kafka: Exactly Once in Version 0.10

落花浮王杯 submitted on 2019-12-06 10:53:42
Question: To achieve exactly-once processing of messages by the Kafka consumer, I am committing one message at a time, like below: public void commitOneRecordConsumer(long seconds) { KafkaConsumer<String, String> consumer = consumerConfigFactory.getConsumerConfig(); try { while (running) { ConsumerRecords<String, String> records = consumer.poll(1000); try { for (ConsumerRecord<String, String> record : records) { processingService.process(record); consumer.commitSync(Collections.singletonMap(new
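The snippet is cut off above; a minimal sketch of the complete per-record commit follows, committing record.offset() + 1 so a restart resumes at the next unprocessed message. Note that committing after processing is strictly at-least-once: a crash between process() and commitSync() replays that record. The Processor interface below is a hypothetical stand-in for the processingService in the question:

```java
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class OneRecordAtATimeCommit {

    interface Processor {
        void process(ConsumerRecord<String, String> record);
    }

    // Processes each record, then commits the offset *after* that record.
    static void consumeLoop(KafkaConsumer<String, String> consumer, Processor processor) {
        while (true) {
            // poll(long) matches the 0.10 client used in the question.
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                processor.process(record);
                consumer.commitSync(Collections.singletonMap(
                        new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1)));
            }
        }
    }
}
```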

Kafka Consumer Error - xxxx nodename nor servname provided, or not known

若如初见. submitted on 2019-12-06 10:13:23
When running the console consumer using the following command: $ ~/project/libs/kafka_2.9.2-0.8.1.1/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic customerevents --autocommit.interval.ms 100 --group customereventsgroup I get the following error: Exception in thread "main" java.net.UnknownHostException: HQSML-142453: HQSML-142453: nodename nor servname provided, or not known at java.net.InetAddress.getLocalHost(InetAddress.java:1473) at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:107) at kafka.consumer.ZookeeperConsumerConnector.<init>
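The stack trace shows InetAddress.getLocalHost() failing because the machine's own hostname (HQSML-142453) does not resolve. A commonly suggested fix, offered only as a sketch for a typical macOS/Linux setup rather than a confirmed solution for this environment, is to map that hostname to the loopback address in /etc/hosts:

```
# /etc/hosts — map the local hostname so InetAddress.getLocalHost() can resolve it
127.0.0.1   localhost HQSML-142453
```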

How to choose the number of partitions for a Kafka topic?

瘦欲@ submitted on 2019-12-06 05:43:35
Question: We have a 3-node ZooKeeper cluster and 7 brokers. Now we have to create a topic and decide on its partitions, but I did not find any formula for how many partitions I should create for this topic. The producer rate is 5k messages/sec and the size of each message is 130 bytes. Thanks in advance. Answer 1: It depends on your required throughput, cluster size, and hardware specifications. There is a clear blog post about this written by Jun Rao from Confluent: How to choose the number of topics
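The rule of thumb from that post is to size by throughput: with a target throughput T and achievable per-partition producer and consumer throughputs p and c, use at least max(T/p, T/c) partitions. A minimal sketch with the question's numbers; the per-partition throughputs are assumptions you would measure on your own hardware:

```java
public class PartitionCountEstimate {
    public static void main(String[] args) {
        double targetMBps = 5_000 * 130 / 1e6;    // ~0.65 MB/s from the question's rate
        double producerMBpsPerPartition = 10.0;   // assumed, measure on your hardware
        double consumerMBpsPerPartition = 20.0;   // assumed, measure on your hardware
        int partitions = (int) Math.ceil(Math.max(
                targetMBps / producerMBpsPerPartition,
                targetMBps / consumerMBpsPerPartition));
        // With these assumptions a single partition already covers the throughput, so the
        // final count is driven by desired consumer parallelism and headroom for growth.
        System.out.println(Math.max(partitions, 1));
    }
}
```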