kafka-consumer-api

How to make a restart-able producer?

The latest versions of Kafka support exactly-once semantics (EoS). To support this, extra details are added to the log alongside the messages, which means that if you print the offsets of messages at your consumer, they won't necessarily be sequential. This makes it harder to poll a topic to read the last committed message. In my case, the consumer printed something like this: Offset-0 0, Offset-2 1, Offset-4 2. Problem: in order to write a restart-able producer, I poll the topic and read the content of the last message. In this case, the last message would be offset #5, which is not a valid consumer record. Hence, I see errors in my
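One possible way to locate the last real record despite the gaps left by transaction markers is sketched below: assign the partition, ask for its end offset, then step backwards until poll() actually returns a record. This is only a sketch, not the asker's code; the broker address, topic name, partition, and group id are placeholder values.

    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class LastRecordReader {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "restart-probe");           // placeholder group id
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");   // skip aborted transactions

            TopicPartition tp = new TopicPartition("mytopic", 0);                 // placeholder topic/partition
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.assign(Collections.singletonList(tp));
                long end = consumer.endOffsets(Collections.singletonList(tp)).get(tp);

                // Step back until poll() actually returns a record; the end offset itself
                // may point just past a transaction control marker rather than a real message.
                for (long pos = Math.max(0, end - 1); pos >= 0; pos--) {
                    consumer.seek(tp, pos);
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    if (!records.isEmpty()) {
                        ConsumerRecord<String, String> last = records.iterator().next();
                        System.out.println("Last record: offset=" + last.offset() + " value=" + last.value());
                        break;
                    }
                }
            }
        }
    }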

Kafka Avro Consumer with Decoder issues

When I attempt to run a Kafka consumer with Avro over the data with my respective schema, it returns the error "AvroRuntimeException: Malformed data. Length is negative: -40". I see others have had similar issues converting a byte array to JSON, with Avro write and read, and with the Kafka Avro Binary *coder. I have also referenced this Consumer Group Example, all of which have been helpful, but none of it has helped with this error so far.. It works up until this part of the code (line 73): Decoder decoder = DecoderFactory.get().binaryDecoder(byteArrayInputStream, null); I have tried other decoders and printed out the
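One common cause of "Length is negative" (not confirmed by the truncated question above) is that the messages were produced with Confluent's KafkaAvroSerializer, which prepends a magic byte and a 4-byte schema-registry id that a plain binaryDecoder does not expect. The sketch below strips that 5-byte header before decoding; it assumes the Confluent wire format and that the reader already holds the writer's schema.

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryDecoder;
    import org.apache.avro.io.DecoderFactory;

    import java.nio.ByteBuffer;

    public class AvroValueDecoder {
        // Confluent's KafkaAvroSerializer writes: 1 magic byte + 4-byte schema id + Avro payload.
        private static final int CONFLUENT_HEADER_SIZE = 5;

        public static GenericRecord decode(byte[] messageBytes, Schema schema) throws Exception {
            ByteBuffer buffer = ByteBuffer.wrap(messageBytes);
            if (buffer.get() != 0x0) {
                throw new IllegalArgumentException("Unknown magic byte; payload is not Confluent-framed Avro");
            }
            int schemaId = buffer.getInt(); // schema registry id, unused in this sketch
            GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
            // Decode only the bytes after the 5-byte header.
            BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(
                    messageBytes, CONFLUENT_HEADER_SIZE, messageBytes.length - CONFLUENT_HEADER_SIZE, null);
            return reader.read(null, decoder);
        }
    }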

KafkaConsumer 0.10 Java API error message: No current assignment for partition

I am using the KafkaConsumer 0.10 Java API. I want to consume from a specific partition and a specific offset. I looked it up and found that there is a seek method, but it's throwing an exception. Has anyone had a similar use case or solution? Code: KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(consumerProps); consumer.seek(new TopicPartition("mytopic", 1), 4); Exception: java.lang.IllegalStateException: No current assignment for partition mytopic-1 at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:251) at org.apache.kafka.clients.consumer
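The usual cause of this exception is calling seek() on a partition the consumer does not own yet; manually assign()-ing the partition first (or waiting for an assignment from subscribe()) resolves it. A minimal sketch, reusing the topic, partition, offset, and consumerProps from the snippet above:

    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    import java.util.Collections;

    KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(consumerProps);
    TopicPartition partition = new TopicPartition("mytopic", 1);

    // seek() only works on partitions currently assigned to this consumer,
    // so assign the partition manually before seeking to the desired offset.
    consumer.assign(Collections.singletonList(partition));
    consumer.seek(partition, 4);

    ConsumerRecords<String, byte[]> records = consumer.poll(1000);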

How to read data using Kafka Consumer API from beginning?

Can anyone please tell me how to read messages using the Kafka Consumer API from the beginning every time I run the consumer jar? This works with the 0.9.x consumer. Basically, when you create a consumer, you need to assign a consumer group id to it using the property ConsumerConfig.GROUP_ID_CONFIG. Generate the consumer group id randomly every time you start the consumer by doing something like this: properties.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString()); (properties is an instance of java.util.Properties that you will pass to the constructor new
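A minimal sketch of that approach: a fresh random group id on every run, combined with auto.offset.reset=earliest so the brand-new group starts from the beginning of the log. The broker address and topic name are placeholders.

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.util.Collections;
    import java.util.Properties;
    import java.util.UUID;

    public class ReadFromBeginning {
        public static void main(String[] args) {
            Properties properties = new Properties();
            properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            // A fresh group id each run means there are no committed offsets for this group...
            properties.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());
            // ...and auto.offset.reset=earliest makes the new group start at the beginning.
            properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties)) {
                consumer.subscribe(Collections.singletonList("mytopic")); // placeholder topic
                ConsumerRecords<String, String> records = consumer.poll(1000);
                records.forEach(r -> System.out.println(r.offset() + ": " + r.value()));
            }
        }
    }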

Can a Kafka consumer (0.8.2.2) read messages in batch?

As per my understanding, a Kafka consumer reads messages from an assigned partition sequentially... We are planning to have multiple Kafka consumers (Java) that share the same group id, so if each reads sequentially from an assigned partition, how can we achieve high throughput? For example, the producer publishes messages at 40 per second, while a consumer processes only 1 message per second; though we can have multiple consumers, we cannot have 40, right??? Correct me if I'm wrong... And in our case the consumer has to
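To illustrate the scaling model being asked about: each poll() already returns a batch of records, and throughput is increased by running several consumers with the same group id, at most one per partition. The sketch below uses the newer Java consumer API (0.10+), not the 0.8.2.2 high-level consumer from the question, and the broker, topic, thread count, and max.poll.records value are placeholders.

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.util.Collections;
    import java.util.Properties;

    public class ParallelConsumerGroup {
        static Properties baseProps() {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // one shared group id
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");             // upper bound per poll()
            return props;
        }

        public static void main(String[] args) {
            int consumerCount = 4; // only useful if the topic has at least 4 partitions
            for (int i = 0; i < consumerCount; i++) {
                new Thread(() -> {
                    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(baseProps())) {
                        consumer.subscribe(Collections.singletonList("mytopic")); // placeholder topic
                        while (true) {
                            // Each poll() hands back a batch of records from the partitions
                            // assigned to this one consumer.
                            ConsumerRecords<String, String> batch = consumer.poll(1000);
                            batch.forEach(r -> System.out.println(
                                    Thread.currentThread().getName() + " -> " + r.offset()));
                        }
                    }
                }, "consumer-" + i).start();
            }
        }
    }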

Understanding Kafka Topics and Partitions

I am starting to learn Kafka for enterprise solution purposes. During my reading, some questions came to my mind: When a producer is producing a message, it will specify the topic it wants to send the message to, is that right? Does it care about partitions? When a subscriber is running, does it specify its group id so that it can be part of a cluster of consumers of the same topic, or of several topics that this group of consumers is interested in? Does each consumer group have a corresponding partition on the broker, or does each consumer have one? Are the partitions created by the broker,
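On the first question, a producer always names the topic and may optionally name a partition; if it does not, the partitioner picks one from the record key (or round-robin without a key). A small sketch, with placeholder broker and topic:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class TopicVsPartition {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Topic only: the partitioner chooses the partition (by key hash, or round-robin without a key).
                producer.send(new ProducerRecord<>("mytopic", "key-1", "value-1"));

                // Topic plus explicit partition: the producer pins this record to partition 0.
                producer.send(new ProducerRecord<>("mytopic", 0, "key-2", "value-2"));
            }
        }
    }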

Limit Kafka batches size when using Spark Streaming

Is it possible to limit the size of the batches returned by the Kafka consumer for Spark Streaming? I am asking because the first batch I get has hundreds of millions of records, and it takes ages to process and checkpoint them. I think your problem can be solved by Spark Streaming backpressure. Check spark.streaming.backpressure.enabled and spark.streaming.backpressure.initialRate. By default spark.streaming.backpressure.initialRate is not set and spark.streaming.backpressure.enabled is disabled, so I suppose Spark will take as much as it can. From the Apache Spark Kafka configuration
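A sketch of how those two settings might be applied in the driver configuration; the app name, batch interval, and rate value are illustrative only, not taken from the question:

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class BackpressureExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("kafka-backpressure-demo")                    // placeholder app name
                    // Let Spark adapt the ingestion rate to the observed processing rate.
                    .set("spark.streaming.backpressure.enabled", "true")
                    // Rate used for the very first batch, before the backpressure
                    // controller has any processing statistics to work with.
                    .set("spark.streaming.backpressure.initialRate", "1000"); // illustrative value

            JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));
            // ... create the Kafka direct stream and processing logic here ...
        }
    }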

Difference between session.timeout.ms and max.poll.interval.ms for Kafka 0.10.0.0 and later versions

I am unclear why we need both session.timeout.ms and max.poll.interval.ms, and when we would use one or the other or both. Both seem to indicate the upper bound on the time the coordinator will wait to get a heartbeat from a consumer before assuming it's dead. Also, how does this behave for versions 0.10.1.0+ based on KIP-62? Before KIP-62, there was only session.timeout.ms (i.e., Kafka 0.10.0 and earlier). max.poll.interval.ms was introduced via KIP-62 (part of Kafka 0.10.1). KIP-62 decouples heartbeats from calls to poll() via a background heartbeat thread, allowing for a longer processing time (i.e., time
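A hedged sketch of how the two timeouts can be set on a 0.10.1+ consumer, with comments spelling out what each one bounds; the numeric values are illustrative only:

    import org.apache.kafka.clients.consumer.ConsumerConfig;

    import java.util.Properties;

    public class ConsumerTimeouts {
        public static Properties timeoutProps() {
            Properties props = new Properties();
            // session.timeout.ms: the background heartbeat thread must check in within this
            // window, otherwise the coordinator assumes the whole process died.
            props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "10000");     // 10 s, illustrative
            // heartbeat.interval.ms: how often that background thread heartbeats
            // (kept well below the session timeout).
            props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");
            // max.poll.interval.ms: poll() itself must be called again within this window,
            // so it bounds how long processing one batch of records may take.
            props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");  // 5 min, illustrative
            return props;
        }
    }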

What does “Rebalancing” mean in Apache Kafka context?

I am a new user of Kafka and have been trialling it for about 2-3 weeks now. I believe that at the moment I have a good understanding of how Kafka works for the most part, but after attempting to fit the API to my own Kafka consumer (this is obscure, but I'm following the guidelines for the new KafkaConsumer that is supposed to be available for v0.9, which is on the 'trunk' repo atm) I've had latency issues consuming from a topic if I have multiple consumers with the same groupID. In this setup,
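One concrete way to watch a rebalance happen in a multi-consumer setup like this is a ConsumerRebalanceListener, which the consumer invokes when partitions are taken away from it or handed to it. A small sketch with a placeholder topic name:

    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    import java.util.Collection;
    import java.util.Collections;

    public class RebalanceLogger {
        public static void subscribeWithLogging(KafkaConsumer<String, String> consumer) {
            consumer.subscribe(Collections.singletonList("mytopic"), new ConsumerRebalanceListener() { // placeholder topic
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Called before a rebalance takes partitions away from this consumer,
                    // e.g. when another consumer with the same group id joins.
                    System.out.println("Revoked: " + partitions);
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Called after the rebalance, with this consumer's new share of partitions.
                    System.out.println("Assigned: " + partitions);
                }
            });
        }
    }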

Delete message after consuming it in KAFKA

I am using Apache Kafka to produce and consume a file 5 GB in size. I want to know if there is a way for a message to be removed from the topic automatically after it is consumed. Do I have any way to keep track of consumed messages? I don't want to delete them manually. In Kafka, keeping track of what has been consumed is the responsibility of the consumer, and this is also one of the main reasons why Kafka has such great horizontal scalability. Using the high-level consumer API will automatically do this for you by committing consumed offsets in Zookeeper (or a more recent configuration
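A short sketch of the consumer-side bookkeeping the answer describes, using the modern Java consumer: the consumed position is recorded by committing offsets (here explicitly via commitSync()), while the messages themselves stay on the broker until retention removes them. The broker address, topic, and group id are placeholders.

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.util.Collections;
    import java.util.Properties;

    public class CommitTrackingConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "file-chunks-group");        // placeholder
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");          // commit explicitly below
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("mytopic"));          // placeholder topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    records.forEach(r -> System.out.println("processed offset " + r.offset()));
                    // Record how far this group has read; this does NOT delete the messages,
                    // only the broker's retention settings remove data.
                    consumer.commitSync();
                }
            }
        }
    }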