kafka-consumer-api

How to make a restart-able producer?

The latest versions of Kafka support exactly-once semantics (EoS). To support this, extra details are added to the log alongside the messages, which means that if you print the offsets of messages at your consumer, they won't necessarily be sequential. This makes it harder to poll a topic to read the last committed message. In my case, the consumer printed something like this: Offset-0 0, Offset-2 1, Offset-4 2. Problem: in order to write a restart-able producer, I poll the topic and read the content of the last message. In this case, the last message would be offset #5, which is not a valid consumer record. Hence, I see errors in my
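One possible way to locate the last real record despite the gaps left by transaction markers is sketched below: assign the partition, ask for its end offset, then step backwards until poll() actually returns a record. This is only a sketch, not the asker's code; the broker address, topic name, partition, and group id are placeholder values.

    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class LastRecordReader {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "restart-probe");           // placeholder group id
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");   // skip aborted transactions

            TopicPartition tp = new TopicPartition("mytopic", 0);                 // placeholder topic/partition
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.assign(Collections.singletonList(tp));
                long end = consumer.endOffsets(Collections.singletonList(tp)).get(tp);

                // Step back until poll() actually returns a record; the end offset itself
                // may point just past a transaction control marker rather than a real message.
                for (long pos = Math.max(0, end - 1); pos >= 0; pos--) {
                    consumer.seek(tp, pos);
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    if (!records.isEmpty()) {
                        ConsumerRecord<String, String> last = records.iterator().next();
                        System.out.println("Last record: offset=" + last.offset() + " value=" + last.value());
                        break;
                    }
                }
            }
        }
    }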

Kafka Avro Consumer with Decoder issues

When I attempt to run a Kafka consumer with Avro over the data with my respective schema, it returns the error "AvroRuntimeException: Malformed data. Length is negative: -40". I see others have had similar issues converting a byte array to JSON, with Avro write and read, and with the Kafka Avro Binary *coder. I have also referenced this Consumer Group Example, all of which have been helpful, but none of it has helped with this error so far.. It works up until this part of the code (line 73): Decoder decoder = DecoderFactory.get().binaryDecoder(byteArrayInputStream, null); I have tried other decoders and printed out the
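One common cause of "Length is negative" (not confirmed by the truncated question above) is that the messages were produced with Confluent's KafkaAvroSerializer, which prepends a magic byte and a 4-byte schema-registry id that a plain binaryDecoder does not expect. The sketch below strips that 5-byte header before decoding; it assumes the Confluent wire format and that the reader already holds the writer's schema.

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryDecoder;
    import org.apache.avro.io.DecoderFactory;

    import java.nio.ByteBuffer;

    public class AvroValueDecoder {
        // Confluent's KafkaAvroSerializer writes: 1 magic byte + 4-byte schema id + Avro payload.
        private static final int CONFLUENT_HEADER_SIZE = 5;

        public static GenericRecord decode(byte[] messageBytes, Schema schema) throws Exception {
            ByteBuffer buffer = ByteBuffer.wrap(messageBytes);
            if (buffer.get() != 0x0) {
                throw new IllegalArgumentException("Unknown magic byte; payload is not Confluent-framed Avro");
            }
            int schemaId = buffer.getInt(); // schema registry id, unused in this sketch
            GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
            // Decode only the bytes after the 5-byte header.
            BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(
                    messageBytes, CONFLUENT_HEADER_SIZE, messageBytes.length - CONFLUENT_HEADER_SIZE, null);
            return reader.read(null, decoder);
        }
    }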

KafkaConsumer 0.10 Java API error message: No current assignment for partition

I am using the KafkaConsumer 0.10 Java API. I want to consume from a specific partition and a specific offset. I looked it up and found that there is a seek method, but it's throwing an exception. Has anyone had a similar use case or solution? Code: KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(consumerProps); consumer.seek(new TopicPartition("mytopic", 1), 4); Exception: java.lang.IllegalStateException: No current assignment for partition mytopic-1 at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:251) at org.apache.kafka.clients.consumer
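The usual cause of this exception is calling seek() on a partition the consumer does not own yet; manually assign()-ing the partition first (or waiting for an assignment from subscribe()) resolves it. A minimal sketch, reusing the topic, partition, offset, and consumerProps from the snippet above:

    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    import java.util.Collections;

    KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(consumerProps);
    TopicPartition partition = new TopicPartition("mytopic", 1);

    // seek() only works on partitions currently assigned to this consumer,
    // so assign the partition manually before seeking to the desired offset.
    consumer.assign(Collections.singletonList(partition));
    consumer.seek(partition, 4);

    ConsumerRecords<String, byte[]> records = consumer.poll(1000);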

How to read data using Kafka Consumer API from beginning?

Can anyone please tell me how to read messages using the Kafka Consumer API from the beginning every time I run the consumer jar? This works with the 0.9.x consumer. Basically, when you create a consumer, you need to assign a consumer group id to it using the property ConsumerConfig.GROUP_ID_CONFIG. Generate the consumer group id randomly every time you start the consumer by doing something like this: properties.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString()); (properties is an instance of java.util.Properties that you will pass to the constructor new
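A minimal sketch of that approach: a fresh random group id on every run, combined with auto.offset.reset=earliest so the brand-new group starts from the beginning of the log. The broker address and topic name are placeholders.

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.util.Collections;
    import java.util.Properties;
    import java.util.UUID;

    public class ReadFromBeginning {
        public static void main(String[] args) {
            Properties properties = new Properties();
            properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            // A fresh group id each run means there are no committed offsets for this group...
            properties.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());
            // ...and auto.offset.reset=earliest makes the new group start at the beginning.
            properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties)) {
                consumer.subscribe(Collections.singletonList("mytopic")); // placeholder topic
                ConsumerRecords<String, String> records = consumer.poll(1000);
                records.forEach(r -> System.out.println(r.offset() + ": " + r.value()));
            }
        }
    }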

Can a Kafka consumer (0.8.2.2) read messages in batch?

As per my understanding, a Kafka consumer reads messages from an assigned partition sequentially... We are planning to have multiple Kafka consumers (Java) that share the same group id, so if each reads sequentially from an assigned partition, how can we achieve high throughput? For example, the producer publishes messages at 40 per second, while a consumer processes only 1 message per second; though we can have multiple consumers, we cannot have 40, right??? Correct me if I'm wrong... And in our case the consumer has to
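To illustrate the scaling model being asked about: each poll() already returns a batch of records, and throughput is increased by running several consumers with the same group id, at most one per partition. The sketch below uses the newer Java consumer API (0.10+), not the 0.8.2.2 high-level consumer from the question, and the broker, topic, thread count, and max.poll.records value are placeholders.

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.util.Collections;
    import java.util.Properties;

    public class ParallelConsumerGroup {
        static Properties baseProps() {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // one shared group id
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");             // upper bound per poll()
            return props;
        }

        public static void main(String[] args) {
            int consumerCount = 4; // only useful if the topic has at least 4 partitions
            for (int i = 0; i < consumerCount; i++) {
                new Thread(() -> {
                    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(baseProps())) {
                        consumer.subscribe(Collections.singletonList("mytopic")); // placeholder topic
                        while (true) {
                            // Each poll() hands back a batch of records from the partitions
                            // assigned to this one consumer.
                            ConsumerRecords<String, String> batch = consumer.poll(1000);
                            batch.forEach(r -> System.out.println(
                                    Thread.currentThread().getName() + " -> " + r.offset()));
                        }
                    }
                }, "consumer-" + i).start();
            }
        }
    }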

Understanding Kafka Topics and Partitions

I am starting to learn Kafka for enterprise solution purposes. During my reading, some questions came to my mind: When a producer is producing a message, it will specify the topic it wants to send the message to, is that right? Does it care about partitions? When a subscriber is running, does it specify its group id so that it can be part of a cluster of consumers of the same topic, or of several topics that this group of consumers is interested in? Does each consumer group have a corresponding partition on the broker, or does each consumer have one? Are the partitions created by the broker,
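On the first question, a producer always names the topic and may optionally name a partition; if it does not, the partitioner picks one from the record key (or round-robin without a key). A small sketch, with placeholder broker and topic:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class TopicVsPartition {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Topic only: the partitioner chooses the partition (by key hash, or round-robin without a key).
                producer.send(new ProducerRecord<>("mytopic", "key-1", "value-1"));

                // Topic plus explicit partition: the producer pins this record to partition 0.
                producer.send(new ProducerRecord<>("mytopic", 0, "key-2", "value-2"));
            }
        }
    }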

Limit Kafka batches size when using Spark Streaming

Is it possible to limit the size of the batches returned by the Kafka consumer for Spark Streaming? I am asking because the first batch I get has hundreds of millions of records, and it takes ages to process and checkpoint them. I think your problem can be solved by Spark Streaming backpressure. Check spark.streaming.backpressure.enabled and spark.streaming.backpressure.initialRate. By default spark.streaming.backpressure.initialRate is not set and spark.streaming.backpressure.enabled is disabled, so I suppose Spark will take as much as it can. From the Apache Spark Kafka configuration
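A sketch of how those two settings might be applied in the driver configuration; the app name, batch interval, and rate value are illustrative only, not taken from the question:

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class BackpressureExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("kafka-backpressure-demo")                    // placeholder app name
                    // Let Spark adapt the ingestion rate to the observed processing rate.
                    .set("spark.streaming.backpressure.enabled", "true")
                    // Rate used for the very first batch, before the backpressure
                    // controller has any processing statistics to work with.
                    .set("spark.streaming.backpressure.initialRate", "1000"); // illustrative value

            JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));
            // ... create the Kafka direct stream and processing logic here ...
        }
    }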

Difference between session.timeout.ms and max.poll.interval.ms for Kafka 0.10.0.0 and later versions

I am unclear why we need both session.timeout.ms and max.poll.interval.ms, and when we would use one or the other or both. Both seem to indicate the upper bound on the time the coordinator will wait to get a heartbeat from a consumer before assuming it's dead. Also, how does this behave for versions 0.10.1.0+ based on KIP-62? Before KIP-62, there was only session.timeout.ms (i.e., Kafka 0.10.0 and earlier). max.poll.interval.ms was introduced via KIP-62 (part of Kafka 0.10.1). KIP-62 decouples heartbeats from calls to poll() via a background heartbeat thread, allowing for a longer processing time (i.e., time
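A hedged sketch of how the two timeouts can be set on a 0.10.1+ consumer, with comments spelling out what each one bounds; the numeric values are illustrative only:

    import org.apache.kafka.clients.consumer.ConsumerConfig;

    import java.util.Properties;

    public class ConsumerTimeouts {
        public static Properties timeoutProps() {
            Properties props = new Properties();
            // session.timeout.ms: the background heartbeat thread must check in within this
            // window, otherwise the coordinator assumes the whole process died.
            props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "10000");     // 10 s, illustrative
            // heartbeat.interval.ms: how often that background thread heartbeats
            // (kept well below the session timeout).
            props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");
            // max.poll.interval.ms: poll() itself must be called again within this window,
            // so it bounds how long processing one batch of records may take.
            props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");  // 5 min, illustrative
            return props;
        }
    }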

What does “Rebalancing” mean in Apache Kafka context?

I am a new user of Kafka and have been trialling it for about 2-3 weeks now. I believe that at the moment I have a good understanding of how Kafka works for the most part, but after attempting to fit the API to my own Kafka consumer (this is obscure, but I'm following the guidelines for the new KafkaConsumer that is supposed to be available for v0.9, which is on the 'trunk' repo atm) I've had latency issues consuming from a topic if I have multiple consumers with the same groupID. In this setup,
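One concrete way to watch a rebalance happen in a multi-consumer setup like this is a ConsumerRebalanceListener, which the consumer invokes when partitions are taken away from it or handed to it. A small sketch with a placeholder topic name:

    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    import java.util.Collection;
    import java.util.Collections;

    public class RebalanceLogger {
        public static void subscribeWithLogging(KafkaConsumer<String, String> consumer) {
            consumer.subscribe(Collections.singletonList("mytopic"), new ConsumerRebalanceListener() { // placeholder topic
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Called before a rebalance takes partitions away from this consumer,
                    // e.g. when another consumer with the same group id joins.
                    System.out.println("Revoked: " + partitions);
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Called after the rebalance, with this consumer's new share of partitions.
                    System.out.println("Assigned: " + partitions);
                }
            });
        }
    }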

Delete message after consuming it in KAFKA

I am using Apache Kafka to produce and consume a file 5 GB in size. I want to know if there is a way for a message to be removed from the topic automatically after it is consumed. Do I have any way to keep track of consumed messages? I don't want to delete them manually. In Kafka, keeping track of what has been consumed is the responsibility of the consumer, and this is also one of the main reasons why Kafka has such great horizontal scalability. Using the high-level consumer API will automatically do this for you by committing consumed offsets in Zookeeper (or a more recent configuration
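A short sketch of the consumer-side bookkeeping the answer describes, using the modern Java consumer: the consumed position is recorded by committing offsets (here explicitly via commitSync()), while the messages themselves stay on the broker until retention removes them. The broker address, topic, and group id are placeholders.

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.util.Collections;
    import java.util.Properties;

    public class CommitTrackingConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "file-chunks-group");        // placeholder
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");          // commit explicitly below
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("mytopic"));          // placeholder topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    records.forEach(r -> System.out.println("processed offset " + r.offset()));
                    // Record how far this group has read; this does NOT delete the messages,
                    // only the broker's retention settings remove data.
                    consumer.commitSync();
                }
            }
        }
    }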