kafka-consumer-api

Python kafka consumer group id issue

半世苍凉 submitted at 2019-12-04 21:05:26
AFAIK, the concept of partitions and (consumer) groups in Kafka was introduced to implement parallelism. I am working with Kafka through Python. I have a certain topic which has (say) 2 partitions. This means that if I start a consumer group with 2 consumers in it, they will be mapped (subscribed) to different partitions. But, using a Kafka library in Python, I came across a weird issue. I started 2 consumers with essentially the same group id and started threads for them to consume messages. But every message in the Kafka stream is being consumed by both of them! This seems ridiculous to…
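When two kafka-python consumers really share a group (the `group_id=` argument to `KafkaConsumer`), the broker-side coordinator splits the topic's partitions between them; seeing every message on both consumers usually means `group_id` was never set (it defaults to `None`, which disables group management entirely). A minimal sketch, assuming a hypothetical topic `mytopic` and broker at `localhost:9092`:

```python
def make_group_consumer(topic, group_id, bootstrap="localhost:9092"):
    """Create a consumer that participates in consumer group `group_id`.
    Needs the kafka-python package and a reachable broker, so the import
    is deferred to keep this sketch importable without either."""
    from kafka import KafkaConsumer  # pip install kafka-python
    return KafkaConsumer(
        topic,
        group_id=group_id,            # the SAME string on every worker
        bootstrap_servers=bootstrap,
        auto_offset_reset="earliest",
    )

def range_assignment(partitions, members):
    """Pure-Python model of how the coordinator divides partitions among
    group members: each member gets a contiguous, non-overlapping slice."""
    members = sorted(members)
    per, extra = divmod(len(partitions), len(members))
    out, start = {}, 0
    for i, member in enumerate(members):
        n = per + (1 if i < extra else 0)
        out[member] = partitions[start:start + n]
        start += n
    return out
```

With 2 partitions and 2 members, `range_assignment([0, 1], ["c1", "c2"])` gives each consumer one partition, so no message is delivered to both consumers within the group.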

How to pause a kafka consumer?

≯℡__Kan透↙ submitted at 2019-12-04 19:26:01
I am using a Kafka producer-consumer model in my framework. The record consumed at the consumer end is later indexed into Elasticsearch. Here I have a use case where, if ES is down, I have to pause the Kafka consumer until ES is back up; once it is up, I need to resume the consumer and consume records from where I last left off. I don't think this can be achieved with @KafkaListener. Can anyone please give me a solution for this? I figured out that I need to write my own KafkaListenerContainer for this, but I am not able to implement it correctly. Any help would be much appreciated.
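The question is about Spring Kafka, where `MessageListenerContainer.pause()`/`resume()` serve this purpose, but the underlying consumer pattern is easiest to sketch with a kafka-python-style consumer, which exposes the same `pause()`/`resume()` calls. A sketch under those assumptions (`index_fn` and `es_healthy` are hypothetical callbacks; `max_iter` exists only to make the loop finite for testing):

```python
import time

def poll_loop(consumer, index_fn, es_healthy, poll_ms=1000,
              retry_s=5.0, max_iter=None):
    """Poll Kafka; while Elasticsearch is down, pause all assigned
    partitions (polling continues so group membership stays alive, but
    no records are fetched), then resume once ES recovers.  `consumer`
    is assumed to follow the kafka-python KafkaConsumer API
    (pause/resume/paused/assignment/poll/commit)."""
    paused = False
    iters = 0
    while max_iter is None or iters < max_iter:
        iters += 1
        if not es_healthy():
            if not paused:
                consumer.pause(*consumer.assignment())
                paused = True
            consumer.poll(poll_ms)        # heartbeat, returns nothing while paused
            time.sleep(retry_s)
            continue
        if paused:
            consumer.resume(*consumer.paused())
            paused = False
        for _, records in consumer.poll(poll_ms).items():
            for record in records:
                index_fn(record)          # index into Elasticsearch
                consumer.commit()         # commit only after indexing succeeds
```

Because offsets are committed only after indexing, restarting the process resumes from the last record that actually reached ES.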

Not clear about the meaning of auto.offset.reset and enable.auto.commit in Kafka

江枫思渺然 submitted at 2019-12-04 18:47:03
Question: I am new to Kafka and I don't really understand the meaning of these Kafka configuration options; can anyone explain them to me? Here is my code: val kafkaParams = Map[String, Object]( "bootstrap.servers" -> "master:9092,slave1:9092", "key.deserializer" -> classOf[StringDeserializer], "value.deserializer" -> classOf[StringDeserializer], "group.id" -> "GROUP_2017", "auto.offset.reset" -> "latest", //earliest or latest "enable.auto.commit" -> (true: java.lang.Boolean) ) What does it mean in…
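In short: `auto.offset.reset` applies only when the group has no committed offset yet (`earliest` starts from the oldest retained message, `latest` from messages produced after startup), while `enable.auto.commit=true` makes the client commit offsets in the background on a timer. The same parameters from the question's Scala map, expressed as a kafka-python-style dict with explanatory comments:

```python
kafka_params = {
    # Initial contact points for discovering the cluster
    # (addresses copied from the question).
    "bootstrap_servers": "master:9092,slave1:9092",
    # All consumers sharing this id split the topic's partitions
    # among themselves.
    "group_id": "GROUP_2017",
    # Used ONLY when no committed offset exists for this group
    # (first run, or the committed offset has expired):
    #   "earliest" -> start from the oldest available message
    #   "latest"   -> start from messages produced after startup
    # When a committed offset exists, consumption resumes from it
    # and this setting is ignored.
    "auto_offset_reset": "latest",
    # True:  offsets are committed in the background every
    #        auto.commit.interval.ms (5 s by default), so messages
    #        polled but not yet processed can be lost on a crash.
    # False: you call commit() yourself after processing.
    "enable_auto_commit": True,
}
```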

Apache Kafka: Exactly Once in Version 0.10

风流意气都作罢 submitted at 2019-12-04 17:03:42
To achieve exactly-once processing of messages by a Kafka consumer, I am committing one message at a time, like below: public void commitOneRecordConsumer(long seconds) { KafkaConsumer<String, String> consumer = consumerConfigFactory.getConsumerConfig(); try { while (running) { ConsumerRecords<String, String> records = consumer.poll(1000); try { for (ConsumerRecord<String, String> record : records) { processingService.process(record); consumer.commitSync(Collections.singletonMap(new TopicPartition(record.topic(), record.partition()), new OffsetAndMetadata(record.offset() + 1))); System.out.println(…
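Note that committing after each record gives at-least-once, not exactly-once, semantics: a crash between `process()` and `commitSync()` still replays that record (true exactly-once needs Kafka ≥ 0.11 transactions, or idempotent processing). The committed value must be the *next* offset to read, which is why the question's code commits `record.offset() + 1`. A pure-Python model of that bookkeeping for a whole batch:

```python
def offsets_to_commit(records):
    """Map (topic, partition) -> next offset to read: the highest
    processed offset per partition plus one -- the same convention as
    the Java commitSync(Map<TopicPartition, OffsetAndMetadata>) call
    in the question."""
    out = {}
    for topic, partition, offset in records:
        key = (topic, partition)
        out[key] = max(out.get(key, 0), offset + 1)
    return out
```

With kafka-python the equivalent call passes such a `{TopicPartition: OffsetAndMetadata}` dict to `consumer.commit()`.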

kafka-python consumer start reading from offset (automatically)

倖福魔咒の submitted at 2019-12-04 14:21:31
Question: I'm trying to build an application with kafka-python where a consumer reads data from a range of topics. It is extremely important that the consumer never reads the same message twice, but also never misses a message. Everything seems to be working fine, except when I turn off the consumer (e.g. on failure) and try to start reading from the last offset. I can only read all the messages from the topic (which creates double reads) or listen for new messages only (and miss messages that were emitted during…
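The usual recipe: create the `KafkaConsumer` with a fixed `group_id`, `enable_auto_commit=False`, and `auto_offset_reset="earliest"` (the reset only applies on the very first run, before any offset is committed), then commit manually after each record is processed. A restart then resumes at the last committed offset, so nothing is skipped; only the record in flight at a crash can repeat, and truly "never twice" additionally requires idempotent processing or transactions. A sketch of the loop, with the consumer injected so it runs against any object following the kafka-python API (`handle` is a hypothetical processing callback; `max_polls` exists only for testing):

```python
def consume_and_commit(consumer, handle, max_polls=None):
    """Read -> process -> commit, in that order.  `consumer` is assumed
    to be a kafka-python KafkaConsumer created with group_id set and
    enable_auto_commit=False, so the committed offset never runs ahead
    of what `handle` has actually processed."""
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        for _, records in consumer.poll(1000).items():
            for record in records:
                handle(record)
                consumer.commit()   # persist progress only after processing
```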

How to choose the number of partitions for a Kafka topic?

五迷三道 submitted at 2019-12-04 10:48:43
We have a 3-node ZooKeeper cluster and 7 brokers. Now we have to create a topic and create partitions for it, but I could not find any formula for deciding how many partitions I should create for this topic. The producer rate is 5k messages/sec and the size of each message is 130 bytes. Thanks in advance. It depends on your required throughput, cluster size, and hardware specifications; there is a clear blog post about this by Jun Rao from Confluent: "How to choose the number of topics/partitions in a Kafka cluster?" This might also be helpful for insight: "Apache Kafka Supports 200K…
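The rule of thumb from the referenced blog post: use at least max(t/p, t/c) partitions, where t is the target throughput and p and c are the measured per-partition producer and consumer throughputs. Applied to the question's numbers (the 10 and 20 MB/s per-partition figures below are hypothetical placeholders you must measure on your own hardware):

```python
import math

def min_partitions(target_mb_s, producer_mb_s_per_part, consumer_mb_s_per_part):
    """Minimum partition count for a target throughput, per Jun Rao's
    sizing rule: max(t/p, t/c), rounded up."""
    return max(math.ceil(target_mb_s / producer_mb_s_per_part),
               math.ceil(target_mb_s / consumer_mb_s_per_part))

# The question's workload: 5,000 msg/s * 130 bytes = 0.65 MB/s.
workload_mb_s = 5_000 * 130 / 1e6
```

At 0.65 MB/s, throughput alone needs only a single partition with almost any hardware, so the real driver here is the consumer parallelism you want (a group can use at most one consumer per partition), plus headroom for future growth.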

kafka-python - How do I commit a partition?

ε祈祈猫儿з submitted at 2019-12-04 10:46:45
Using kafka-python 1.0.2. If I have a topic with 10 partitions, how do I go about committing a particular partition while looping through the various partitions and messages? I just can't seem to find an example of this anywhere, in the docs or otherwise. From the docs, I want to use: consumer.commit(offset=offsets). Specifically, how do I create the partition and OffsetAndMetadata dictionary required for offsets (dict, optional) – {TopicPartition: OffsetAndMetadata}. I was hoping the function call would just be something like consumer.commit(partition, offset), but this does not seem to be the…
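With kafka-python, the `offsets` argument to `consumer.commit()` is a `{TopicPartition: OffsetAndMetadata}` dict, both types importable from the `kafka` package, and the committed value is the processed offset plus one. A sketch using stdlib namedtuple stand-ins so it runs without a broker (kafka-python's real types are also namedtuples of this general shape, though field details vary by version):

```python
from collections import namedtuple

# Stand-ins mirroring kafka-python's TopicPartition and
# OffsetAndMetadata (assumed shapes; import the real ones via
# `from kafka import TopicPartition, OffsetAndMetadata`).
TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])
OffsetAndMetadata = namedtuple("OffsetAndMetadata", ["offset", "metadata"])

def partition_commit_arg(record):
    """Build the offsets dict for committing exactly the one partition
    a record came from; pass the result as consumer.commit(offsets=...).
    `record` here is a plain dict standing in for a ConsumerRecord."""
    tp = TopicPartition(record["topic"], record["partition"])
    # Commit the NEXT offset to read, hence the +1.
    return {tp: OffsetAndMetadata(record["offset"] + 1, None)}
```

Looping over partitions, you would call this after the last processed record of each partition, committing them independently.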

How to handle error and don't commit when use Kafka Streams DSL

喜你入骨 submitted at 2019-12-04 10:38:31
Question: With Kafka Streams, if we use the lower-level Processor API, we can control whether or not to commit. So if a problem happens in our code, we may not want to commit that message; in that case Kafka will redeliver it until the problem gets fixed. But how do we control whether to commit a message when using the higher-level Streams DSL? Resources: http://docs.confluent.io/2.1.0-alpha1/streams/developer-guide.html Answer 1: Your statement is not completely true. You cannot "control to…
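The answer's point is that in the Streams DSL you cannot withhold a commit for one record: commits are driven by `commit.interval.ms`, and a record counts as done once the operator returns. The usual workarounds are to retry inside the operator until a transient failure clears, or to route the bad record to a dead-letter topic and move on. A language-agnostic sketch of that retry-or-dead-letter wrapper (pure Python; all names hypothetical):

```python
def process_with_retry(record, process, dead_letter, max_retries=3):
    """Run process(record); on failure retry up to max_retries times,
    then hand the record to dead_letter instead of blocking -- blocking
    forever would stall the whole stream task, since the DSL commits on
    a timer rather than per record."""
    for attempt in range(1, max_retries + 1):
        try:
            return process(record)
        except Exception as exc:
            if attempt == max_retries:
                dead_letter(record, exc)   # e.g. produce to a DLQ topic
                return None
```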

How to set the group name when consuming messages in Kafka using the command line

。_饼干妹妹 submitted at 2019-12-04 10:08:33
Question: Any idea how to set the group name when consuming messages in Kafka using the command line? I tried the following command: bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic nil_RF2_P2 --from-beginning --config group.id=test1, which fails with: 'config' is not a recognized option. The goal is to find the offset of consumed messages with the following command: bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper localhost:2181 --group test1. Can somebody help in this regard? Thanks in…
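`--config` is not a console-consumer flag; the old ZooKeeper-based tool takes `--consumer.config <file>` pointing at a properties file containing `group.id=test1` (newer versions also accept `--consumer-property group.id=test1` directly). A small Python helper sketching the invocation (topic and paths copied from the question; the properties filename is arbitrary):

```python
def console_consumer_cmd(topic, group_id, zookeeper="localhost:2181",
                         props_file="consumer.properties"):
    """Build the kafka-console-consumer argv plus the contents of the
    properties file that sets the consumer group name."""
    props = "group.id={}\n".format(group_id)
    cmd = ["bin/kafka-console-consumer.sh",
           "--zookeeper", zookeeper,
           "--topic", topic,
           "--from-beginning",
           "--consumer.config", props_file]
    return cmd, props
```

Write `props` to `props_file`, run `cmd` (e.g. via `subprocess.run`), and the ConsumerOffsetChecker invocation with `--group test1` should then show the group's offsets.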

Kafka: How to connect kafka-console-consumer to fetch remote broker topic content?

匆匆过客 submitted at 2019-12-04 08:27:30
I have set up a Kafka ZooKeeper and 3 brokers on one EC2 machine on ports 9092..9094 and am trying to consume the topic's content from another machine. Ports 2181 (ZooKeeper) and 9092, 9093 and 9094 (brokers) are open to the consumer machine. I can even run bin/kafka-topics.sh --describe --zookeeper 172.X.X.X:2181 --topic remotetopic, which gives me: Topic:remotetopic PartitionCount:1 ReplicationFactor:3 Configs: Topic: remotetopic Partition: 0 Leader: 2 Replicas: 2,0,1 Isr: 2,0,1. However, when I do bin/kafka-console-consumer.sh --zookeeper 172.X.X.X:2181 --from-beginning --topic…
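Being able to describe the topic through ZooKeeper but not consume usually means the brokers are advertising addresses the remote machine cannot reach: clients bootstrap through the address you give, then reconnect to whatever each broker advertises. On this Kafka vintage the fix is to set the advertised host/port in each broker's server.properties to the externally reachable EC2 address (values below are hypothetical placeholders):

```
# server.properties for broker 0 (repeat per broker with its own id/port)
broker.id=0
port=9092
# Address clients are told to reconnect to -- must be reachable from the
# consumer machine (the EC2 public DNS/IP, not the internal one):
advertised.host.name=ec2-XX-XX-XX-XX.compute.amazonaws.com
advertised.port=9092
```

On current Kafka versions the equivalent setting is `advertised.listeners=PLAINTEXT://<public-host>:9092`.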