Java: How to get the number of messages in a topic in Apache Kafka

Anonymous (unverified), submitted 2019-12-03 02:56:01

Question:

I am using Apache Kafka for messaging. I have implemented the producer and consumer in Java. How can I get the number of messages in a topic?

Answer 1:

From a consumer's point of view, the only way that comes to mind is to actually consume the messages and count them.

The Kafka broker exposes JMX counters for the number of messages received since start-up, but they cannot tell you how many of those messages have already been purged.

In most common scenarios, messages in Kafka are best seen as an infinite stream, and a discrete count of how many are currently kept on disk is not relevant. Things get more complicated with a cluster of brokers, each of which holds only a subset of the messages in a topic.



Answer 2:

This is not Java, but it may be useful:

./bin/kafka-run-class.sh kafka.tools.GetOffsetShell    --broker-list :      --topic  --time -1 --offsets 1    | awk -F  ":" '{sum += $3} END {print sum}' 
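For readers who would rather do the summing in Java than in awk, here is a minimal sketch using only the standard library. It assumes the tool prints one `topic:partition:offset` line per partition, which is what the awk field separator above implies. The class name `OffsetSum` and the sample lines are illustrative, not taken from a real cluster. Keep in mind that the sum of the latest offsets equals the number of stored messages only if nothing has been deleted by retention yet.

```java
import java.util.Arrays;
import java.util.List;

public class OffsetSum {

    // Sums the third colon-separated field of each line, mirroring
    // awk -F ":" '{sum += $3} END {print sum}'. Lines with fewer than
    // three fields are skipped.
    public static long sumLatestOffsets(List<String> lines) {
        long sum = 0;
        for (String line : lines) {
            String[] fields = line.trim().split(":");
            if (fields.length >= 3 && !fields[2].isEmpty()) {
                sum += Long.parseLong(fields[2]);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Illustrative output for a topic named "test" with three partitions.
        List<String> sample = Arrays.asList(
                "test:0:11053",
                "test:1:10812",
                "test:2:11028");
        System.out.println(sumLatestOffsets(sample)); // prints 32893
    }
}
```

Capturing the tool's output (for example with `ProcessBuilder`) is left out to keep the sketch short.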


Answer 3:

I actually use this for benchmarking my POC. The tool you want is ConsumerOffsetChecker. You can run it from a bash script like the one below.

bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker  --topic test --zookeeper localhost:2181 --group testgroup 

In the result, which the original answer showed as a screenshot with a red box around the relevant value, 999 is the number of messages currently in the topic.

Update: ConsumerOffsetChecker has been deprecated since 0.10.0; you may want to use ConsumerGroupCommand instead.



Answer 4:

Use https://prestodb.io/docs/current/connector/kafka-tutorial.html

Presto is a SQL engine, provided by Facebook, that connects to several data sources (Cassandra, Kafka, JMX, Redis, ...).

PrestoDB runs as a server with optional workers (there is a standalone mode without extra workers); you then use a small executable JAR (the Presto CLI) to run queries.

Once the Presto server is configured properly, you can use traditional SQL:

SELECT count(*) FROM TOPIC_NAME; 


Answer 5:

To count all the messages stored for the topic, seek the consumer to the end and then to the beginning of the stream for each partition, and sum the differences between the two positions:

List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
        .map(p -> new TopicPartition(topic, p.partition()))
        .collect(Collectors.toList());
consumer.assign(partitions);

// An empty collection means "all currently assigned partitions".
consumer.seekToEnd(Collections.emptySet());
Map<TopicPartition, Long> endPartitions = partitions.stream()
        .collect(Collectors.toMap(Function.identity(), consumer::position));

consumer.seekToBeginning(Collections.emptySet());
System.out.println(partitions.stream()
        .mapToLong(p -> endPartitions.get(p) - consumer.position(p))
        .sum());
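The arithmetic behind this approach can be sketched without a broker, using only the Java standard library. The class name `RetainedMessageCount` and the sample offsets are hypothetical. In a real client the two maps would be keyed by `TopicPartition` rather than by partition number, and newer clients can fill them directly with `KafkaConsumer.beginningOffsets(partitions)` and `KafkaConsumer.endOffsets(partitions)` instead of seeking. On compacted or transactional topics the offsets have gaps, so the result is better read as an upper bound than as an exact count.

```java
import java.util.HashMap;
import java.util.Map;

public class RetainedMessageCount {

    // Sums (endOffset - beginningOffset) over all partitions.
    // A partition missing from beginningOffsets is assumed to start at 0.
    public static long countRetained(Map<Integer, Long> beginningOffsets,
                                     Map<Integer, Long> endOffsets) {
        long total = 0;
        for (Map.Entry<Integer, Long> end : endOffsets.entrySet()) {
            long begin = beginningOffsets.getOrDefault(end.getKey(), 0L);
            total += end.getValue() - begin;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical offsets for a three-partition topic.
        Map<Integer, Long> begin = new HashMap<>();
        Map<Integer, Long> end = new HashMap<>();
        begin.put(0, 100L); end.put(0, 250L); // 150 messages retained
        begin.put(1, 0L);   end.put(1, 40L);  // 40 messages retained
        begin.put(2, 75L);  end.put(2, 75L);  // empty: everything purged
        System.out.println(countRetained(begin, end)); // prints 190
    }
}
```

Subtracting the beginning offset is what accounts for messages already removed by retention, which the end offset alone cannot show.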


Answer 6:

Apache Kafka command to get the unhandled messages on all partitions of a topic:

kafka-run-class kafka.tools.ConsumerOffsetChecker \
    --topic test --zookeeper localhost:2181 \
    --group test_group

Prints:

Group      Topic        Pid Offset          logSize         Lag             Owner
test_group test         0   11051           11053           2               none
test_group test         1   10810           10812           2               none
test_group test         2   11027           11028           1               none

Column 6 (Lag) holds the unhandled messages. Add them up like this:

kafka-run-class kafka.tools.ConsumerOffsetChecker \
    --topic test --zookeeper localhost:2181 \
    --group test_group 2>/dev/null | awk 'NR>1 {sum += $6} END {print sum}'

awk reads the rows, skips the header line, adds up the 6th column, and prints the sum at the end.

Prints:

5 
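The same column sum can be done in Java. The sketch below uses only the standard library and parses the table shown above; the class name `LagSum` is hypothetical, and running the tool and capturing its output is left out. Note that the Lag column is relative to one consumer group's committed offsets, so the total is the number of messages that group has not handled yet, not the number of messages stored in the topic.

```java
import java.util.Arrays;
import java.util.List;

public class LagSum {

    // Skips the header row and sums the 6th whitespace-separated
    // column (Lag), mirroring awk 'NR>1 {sum += $6} END {print sum}'.
    public static long sumLag(List<String> lines) {
        long sum = 0;
        for (int i = 1; i < lines.size(); i++) {
            String[] cols = lines.get(i).trim().split("\\s+");
            if (cols.length >= 6) {
                sum += Long.parseLong(cols[5]);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> output = Arrays.asList(
                "Group      Topic        Pid Offset          logSize         Lag             Owner",
                "test_group test         0   11051           11053           2               none",
                "test_group test         1   10810           10812           2               none",
                "test_group test         2   11027           11028           1               none");
        System.out.println(sumLag(output)); // prints 5
    }
}
```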


Answer 7:

I haven't tried this myself, but it seems to make sense.

You can also use kafka.tools.ConsumerOffsetChecker.



Answer 8:

In most recent versions of Kafka Manager, there is a column titled Summed Recent Offsets.



Answer 9:

Using the Java client of Kafka 1.0.0 (the kafka_2.11-1.0.0 build), you can do the following:

    // Assumes props configures String deserializers for keys and values.
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList("test"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("offset = %d, key = %s, value = %s%n",
                    record.offset(), record.key(), record.value());

            // After each message, query the end offset of every assigned partition.
            Set<TopicPartition> partitions = consumer.assignment();
            Map<TopicPartition, Long> offsets = consumer.endOffsets(partitions);
            for (TopicPartition partition : offsets.keySet()) {
                System.out.printf("partition %s is at %d\n", partition.topic(), offsets.get(partition));
            }
        }
    }

The output looks something like this:

offset = 10, key = null, value = un
partition test is at 13
offset = 11, key = null, value = deux
partition test is at 13
offset = 12, key = null, value = trois
partition test is at 13

