I am using Apache Kafka for messaging. I have implemented the producer and consumer in Java. How can I get the number of messages in a topic?
Question:
Answer 1:
From a consumer's point of view, the only way that comes to mind is to actually consume the messages and count them.
The Kafka broker exposes JMX counters for the number of messages received since start-up, but you cannot know how many of them have already been purged.
In most common scenarios, messages in Kafka are best seen as an infinite stream, and a discrete count of how many are currently kept on disk is not relevant. Furthermore, things get more complicated with a cluster of brokers, each of which holds only a subset of the messages in a topic.
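Reading the JMX counter mentioned above from Java needs only the JDK's javax.management API. This is a minimal sketch, assuming the broker was started with remote JMX enabled (for example via the JMX_PORT environment variable) and that the per-topic meter is registered as kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=<topic> with a Count attribute. The class name, host and port are hypothetical. As the answer says, the value is "received since start-up" on one broker, not what is still retained.

```java
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper for reading a broker's per-topic "messages in" meter over JMX.
final class BrokerMessagesIn {

    // Builds the (assumed) MBean name of the per-topic MessagesInPerSec meter.
    static ObjectName messagesInName(String topic) {
        try {
            return new ObjectName(
                "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=" + topic);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid topic name for JMX: " + topic, e);
        }
    }

    // Reads the "Count" attribute from ONE broker. In a cluster you would have to
    // query every broker and add the results; purged messages are still included.
    static long messagesInSinceStartup(String host, int jmxPort, String topic) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":" + jmxPort + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            Object count = conn.getAttribute(messagesInName(topic), "Count");
            return ((Number) count).longValue();
        }
    }
}
```

Usage would be something like `BrokerMessagesIn.messagesInSinceStartup("localhost", 9999, "test")`, repeated for each broker.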
Answer 2:
It is not Java, but it may be useful:
./bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list <host>:<port> --topic <topic> --time -1 --offsets 1 | awk -F ":" '{sum += $3} END {print sum}'
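The awk step sums the third colon-separated field (the offset) of each topic:partition:offset line that GetOffsetShell prints. With --time -1 these are the latest offsets, so the sum equals the message count only if nothing has been deleted yet. A sketch, assuming you captured the output of one run with --time -1 and one with --time -2 (earliest offsets) as lists of lines, subtracts the two per partition. The class and method names are hypothetical. Compacted topics and transaction markers leave gaps in offsets, so even this is an upper bound.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper working on captured GetOffsetShell output.
final class OffsetShellSum {

    // Parses "topic:partition:offset" lines into partition -> offset,
    // i.e. the same third field the awk script adds up.
    static Map<Integer, Long> parse(List<String> lines) {
        Map<Integer, Long> offsets = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split(":");
            if (fields.length != 3) {
                continue; // blank lines or unrelated output
            }
            try {
                offsets.put(Integer.parseInt(fields[1]), Long.parseLong(fields[2]));
            } catch (NumberFormatException e) {
                // not an offset line (e.g. a log message containing colons)
            }
        }
        return offsets;
    }

    // Retained messages = sum over partitions of (latest offset - earliest offset).
    static long retainedMessages(List<String> latestLines, List<String> earliestLines) {
        Map<Integer, Long> latest = parse(latestLines);
        Map<Integer, Long> earliest = parse(earliestLines);
        long total = 0;
        for (Map.Entry<Integer, Long> e : latest.entrySet()) {
            total += e.getValue() - earliest.getOrDefault(e.getKey(), 0L);
        }
        return total;
    }
}
```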
Answer 3:
I actually use this for benchmarking my POC. The tool you want is ConsumerOffsetChecker. You can run it with a bash command like the one below.
bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --zookeeper localhost:2181 --group testgroup
In my case, the result showed that 999 is the number of messages currently in the topic.
Update: ConsumerOffsetChecker has been deprecated since 0.10.0; you may want to start using ConsumerGroupCommand instead.
Answer 4:
Use PrestoDB: https://prestodb.io/docs/current/connector/kafka-tutorial.html
It is a SQL query engine, created at Facebook, that can connect to several data sources (Cassandra, Kafka, JMX, Redis, ...).
PrestoDB runs as a server with optional workers (there is a standalone mode without extra workers), and you use a small executable JAR (the Presto CLI) to run queries.
Once the Presto server is properly configured, you can use traditional SQL:
SELECT count(*) FROM TOPIC_NAME;
Answer 5:
To count all the messages stored for the topic, you can seek the consumer to the end and to the beginning of each partition and sum the differences between the two positions:
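```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import org.apache.kafka.common.TopicPartition;

// Assumes `consumer` is a KafkaConsumer and `topic` is the topic name.
List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
        .map(p -> new TopicPartition(topic, p.partition()))
        .collect(Collectors.toList());
consumer.assign(partitions);

// Remember the end position of every partition.
consumer.seekToEnd(partitions);
Map<TopicPartition, Long> endPartitions = partitions.stream()
        .collect(Collectors.toMap(Function.identity(), consumer::position));

// Go back to the beginning and add up (end - beginning) per partition.
consumer.seekToBeginning(partitions);
System.out.println(partitions.stream()
        .mapToLong(p -> endPartitions.get(p) - consumer.position(p))
        .sum());
```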
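With a 0.10.1 or newer client, KafkaConsumer also offers beginningOffsets(Collection<TopicPartition>) and endOffsets(Collection<TopicPartition>), which return a Map<TopicPartition, Long> without moving the consumer's position. The arithmetic on those two maps is the same as in the seek-based code. The JDK-only helper below is a sketch with hypothetical names; it is generic in the key type so it can be tried without a broker.

```java
import java.util.Map;

// Hypothetical helper: total retained messages from beginning/end offset maps.
final class PartitionOffsets {

    // Sums (end - beginning) for every partition present in endOffsets.
    // Offset gaps (compaction, transaction markers) make this an upper bound.
    static <P> long totalRetained(Map<P, Long> beginningOffsets, Map<P, Long> endOffsets) {
        long total = 0;
        for (Map.Entry<P, Long> e : endOffsets.entrySet()) {
            Long begin = beginningOffsets.get(e.getKey());
            if (begin == null) {
                throw new IllegalArgumentException("No beginning offset for " + e.getKey());
            }
            total += e.getValue() - begin;
        }
        return total;
    }
}
```

With a real consumer the call would look like `PartitionOffsets.totalRetained(consumer.beginningOffsets(partitions), consumer.endOffsets(partitions))`.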
Answer 6:
Apache Kafka command to get the number of unhandled messages across all partitions of a topic:
kafka-run-class kafka.tools.ConsumerOffsetChecker --topic test --zookeeper localhost:2181 --group test_group
Prints:
```
Group       Topic  Pid  Offset  logSize  Lag  Owner
test_group  test   0    11051   11053    2    none
test_group  test   1    10810   10812    2    none
test_group  test   2    11027   11028    1    none
```
Column 6 (Lag) is the number of unhandled messages. Add them up like this:
kafka-run-class kafka.tools.ConsumerOffsetChecker --topic test --zookeeper localhost:2181 --group test_group 2>/dev/null | awk 'NR>1 {sum += $6} END {print sum}'
awk reads the rows, skips the header line, adds up the 6th column, and prints the sum at the end.
Prints:
5
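If you capture the tool's output in Java instead of piping it to awk, you can locate the lag column by its header name rather than by position. This is a sketch with hypothetical names, assuming whitespace-separated columns where missing values are printed as "-". Matching the header case-insensitively lets the same code read ConsumerOffsetChecker's "Lag" column and the "LAG" column of kafka-consumer-groups --describe (ConsumerGroupCommand), whose column order differs between versions.

```java
import java.util.List;

// Hypothetical helper: total lag from captured consumer-offset tool output.
final class LagColumnSum {

    static long totalLag(List<String> lines) {
        int lagColumn = -1;
        long total = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split("\\s+");
            if (lagColumn < 0) {
                // Lines before the header are ignored; the header fixes the column index.
                for (int i = 0; i < fields.length; i++) {
                    if (fields[i].equalsIgnoreCase("Lag")) {
                        lagColumn = i;
                        break;
                    }
                }
                continue;
            }
            // Data rows: add numeric lag values; skip "-" and repeated header lines.
            if (lagColumn < fields.length && fields[lagColumn].matches("\\d+")) {
                total += Long.parseLong(fields[lagColumn]);
            }
        }
        return total;
    }
}
```

Fed the three rows from the ConsumerOffsetChecker output above, it adds 2 + 2 + 1, the same 5 that awk prints.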
Answer 7:
I haven't tried this myself, but it seems to make sense.
You can also use kafka.tools.ConsumerOffsetChecker (source).
Answer 8:
Answer 9:
Using the Java client of Kafka 1.0.0 (the kafka_2.11-1.0.0 release), you can do the following:
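```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// `props` must configure String deserializers for keys and values.
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("test"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s%n",
                record.offset(), record.key(), record.value());
        // After each message, query the end offset of every assigned partition.
        Set<TopicPartition> partitions = consumer.assignment();
        Map<TopicPartition, Long> offsets = consumer.endOffsets(partitions);
        for (TopicPartition partition : offsets.keySet()) {
            System.out.printf("partition %s is at %d%n",
                    partition.topic(), offsets.get(partition));
        }
    }
}
```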
The output is something like this:
```
offset = 10, key = null, value = un
partition test is at 13
offset = 11, key = null, value = deux
partition test is at 13
offset = 12, key = null, value = trois
partition test is at 13
```
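Note that the end offset is the offset the next produced message will receive, not a count. In the output, after the record at offset 10 the partition still holds records 11 and 12 (13 - 10 - 1 = 2). A minimal sketch of that per-partition calculation follows; the helper name is hypothetical, and it ignores offset gaps from compaction or transaction markers.

```java
// Hypothetical helper: records still to be read in a partition after one record.
final class RemainingAfterRecord {

    // Records after recordOffset are recordOffset + 1 .. endOffset - 1.
    static long remaining(long recordOffset, long endOffset) {
        return Math.max(0L, endOffset - recordOffset - 1);
    }
}
```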