Spring Kafka - Consume last N messages for partition(s) for any topic

I don't quite understand the need for this: Kafka itself handles the case where there is nothing left in the topic, and if messages move from state to state you can use separate queues/topics. That said, here's how one can do it.

When we consume messages from a partition using something like -

// uses the old (pre-0.9) high-level consumer API
// import kafka.consumer.ConsumerIterator;
// import kafka.message.MessageAndMetadata;

ConsumerIterator<byte[], byte[]> it = something; // initialize consumer
while (it.hasNext()) {
  MessageAndMetadata<byte[], byte[]> messageAndMetadata = it.next();
  String kafkaMessage = new String(messageAndMetadata.message());
  int partition = messageAndMetadata.partition();
  long offset = messageAndMetadata.offset();
  boolean processed = false;
  do {
    long maxOffset = something; // fetch from DB
    if (offset < maxOffset) {
      // process the message and commit the offset manually
      processed = true;
    } else {
      // busy-wait, or do something more useful
    }
  } while (!processed);
}

We get information about the offset, the partition number, and the message itself. You can choose to do anything with this info.

For your use-case, you might also decide to persist the consumed offsets into a database, so that the offsets can be adjusted on the next run. I would also recommend a shutdown hook for cleanup and a final save of the processed offsets to the DB.
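A minimal sketch of such a shutdown hook (saveOffsetsToDb and the processedOffsets map are hypothetical placeholders for your persistence layer):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// hypothetical bookkeeping: partition -> last processed offset
Map<Integer, Long> processedOffsets = new ConcurrentHashMap<>();

Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    // one final save so the next run can resume from the right offsets
    saveOffsetsToDb(processedOffsets); // hypothetical DAO call
}));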

So if I understand you correctly, this should be doable with a standard Kafka Consumer.

import java.time.Duration;
import java.util.*;
import static java.util.stream.Collectors.toList;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.TopicPartition;

Consumer<?, Message> consumer = ...

public Map<Integer, List<Message>> readLatestFromPartitions(String topic, Collection<Integer> partitions, int count) {

    // create the TopicPartitions we want to read
    List<TopicPartition> tps = partitions.stream().map(p -> new TopicPartition(topic, p)).collect(toList());
    consumer.assign(tps);

    // create and initialize the result map
    Map<Integer, List<Message>> result = new HashMap<>();
    for (Integer i : partitions) { result.put(i, new ArrayList<>()); }

    // read until the expected count has been read for all partitions
    while (result.values().stream().anyMatch(l -> l.size() < count)) {
        // read until the end of the topic
        ConsumerRecords<?, Message> records = consumer.poll(Duration.ofSeconds(5));
        while (records.count() > 0) {
            for (ConsumerRecord<?, Message> record : records) {
                List<Message> addTo = result.get(record.partition());
                // only allow `count` entries per partition
                if (addTo.size() >= count) {
                    addTo.remove(0);
                }
                addTo.add(record.value());
            }
            records = consumer.poll(Duration.ofSeconds(5));
        }
        // now we have read the whole topic for the given partitions.
        // if all lists contain the expected count, the loop will finish;
        // otherwise it will wait for more data to arrive.
    }

    // the map now contains the messages in the order they were sent,
    // we want them reversed (LIFO)
    Map<Integer, List<Message>> returnValue = new HashMap<>();
    result.forEach((k, v) -> { Collections.reverse(v); returnValue.put(k, v); });
    return returnValue;
}
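For example, to grab the 10 most recent messages from partitions 0 and 1 (the topic name "events" is illustrative):

// read the 10 most recent messages per partition
Map<Integer, List<Message>> latest = readLatestFromPartitions("events", Arrays.asList(0, 1), 10);
latest.forEach((partition, messages) ->
        System.out.println("partition " + partition + " -> " + messages.size() + " messages"));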

This can be achieved through a state store in Kafka Streams, which stream processing applications can use to store and query data. The Kafka Streams DSL, for example, automatically creates and manages such state stores when you call stateful operators such as count() or aggregate(), or when you window a stream. The state store can be backed by RocksDB, an in-memory hash map, or some other data structure. You can keep the RocksDB files on persistent storage (e.g. Portworx) to handle failure scenarios.

A Kafka Streams application typically runs on many application instances. Because Kafka Streams partitions the data it processes, an application's entire state is spread across the local state stores of the application's running instances. The Kafka Streams API lets you work with an application's state stores both locally (i.e., on the level of an instance of the application) and in its entirety (on the level of the "logical" application), for example through stateful operations such as count() or through Interactive Queries.
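As a rough sketch, querying such a store locally via Interactive Queries looks like this (assuming a running KafkaStreams instance named streams and the store name "uniqueName" used below; KafkaStreams.store(String, QueryableStoreType) is the pre-2.5 form of the API):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// query the local state store by name; only works once the instance is RUNNING
ReadOnlyKeyValueStore<String, String> view =
        streams.store("uniqueName", QueryableStoreTypes.keyValueStore());
String value = view.get("someKey"); // "someKey" is illustrative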

The snippet below shows how to initialize the state store:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

StoreBuilder<KeyValueStore<String, String>> statStore = Stores
        .keyValueStoreBuilder(Stores.persistentKeyValueStore("uniqueName"), Serdes.String(), Serdes.String())
        .withLoggingDisabled(); // disable backing up the store to a changelog topic

The next snippet shows how to add the state store to a Kafka Streams topology:

Topology builder = new Topology();
builder.addSource("Source", topic)
       .addProcessor("SourceProcessName", () -> new ProcessorClass(), "Source")
       .addStateStore(statStore, "SourceProcessName")
       .addSink("SinkProcessName", sinkTopic, "SourceProcessName");

In the process method you can store the Kafka topic messages as key/value pairs, and read them back like this:

// look the store up by the name it was registered under ("uniqueName" above)
KeyValueStore<String, String> dsStore = (KeyValueStore<String, String>) context.getStateStore("uniqueName");
KeyValueIterator<String, String> iter = dsStore.all();
while (iter.hasNext()) {
    KeyValue<String, String> entry = iter.next();
    // entry.key / entry.value hold the stored message key and value
}
iter.close(); // store iterators must be closed to release resources
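For completeness, writing into the store from process() is just a put; a minimal sketch (the fuller processor below keys its entries by offset instead):

@Override
public void process(String key, String value) {
    // store each incoming record in the state store registered above
    dsStore.put(key, value);
}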

--------------------------Updated------------------------

To also track the start and end offsets inside the processor, we need a slight change to the Processor.

int lastNRecord = 10; // assume we want the last 10 records
// fetch the start offsets from a separate consumer, e.g.:
// Map<TopicPartition, Long> offsets = consumer.beginningOffsets(partitions);
int startOffsetIndex = ...; // derived from those offsets

// pass this information to the Kafka Streams topology
builder.addSource("Source", topic)
       .addProcessor("ProcessWaferMapWaiting", () -> new ProcessorClass(lastNRecord, startOffsetIndex), "Source")
       .addStateStore(countStoreSupplier, "ProcessWaferMapWaiting")
       .addSink("SinkWaferMapWaiting", sinkTopic, "ProcessWaferMapWaiting");

In the processor we need to track the offset of each stored record, so the idea is to use the offset as the store key. For the value you can combine the message key and value, or store the message value alone if that is sufficient; it depends entirely on what you need for later manipulation.

In that case the processor could look like the following.

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ProcessorClass implements Processor<String, String> {

    private static final Logger logger = LoggerFactory.getLogger(ProcessorClass.class);

    private int startOffsetIndex = 0;
    private long endOffsetIndex = 0;

    private ProcessorContext context;
    private KeyValueStore<Long, String> dsStore;
    private long intervalMs = 600000;
    private long waitMsEachAsCall = 100;
    private int lastNRecord = 10; // default

    // get the start offset from a consumer and pass it to the processor
    public ProcessorClass(int lastNRecord, int startOffsetIndex) {
        this.lastNRecord = lastNRecord;
        this.startOffsetIndex = startOffsetIndex;
    }

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        // look the store up by its registered name
        dsStore = (KeyValueStore<Long, String>) context.getStateStore("uniqueName");
        this.context.schedule(intervalMs, PunctuationType.WALL_CLOCK_TIME, (timestamp) -> {
            KeyValueIterator<Long, String> iter = dsStore.all();
            while (iter.hasNext()) {
                KeyValue<Long, String> entry = iter.next();
                // iterate and check whether the key matches startOffsetIndex;
                // if yes, loop from there until lastNRecord entries are collected
                try {
                    // sleep for some time before the next AS call
                    Thread.sleep(waitMsEachAsCall);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            iter.close();
            context.commit();
        });
    }

    @Override
    public void process(String key, String value) {
        if (key != null) {
            // key the store by the current record's offset,
            // and remember it as the latest end offset seen so far
            endOffsetIndex = context.offset();
            dsStore.put(endOffsetIndex, key + "|" + value);
            logger.info("Adding key on state store: " + endOffsetIndex + "," + key + "," + value);
        }
    }

    @Override
    public void close() {
        // nothing to do
    }
}
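Once the store is keyed by offset like this, pulling the last N records back out can be done with a range query (a sketch, assuming the startOffsetIndex/endOffsetIndex bookkeeping above):

// range() bounds are inclusive; clamp the lower bound to the start offset
long from = Math.max(startOffsetIndex, endOffsetIndex - lastNRecord + 1);
KeyValueIterator<Long, String> lastN = dsStore.range(from, endOffsetIndex);
while (lastN.hasNext()) {
    KeyValue<Long, String> entry = lastN.next();
    // entry.key is the offset, entry.value is "messageKey|messageValue"
}
lastN.close(); // always close store iterators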