I am aware that it is not possible to order multiple partitions in Kafka and that partition ordering is only guaranteed for a single consumer within a group (for a single pa
I'm not using Kafka Streams, but it is possible to do this with the normal Consumer.
First sort the partitions. This assumes you have already seeked each partition to the offset you want, or let the consumer group do it.
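import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.TopicPartition;

// Key and value types are assumed to be String here; substitute your own.
private List<List<ConsumerRecord<String, String>>> orderPartitions(ConsumerRecords<String, String> events) {
    Set<TopicPartition> pollPartitions = events.partitions();
    List<List<ConsumerRecord<String, String>>> orderEvents = new ArrayList<>();
    for (TopicPartition tp : pollPartitions) {
        orderEvents.add(events.records(tp));
    }
    // order the lists by their first event, each list is ordered internally also
    orderEvents.sort(new PartitionEventListComparator());
    return orderEvents;
}

/**
 * Used to sort the topic partition event lists so we get them in order
 */
private class PartitionEventListComparator implements Comparator<List<ConsumerRecord<String, String>>> {
    @Override
    public int compare(List<ConsumerRecord<String, String>> list1, List<ConsumerRecord<String, String>> list2) {
        // assumes each list is non-empty (partitions() lists the partitions that have records in this poll)
        return Long.compare(list1.get(0).timestamp(), list2.get(0).timestamp());
    }
}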
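As a compact alternative to the comparator class above, the same ordering can probably be written with `Comparator.comparingLong`. The sketch below is self-contained, so it uses a hypothetical `Event` class as a stand-in for `ConsumerRecord` (only `timestamp()` matters here), and `orderByFirstTimestamp` is an illustrative name, not part of the Kafka API. It assumes every list is non-empty.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PartitionSortSketch {
    // Hypothetical stand-in for ConsumerRecord; only the timestamp is used for ordering.
    static final class Event {
        private final long timestamp;
        private final String value;
        Event(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
        long timestamp() { return timestamp; }
        String value() { return value; }
    }

    // Returns a copy of the per-partition lists, sorted by the timestamp of each list's first event.
    // Assumes no list is empty.
    static List<List<Event>> orderByFirstTimestamp(List<List<Event>> partitions) {
        List<List<Event>> ordered = new ArrayList<>(partitions);
        ordered.sort(Comparator.comparingLong(list -> list.get(0).timestamp()));
        return ordered;
    }
}
```

With real records, the lambda would call `timestamp()` on the first `ConsumerRecord` of each list in the same way.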
Then round-robin over the sorted partitions to read the events in order. In practice I've found this to work.
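// Runs inside the polling loop. poll(Duration) (java.time.Duration) replaces poll(long),
// which is deprecated since Kafka 2.0; on older clients use consumer.poll(500).
ConsumerRecords<String, String> events = consumer.poll(Duration.ofMillis(500));
int totalEvents = events.count();
log.debug("Polling topic - received " + totalEvents + " events");
if (totalEvents == 0) {
    break; // no more events
}
List<List<ConsumerRecord<String, String>>> orderEvents = orderPartitions(events);
int cnt = 0; // index of the record to take from each partition in the current round
// Each list is removed when it is no longer needed
while (!orderEvents.isEmpty() && sent < max) {
    for (int j = 0; j < orderEvents.size(); j++) {
        List<ConsumerRecord<String, String>> subList = orderEvents.get(j);
        // The list contains no more events, or none in our time range, remove it
        if (subList.size() < cnt + 1) {
            orderEvents.remove(j);
            log.debug("exhausted partition - removed");
            j--;
            continue;
        }
        ConsumerRecord<String, String> event = subList.get(cnt);
        // handle the event here and update sent (not shown)
    }
    cnt++; // advance only after every partition has been visited once
}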
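The round robin above takes the nth record of each partition in turn, so the result is in timestamp order only when the partitions interleave evenly. If strict timestamp order within one poll batch is needed, a k-way merge with a priority queue is one option. This is a swapped-in technique, not the method from the answer, and only a sketch: `Event`, `Cursor` and `merge` are hypothetical names, `Event` stands in for `ConsumerRecord`, and the merge assumes timestamps within each partition are non-decreasing, which Kafka does not guarantee for producer-assigned (CreateTime) timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class TimestampMergeSketch {
    // Hypothetical stand-in for ConsumerRecord; only timestamp() is used for ordering.
    static final class Event {
        private final long timestamp;
        private final String value;
        Event(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
        long timestamp() { return timestamp; }
        String value() { return value; }
    }

    // Read position inside one partition's list.
    private static final class Cursor {
        final List<Event> events;
        int next; // index of the next unread event
        Cursor(List<Event> events) { this.events = events; }
        long headTimestamp() { return events.get(next).timestamp(); }
    }

    // Merges the per-partition lists into one list ordered by timestamp,
    // stopping once max events have been collected.
    static List<Event> merge(List<List<Event>> partitions, int max) {
        PriorityQueue<Cursor> heads =
                new PriorityQueue<>((a, b) -> Long.compare(a.headTimestamp(), b.headTimestamp()));
        for (List<Event> p : partitions) {
            if (!p.isEmpty()) {
                heads.add(new Cursor(p));
            }
        }
        List<Event> merged = new ArrayList<>();
        while (!heads.isEmpty() && merged.size() < max) {
            Cursor c = heads.poll();            // partition whose next event is oldest
            merged.add(c.events.get(c.next++));
            if (c.next < c.events.size()) {
                heads.add(c);                   // re-queue with its new head event
            }
        }
        return merged;
    }
}
```

For two partitions with timestamps 1, 2, 9 and 3, 4, this merge yields 1, 2, 3, 4, 9, whereas the round robin yields 1, 3, 2, 4, 9. Either way the ordering only holds within a single poll batch; later polls can still return older events from a partition that was lagging.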