Effective strategy to avoid duplicate messages in an Apache Kafka consumer

借酒劲吻你 2020-12-22 23:39

I have been studying Apache Kafka for a month now. I am, however, stuck at one point. My use case is this: I have two or more consumer processes running on different machines. I

5 Answers

無奈伤痛 (OP)

2020-12-23 00:13
I agree with RaGe's suggestion to deduplicate on the consumer side. We use Redis to deduplicate Kafka messages.

Assume the Message class has a member called 'uniqId', which is filled in by the producer and is guaranteed to be unique. We use a 12-character random string (regexp: '^[A-Za-z0-9]{12}$').
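For illustration, here is a minimal sketch of how a producer might generate such an ID. The class name `UniqIdGenerator` and its `next()` method are hypothetical and not from our actual code. Note that a random string is only unique with very high probability (62^12 is roughly 3.2 × 10^21 possible values), not by construction.

```java
import java.security.SecureRandom;

// Hypothetical helper, shown only as a sketch of the producer side.
final class UniqIdGenerator {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final int LENGTH = 12;
    private static final SecureRandom RANDOM = new SecureRandom();

    private UniqIdGenerator() {}

    /** Returns a random string that matches ^[A-Za-z0-9]{12}$. */
    static String next() {
        StringBuilder sb = new StringBuilder(LENGTH);
        for (int i = 0; i < LENGTH; i++) {
            // Pick one character uniformly from the 62-character alphabet.
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }
}
```

The producer would call `UniqIdGenerator.next()` once per message and store the result in 'uniqId' before sending.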

The consumer uses Redis's SETNX to deduplicate and EXPIRE to purge old keys automatically. Sample code:
```
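Message msg = ... // e.g. ConsumerIterator.next().message().fromJson();
Jedis jedis = ... // e.g. JedisPool.getResource();
String key = "SPOUT:" + msg.uniqId; // any key prefix will do
String val = Long.toString(System.currentTimeMillis()); // value records when the id was first seen
// SETNX returns 1 if the key was newly set and 0 if it already existed.
long rsps = jedis.setnx(key, val);
if (rsps <= 0) {
    // Key already present: this uniqId was seen before, so the message is a duplicate.
    log.warn("kafka dup: {}", msg.toJson()); // and other logic
} else {
    // First occurrence: set a TTL so Redis purges the key automatically.
    jedis.expire(key, 7200); // 2 hours has been enough in our production environment
}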
```
The code above did detect duplicate messages several times when Kafka (version 0.8.x) ran into problems. According to our input/output balance audit log, no messages were lost or duplicated.
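To experiment with the same "set if absent, then expire" logic without a Redis server, something like the following in-memory stand-in could be used. This is only a sketch: `TtlDeduplicator`, `markIfFirstSeen` and `purgeExpired` are hypothetical names, and a single-JVM map cannot replace Redis when consumers run on different machines. One side note: SETNX followed by EXPIRE is two separate commands, so a crash between them leaves a key with no TTL; Redis's `SET key value NX EX seconds` does both in one step.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the SETNX + EXPIRE pattern (single JVM only).
final class TtlDeduplicator {
    // Maps each message id to the time (in millis) at which its entry expires.
    private final ConcurrentHashMap<String, Long> expiryById = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlDeduplicator(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /**
     * Returns true if the id is new (or its earlier entry has expired),
     * false if it is a duplicate inside the TTL window.
     */
    boolean markIfFirstSeen(String id, long nowMillis) {
        boolean[] first = {false};
        // compute() runs atomically per key, so check and set cannot interleave.
        expiryById.compute(id, (key, expiresAt) -> {
            if (expiresAt == null || expiresAt <= nowMillis) {
                first[0] = true;
                return nowMillis + ttlMillis; // record the id with a fresh TTL
            }
            return expiresAt; // duplicate: keep the original expiry
        });
        return first[0];
    }

    /** Drops expired entries; Redis does this automatically for keys with a TTL. */
    void purgeExpired(long nowMillis) {
        expiryById.values().removeIf(expiresAt -> expiresAt <= nowMillis);
    }
}
```

A consumer loop would call `markIfFirstSeen(msg.uniqId, System.currentTimeMillis())` and skip the message when it returns false.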