Effective strategy to avoid duplicate messages in an Apache Kafka consumer

借酒劲吻你 2020-12-22 23:39

I have been studying Apache Kafka for a month now. However, I am now stuck at a point. My use case is: I have two or more consumer processes running on different machines. I

5 Answers
  •  無奈伤痛
    2020-12-23 00:13

    I agree with RaGe's suggestion to deduplicate on the consumer side. We use Redis to deduplicate Kafka messages.

    Assume the Message class has a member called 'uniqId', which is filled in by the producer side and is guaranteed to be unique. We use a random 12-character alphanumeric string (regexp: '^[A-Za-z0-9]{12}$').
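    One way to generate such an id on the producer side is a minimal helper like the following sketch (the class and method names here are hypothetical, not from the original answer); it draws 12 characters from the alphanumeric alphabet so the result always matches the regexp above:

```java
import java.security.SecureRandom;

// Hypothetical producer-side helper generating a 12-character
// alphanumeric uniqId that matches ^[A-Za-z0-9]{12}$.
public class UniqIdGenerator {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    public static String newUniqId() {
        StringBuilder sb = new StringBuilder(12);
        for (int i = 0; i < 12; i++) {
            // Pick one random character from the 62-symbol alphabet.
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String id = newUniqId();
        System.out.println(id + " matches: " + id.matches("^[A-Za-z0-9]{12}$"));
    }
}
```

    Note that a 12-character random id is not globally unique in the strict sense, only collision-resistant (62^12 possible values), which is what the answer relies on in practice.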

    The consumer side uses Redis's SETNX to deduplicate and EXPIRE to purge expired keys automatically. Sample code:

    Message msg = ... // e.g. ConsumerIterator.next().message().fromJson();
    Jedis jedis = ... // e.g. JedisPool.getResource();
    String key = "SPOUT:" + msg.uniqId; // choose any key prefix
    String val = Long.toString(System.currentTimeMillis());
    // SETNX returns 1 if the key was newly set, 0 if it already existed.
    long rsps = jedis.setnx(key, val);
    if (rsps <= 0) {
        // Key already present: this message is a duplicate.
        log.warn("kafka dup: {}", msg.toJson()); // and other handling logic
    } else {
        // First time this id is seen: let Redis purge the key later.
        jedis.expire(key, 7200); // 2 hours is enough for our production environment
    }
    

    The above code did detect duplicate messages several times when Kafka (version 0.8.x) ran into abnormal situations. According to our input/output balance audit log, no message was lost or duplicated.
