How to choose a Key and Offset for a Kafka Producer

Posted by 魔方 西西 on 2020-01-24 20:37:06

Question


I'm following the example here. While working through the code, I came up with two questions:

  1. Are the key and the offset the same thing?

According to Google,

Offset: A Kafka topic receives messages across a distributed set of partitions where they are stored. Each partition maintains the messages it has received in a sequential order where they are identified by an offset, also known as a position.

The two seem very similar to me: an offset identifies a unique message within a partition, and producers send records to a partition based on the record's key.

  2. What is the best way to choose the key/offset for a producer?

For instance, in the example I referred to above, they chose the timestamp as both the key and the offset. Is this always the best recommendation?

    class IRCMessageListener extends IRCEventAdapter {
        @Override
        public void onPrivmsg(String channel, IRCUser u, String msg) {
            IRCMessage event = new IRCMessage(channel, u, msg);
            //FIXME kafka round robin default partitioner seems to always publish to partition 0 only (?)
            long ts = event.getInt64("timestamp");
            // Source offset and source partition describe the position in the source system.
            Map<String, ?> srcOffset = Collections.singletonMap(TIMESTAMP_FIELD, ts);
            Map<String, ?> srcPartition = Collections.singletonMap(CHANNEL_FIELD, channel);
            // The timestamp is also passed as the record key (the argument after KEY_SCHEMA).
            SourceRecord record = new SourceRecord(srcPartition, srcOffset, topic, KEY_SCHEMA, ts, IRCMessage.SCHEMA, event);
            queue.offer(record);
        }
    }

I'm asking because I'm trying to create a custom Kafka connector that gets data from a third-party WebSocket API. The API sends a real-time stream of messages for a given key value, so I thought of using that key as both my partition key and my offset. I want to make sure my reasoning is right.


Answer 1:


A key is optional metadata that can be sent with a Kafka message. By default, it is used to route the message to a specific partition. For example, if you send a message m with key k to a topic mytopic that has p partitions, then m goes to partition Hash(k) % p of mytopic. The key has no connection whatsoever to the offset of a partition. Offsets are used by consumers to keep track of the position of the last message read in a partition. In your case, if the timestamp is fairly randomly distributed, using it as the key is fine; otherwise you might cause partition imbalance.
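
The Hash(k) % p routing described above can be sketched roughly as follows. This is only an illustration: the class and method names are made up, and `Arrays.hashCode` is an assumed stand-in for the murmur2 hash that Kafka's default partitioner applies to the serialized key bytes, so the partition numbers will not match a real cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of key-based routing: partition = positive(hash(keyBytes)) % numPartitions.
// Assumption: Arrays.hashCode stands in for the murmur2 hash used by Kafka's
// default partitioner; the names below are hypothetical, not a Kafka API.
class KeyPartitionSketch {

    // Returns the partition index for a non-null key in a topic with numPartitions partitions.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    // Counts how many of the given keys land on each partition.
    static int[] distribution(String[] keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int p = 3; // hypothetical partition count for "mytopic"
        // The same key always maps to the same partition.
        System.out.println(partitionFor("#kafka", p) == partitionFor("#kafka", p));
        // Many records sharing one key all pile onto a single partition.
        String[] sameKey = {"k", "k", "k", "k"};
        System.out.println(Arrays.toString(distribution(sameKey, p)));
    }
}
```

Nothing in this calculation involves an offset: the key only selects the partition, and skewed keys show up as uneven counts in `distribution`.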




Answer 2:


These are some basic differences:

Offset: maintained by Kafka to keep track of the records consumed, so that records are neither lost nor read twice while consuming.

Key: specific to the input event. If no key is available, it defaults to null. It is useful when writing records to HDFS with the default partitioner using Kafka Connect. Every message can have its own key, or many messages can share the same key.
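
To make the difference concrete, here is a minimal in-memory sketch of a single partition. It is not Kafka code: `PartitionLogSketch` and its methods are hypothetical names, and the only point is that the log assigns offsets on append while the key is just data carried with the record.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for a single topic partition (hypothetical, not a Kafka API).
class PartitionLogSketch {

    private final List<String[]> records = new ArrayList<>(); // each entry: {key, value}

    // Appends a record and returns the offset the log assigned to it.
    // The caller supplies key and value only; it never picks the offset.
    long append(String key, String value) {
        records.add(new String[] {key, value});
        return records.size() - 1;
    }

    // Returns the key stored at the given offset (may be null).
    String keyAt(long offset) {
        return records.get((int) offset)[0];
    }

    // Reads every record from startOffset onward and returns the next offset to read.
    long consumeFrom(long startOffset) {
        long position = startOffset;
        while (position < records.size()) {
            String[] r = records.get((int) position);
            System.out.println("offset=" + position + " key=" + r[0] + " value=" + r[1]);
            position++;
        }
        return position;
    }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        log.append("#kafka", "hello");
        log.append("#kafka", "world"); // same key, different offset
        log.append(null, "no key");    // a null key is allowed
        long next = log.consumeFrom(0);
        log.append("#kafka", "later");
        log.consumeFrom(next);         // resumes without re-reading earlier records
    }
}
```

Note that the `sourceOffset` map passed to `SourceRecord` in the question is a Kafka Connect concept (the position in the source system, stored so a task can resume), which is distinct from the partition offsets sketched here.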



Source: https://stackoverflow.com/questions/51245962/how-to-choose-a-key-and-offset-for-a-kafka-producer
