How to choose the number of partitions for a Kafka topic?

瘦欲@ submitted on 2019-12-06 05:43:35

Question


We have a 3-node ZooKeeper cluster and 7 Kafka brokers. Now we have to create a topic and decide how many partitions to give it.

But I could not find any formula to decide how many partitions I should create for this topic. The producer rate is 5k messages/sec and each message is 130 bytes.

Thanks in advance.


Answer 1:


It depends on your required throughput, cluster size, and hardware specifications:

There is a clear blog post about this by Jun Rao from Confluent: How to choose the number of topics/partitions in a Kafka cluster?

This might also be helpful for insight: Apache Kafka Supports 200K Partitions Per Cluster




Answer 2:


I can't give you a definitive answer — there are many patterns and constraints that can affect it — but here are some things you might want to take into account:

  • The unit of parallelism is the partition, so if you know the average processing time per message, you can calculate the number of partitions required to keep up. For example, if each message takes 100 ms to process and you receive 5k per second, you'll need at least 500 partitions (5,000 msg/s × 0.1 s/msg). Add a percentage on top of that to cope with peaks and variable infrastructure performance. Queueing theory can give you the math to calculate your parallelism needs.

  • How bursty is your traffic and what latency constraints do you have? Considering the last point, if you also have latency requirements then you may need to scale out your partitions to cope with your peak rate of traffic.

  • If you use any data-locality patterns or require ordering of messages, you also need to consider future traffic growth. For example, suppose you deal with customer data, use the customer id as the partition key, and depend on each customer always being routed to the same partition — perhaps for event sourcing, or simply to ensure each change is applied in the right order. If you add new partitions later to cope with a higher message rate, most customers will then be routed to a different partition. This introduces headaches around guaranteed message ordering, because a customer's messages now exist on two partitions. So create enough partitions for future growth up front. Remember that consumers are easy to scale out and in, but partitions need some planning, so err on the safe side and be future proof.

  • Having thousands of partitions can increase overall latency.
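The parallelism arithmetic in the first bullet can be sketched as a quick back-of-envelope calculation. This is an illustrative helper, not a Kafka API; the headroom percentage is an assumption you should tune to your own peak-to-average traffic ratio:

```python
import math

def required_partitions(msgs_per_sec, processing_time_ms, headroom_pct=50):
    """Estimate the minimum partition count so consumers can keep up.

    Each partition-consumer handles 1000 / processing_time_ms messages
    per second; headroom_pct adds slack for bursts and variable
    infrastructure performance.
    """
    # Partitions needed at steady state (integer math first avoids float drift)
    base = msgs_per_sec * processing_time_ms / 1000
    return math.ceil(base * (1 + headroom_pct / 100))

# The example from the answer: 5k msgs/sec, 100 ms per message
print(required_partitions(5000, 100))                  # 500 + 50% headroom = 750
print(required_partitions(5000, 100, headroom_pct=0))  # bare minimum = 500
```

Note this is only the consumer-side bound; broker count, per-broker partition limits, and producer throughput impose their own constraints.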




Answer 3:


This old benchmark by a Kafka co-founder is helpful for understanding the magnitudes of scale involved: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

The immediate conclusion from this, as Vanlightly said above, is that consumer processing time is the most important factor in deciding the number of partitions (since at 5k msgs/sec you are nowhere near challenging the producer throughput).

The maximal concurrency for consuming is the number of partitions, so you want to make sure that:

(processing time for one message in seconds × number of messages per second) / number of partitions ≪ 1

If it equals 1, you cannot read faster than you write, and that is before accounting for bursts of messages and failures/downtime of consumers. So you need it to be significantly lower than 1; how much lower depends on the latency your system can endure.
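As a sketch of the ratio above (the function name is mine, not from any Kafka library), you can check consumer utilization for a few candidate partition counts:

```python
def consumer_utilization(processing_time_ms, msgs_per_sec, num_partitions):
    """Fraction of each partition-consumer's capacity in use; must be << 1.

    Equivalent to (processing time in seconds * msgs/sec) / partitions,
    ordered so the integer products divide cleanly.
    """
    return msgs_per_sec * processing_time_ms / 1000 / num_partitions

# 100 ms per message, 5k msgs/sec
for partitions in (500, 750, 1000):
    u = consumer_utilization(100, 5000, partitions)
    print(f"{partitions} partitions -> utilization {u:.2f}")
```

At 500 partitions utilization is exactly 1.0 (consumers can never catch up after any hiccup); 750 or 1000 partitions leave headroom for bursts and consumer downtime.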



Source: https://stackoverflow.com/questions/50271677/how-to-choose-the-no-of-partitions-for-a-kafka-topic
