Design Kafka consumers and producers for scalability

Submitted by Deadly on 2021-02-07 10:50:30

Question


I want to design a solution for sending different kinds of e-mails through several providers. The general overview:

I have several upstream providers (SendGrid, Zoho, Mailgun, etc.) that will be used to send the e-mails. For example:

  • E-mail for registering a new user
  • E-mail for removing a user
  • E-mail for the space quota limit

(in general around 6 types of e-mails)

Every type of e-mail should be generated by a producer, converted into a serialized Java object, and sent to the appropriate Kafka consumer integrated with the upstream provider.

The question is how to design the Kafka setup for maximum performance and scalability.

  • The first solution I can think of so far is to have a topic for every type of e-mail message and every gateway (6 x 4 = 24 topics). In the future I'm expecting to add more types of messages and gateways; it may reach 600 topics. This would mean a lot of Java source code to maintain and a lot of topics to manage. Another downside is that the Kafka logs will be huge.

  • The second solution would be to use one topic per consumer (integration gateway). But in this case, how can I send a different serialized Java object for each type of message I want to send?

Is there a better way to design this setup so that it is much easier to scale and more robust for future integrations?

You can see here how I send messages between producers and consumers: org.apache.kafka.common.KafkaException: class SaleRequestFactory is not an instance of org.apache.kafka.common.serialization.Serializer
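For reference, here is a minimal sketch of what a custom value serializer could look like, using plain Java serialization. The EmailEvent envelope and its fields are made up for illustration; its "type" field is one way to let several message kinds share one topic. The class configured as value.serializer must implement the Serializer interface, which is what the linked exception complains about.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Serializer;

// Hypothetical envelope: a "type" field lets several message kinds share one topic.
class EmailEvent implements Serializable {
    String type;        // e.g. "REGISTER_USER", "REMOVE_USER", "QUOTA_LIMIT"
    String recipient;
    String payload;     // template data, JSON, etc.
}

// The class passed as value.serializer must implement Serializer<T>.
public class EmailEventSerializer implements Serializer<EmailEvent> {
    @Override
    public byte[] serialize(String topic, EmailEvent data) {
        if (data == null) {
            return null;
        }
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(data);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new SerializationException("Failed to serialize EmailEvent", e);
        }
    }
}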

EDIT:

  1. Order matters because the communication will be asynchronous. Producers will wait for returned messages for status.
  2. It's not important to keep the data of each gateway on a different topic.
  3. What kind of isolation do you want? I want to isolate the messages/topics completely from each other in order to prevent mistakes in the future when I need to add more gateways or types of messages.

Is it important to you to keep the data of each gateway on a different topic? - No, I just want to isolate the data.

If you go with a single topic per gateway, do you care about the overhead it will create on the client side? - Reading unnecessary messages, writing more logic, a hybrid serializer, etc.

I have no idea here. My main concern is to make the system easy to extend with new features.


Answer 1:


Well, unfortunately, there is no easy answer here.
You would need to ask yourself a few questions and choose between a few tradeoffs:

First, does order matter? Is it just e-mails that you want to forward from point A to point B, or do you want to keep (I guess you would) a reasonable ordering of events for the same entity (e.g. a mail about user creation needs to be received before a mail about the same new user changing his password)?

If order matters, it's better to use the same topic with a partitioning key, as Kafka guarantees the ordering of messages only at the partition level.
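As a minimal sketch (the topic name, key, and payload here are made up), keying every record by the entity it belongs to means all events for the same user land on one partition and keep their relative order:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedEmailProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String userId = "user-42";   // partitioning key: the entity the events belong to
            String payload = "{\"type\":\"REGISTER_USER\",\"to\":\"user42@example.com\"}";
            // All records with the same key hash to the same partition,
            // so Kafka preserves their relative order.
            producer.send(new ProducerRecord<>("emails-sendgrid", userId, payload));
        }
    }
}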

What kind of isolation do you want? is it important to you to keep the data of each gateway on a different topic?
If you go with a single topic per gateway, do you care about the overhead it will create on the client side? - reading unnecessary messages, writing more logic, a hybrid serializer, etc.

Can you estimate along which dimensions you would scale? If you go with the first solution (a topic per gateway and event type) and suddenly need to add 100x more gateways, it won't necessarily be the right call. Moreover, what will happen if you need to process the user-change e-mails faster? More partitions lead to higher throughput - would you be able to add them?
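If throughput does become the bottleneck, partitions can be added to an existing topic with the admin client. A small sketch (the topic name and partition count are placeholders; note that adding partitions changes the key-to-partition mapping for new messages):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class IncreasePartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Grow the hypothetical "emails-sendgrid" topic to 12 partitions;
            // existing records stay where they are.
            admin.createPartitions(
                Collections.singletonMap("emails-sendgrid", NewPartitions.increaseTo(12))
            ).all().get();
        }
    }
}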


Confluent has a few great articles about these subjects that might help you:

Should You Put Several Event Types in the Same Kafka Topic?

How to choose the number of topics/partitions in a Kafka cluster?




Answer 2:


I think that one topic per event type would indeed be too much, given the operational overhead you mentioned.

Option 2, I think, is the right way: one topic per integration gateway, with dedicated consumers. The advantages are:

  • You isolate the workload at the topic level (many messages on integration gateway A will not impact the consumers for gateway B)
  • You can scale the consumers based on the topic workload

The producers will serialize the message according to the requirements of the gateway and publish it on that gateway's topic. The consumers will just read the messages and push them to the provider, as in the sketch below.
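A minimal sketch of one such dedicated consumer (the topic name, group id, and sendToSendgrid method are made up for illustration). You scale it by running more instances in the same group, up to the topic's partition count:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

// One consumer group dedicated to one gateway topic.
public class SendgridGatewayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "sendgrid-gateway");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("emails-sendgrid"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Push the already gateway-formatted message to the provider API.
                    sendToSendgrid(record.value());
                }
            }
        }
    }

    private static void sendToSendgrid(String message) {
        // Placeholder for the actual provider call.
        System.out.println("Sending via SendGrid: " + message);
    }
}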



Source: https://stackoverflow.com/questions/65811681/design-kafka-consumers-and-producers-for-scalability
