Keeping services in sync in a Kafka event-driven backbone

Submitted by 社会主义新天地 on 2020-06-17 02:30:08

Question


Say I am using Kafka as the event-driven backbone for all the microservices in my system design. Many microservices use the event data to populate their internal databases. Now I need to create a new service that also depends on some of that event data. The service can only consume events published after it goes live, so it will be missing all the history it never saw. I want a strategy that spares me from backfilling its internal database with hand-written scripts.

What are some strategies that neither put a huge load on Kafka nor require a lot of scripting to backfill data into each new service I create?


Answer 1:


There are a few strategies available here, depending on how you publish data to a Kafka topic. Here are a few ideas:

  • First, you can set the retention of a Kafka topic to be forever, meaning the topic stores all the data ever published to it. This is fine; Kafka is built for this purpose as well. See this. With infinite retention, any new service that comes alive can start consuming from the very beginning of the topic (see the configuration sketch after this list).

  • If you are using Kafka to publish the latest state of a given entity/aggregate, you can also consider configuring the topic to be compacted. Log compaction keeps at least the latest record per key, so new consumers that start listening on the topic have far less data to replay before they are caught up. However, your consumers still need to handle multiple messages per entity/aggregate, because compaction cannot guarantee exactly one message per key in the topic; the consumer sketch further below deals with this via upserts.
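
Both options are plain topic-level configuration. Here is a minimal sketch using the Java AdminClient; the topic names (orders, customer-state), the partition and replication counts, and the broker address localhost:9092 are illustrative assumptions, not details from the question:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class TopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Event-log topic: keep every record forever so a service that
            // arrives late can still replay the full history.
            NewTopic eventLog = new NewTopic("orders", 6, (short) 3)
                    .configs(Map.of(
                            "retention.ms", "-1",      // never expire by time
                            "retention.bytes", "-1")); // never expire by size

            // Latest-state topic: compaction keeps at least the newest record
            // per key, so new consumers replay a smaller snapshot instead.
            NewTopic latestState = new NewTopic("customer-state", 6, (short) 3)
                    .configs(Map.of("cleanup.policy", "compact"));

            admin.createTopics(List.of(eventLog, latestState)).all().get();
        }
    }
}
```

For topics that already exist, the same settings can be applied with AdminClient#incrementalAlterConfigs or the kafka-configs.sh --alter tool.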

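On the consuming side, the new service just needs a consumer group that has never committed offsets, plus auto.offset.reset=earliest, and Kafka will replay the topic from the start (or from the compacted snapshot). A minimal sketch, assuming string-serialized events; upsertIntoLocalDb is a hypothetical stand-in for whatever writes to the service's internal database:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class NewServiceBootstrap {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "new-service-v1");          // fresh group => no committed offsets
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");       // replay from the beginning
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("customer-state")); // assumed topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Upsert so later records for the same key overwrite earlier
                    // ones: replaying history converges on the current state, and
                    // seeing a key more than once (compaction only guarantees
                    // "at least the latest" per key) is harmless.
                    upsertIntoLocalDb(record.key(), record.value());
                }
            }
        }
    }

    // Hypothetical helper: write key/value into the service's own database,
    // e.g. INSERT ... ON CONFLICT (key) DO UPDATE ...
    static void upsertIntoLocalDb(String key, String value) {
    }
}
```

Because an upsert is idempotent, seeing the same key several times during the replay is harmless, which matches the at-least-once behaviour a compacted topic gives you.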


Source: https://stackoverflow.com/questions/61702954/keeping-services-in-sync-in-a-kafka-event-driven-backbone
