Data Consistency Across Microservices


The Microservices architectural style aims to let organizations have small teams own services independently, both in development and at runtime. See this read. The hardest part is defining the service boundaries in a useful way. When you discover that the way you split up your application results in requirements frequently impacting multiple services, that is a sign to rethink the service boundaries. The same is true when you feel a strong need to share entities between services.

So the general advice is to try very hard to avoid such scenarios. However, there may be cases where you cannot avoid this. Since good architecture is often about making the right trade-offs, here are some ideas.

  1. Consider expressing the dependency using service interfaces (APIs) instead of a direct DB dependency. That allows each service team to change its internal data schema as much as required and to worry only about the interface design when it comes to dependencies. This is helpful because it is easier to add additional APIs and slowly deprecate older ones than to change a DB design along with all dependent microservices (potentially at the same time). In other words, you are still able to deploy new microservice versions independently, as long as the old APIs are still supported. This is the approach recommended by the Amazon CTO, who pioneered much of the microservices approach. Here is a recommended read of an interview with him from 2006.

  2. Whenever you really, really cannot avoid using the same DB and you split your service boundaries such that multiple teams/services require the same entities, you introduce two dependencies between the microservice team and the team responsible for the data schema: a) the data format, b) the actual data. This is not impossible to solve, but only with some organizational overhead. And if you introduce too many such dependencies, your organization will likely be crippled and slowed down in development.

a) Dependency on the data schema. The entity data format cannot be modified without requiring changes in the microservices. To decouple this, you will have to version the entity data schema strictly and, in the database, support all versions of the data that the microservices are currently using. This allows the microservice teams to decide for themselves when to update their service to support the new version of the schema. This is not feasible for all use cases, but it works for many.
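As a rough illustration of the version dispatch (all names here are hypothetical; the only point is that each service opts into newer schema versions at its own pace):

    // Each stored row carries a schema version; this particular service has only been upgraded to read up to v2.
    record CustomerRow(int schemaVersion, String payloadJson) {}
    record Customer(String id, String name) {}

    final class CustomerReader {
        Customer read(CustomerRow row) {
            return switch (row.schemaVersion()) {
                case 1 -> parseV1(row.payloadJson());   // old layout, still kept in the DB for slower-moving services
                case 2 -> parseV2(row.payloadJson());   // newest layout this service has opted into
                default -> throw new IllegalStateException(
                    "schema version " + row.schemaVersion() + " not supported by this service yet");
            };
        }

        private Customer parseV1(String json) { return null; /* placeholder: map the old fields */ }
        private Customer parseV2(String json) { return null; /* placeholder: map the new fields */ }
    }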

b) Dependency on the actual collected data. Data that has already been collected and is of a version known to a microservice is fine to use, but the issue occurs when some services produce a newer version of the data and another service depends on it but has not yet been upgraded to read that latest version. This problem is hard to solve and in many cases suggests you did not choose the service boundaries correctly. Typically you have no choice but to roll out all services that depend on the data at the same time as upgrading the data in the database. A more wacky approach is to write the different versions of the data concurrently (which works mostly when the data is not mutable).

To solve both a) and b), in some other cases the dependency can be reduced by hidden data duplication and eventual consistency: each service stores its own version of the data and only modifies it when the requirements for that service change. The services can do so by listening to a public data flow. In such scenarios you would use an event-based architecture where you define a set of public events that are queued up and consumed by listeners in the different services; each listener processes the event and stores whatever data from it is relevant to that service (potentially creating data duplication). Other events may indicate that internally stored data has to be updated, and it is each service's responsibility to do so with its own copy of the data. A technology for maintaining such a public event queue is Kafka.
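A minimal sketch of such a listener with the Kafka Java consumer API (the topic name, group id and local-store call are made up for the example):

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class CustomerEventListener {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "billing-service");   // each consuming service uses its own consumer group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("customer-events"));   // the public event stream
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Store only the fields this service cares about in its own database
                        // (deliberate data duplication, eventually consistent with the upstream service).
                        updateLocalCopy(record.key(), record.value());
                    }
                }
            }
        }

        static void updateLocalCopy(String customerId, String eventJson) {
            // placeholder for writing to this service's own datastore
        }
    }

Each service keeps its own read-optimized copy, so requests can be answered without synchronous calls to the service that owns the data.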

Theoretical Limitations

One important caveat to remember is the CAP theorem:

In the presence of a partition, one is then left with two options: consistency or availability. When choosing consistency over availability, the system will return an error or a time-out if particular information cannot be guaranteed to be up to date due to network partitioning.

So by "requiring" that certain entities are consistent across multiple services you increase the probability that you will have to deal with timeout issues.

Akka Distributed Data

Akka has a distributed data module to share information within a cluster:

All data entries are spread to all nodes, or nodes with a certain role, in the cluster via direct replication and gossip based dissemination. You have fine grained control of the consistency level for reads and writes.
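A very rough sketch with the classic (untyped) Java API - here a grow-only set of entity ids that every cluster node can read locally; it assumes the ActorSystem is configured as a cluster, and the key and element names are made up:

    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.cluster.ddata.DistributedData;
    import akka.cluster.ddata.GSet;
    import akka.cluster.ddata.GSetKey;
    import akka.cluster.ddata.Key;
    import akka.cluster.ddata.Replicator;

    public class SharedIdsExample {
        public static void main(String[] args) {
            // Assumes akka.actor.provider = cluster in the configuration.
            ActorSystem system = ActorSystem.create("cluster-system");

            // The replicator actor owns the replicated entries and spreads them via gossip.
            ActorRef replicator = DistributedData.get(system).replicator();

            // A grow-only set (CRDT) of ids that any node in the cluster can read.
            Key<GSet<String>> seenIds = GSetKey.create("seen-entity-ids");

            // Write with local consistency here; stronger write/read consistency levels are also available.
            replicator.tell(
                new Replicator.Update<>(seenIds, GSet.create(), Replicator.writeLocal(),
                    set -> set.add("order-42")),
                ActorRef.noSender());
        }
    }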

Same problem here. We have data in different microservices, and there are cases where one service needs to know whether a specific entity exists in another microservice. We do not want the services to call each other to complete a request, because this adds response time and multiplies downtimes. It also creates a nightmare of deep coupling. The client should not decide about business logic and data validation/consistency either. We also do not want central services like "Saga Controllers" to provide consistency between services.

So we use a Kafka message bus to inform observing services of state changes in "upstream" services. We try very hard not to miss or ignore any messages, even in error conditions, and we use Martin Fowler's "Tolerant Reader" pattern to couple as loosely as possible. Still, sometimes services are changed, and after the change they might need information from other services that may have been emitted on the bus before but is now gone (even Kafka cannot store messages forever).

We decided for now that each service is split into a pure, decoupled web service (RESTful) that does the actual work, and a separate connector service which listens to the bus and may also call other services. This connector runs in the background and is only triggered by bus messages. It then tries to add data to the main service via REST calls. If the service responds with a consistency error, the connector tries to repair this by fetching the needed data from the upstream service and injecting it as needed. (We cannot afford batch jobs to "synchronize" data en bloc, so we just fetch what we need.) If there are better ideas, we are always open, but "pull" or "just change the data model" is not what we consider feasible...
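As a sketch, the connector's repair path looks roughly like this (plain JDK HTTP client; the URLs and the use of 409 as the "consistency error" are only illustrative):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class Connector {
        private final HttpClient http = HttpClient.newHttpClient();

        // Triggered by a bus message for the given entity.
        void onBusMessage(String entityId, String updateJson) throws Exception {
            HttpResponse<String> response = post("http://main-service/entities/" + entityId, updateJson);
            if (response.statusCode() == 409) {   // main service reports a missing or stale dependency
                // Fetch the needed data from the upstream service, inject it, then retry the original update.
                HttpResponse<String> upstream = http.send(
                    HttpRequest.newBuilder(URI.create("http://upstream-service/entities/" + entityId))
                        .GET().build(),
                    HttpResponse.BodyHandlers.ofString());
                post("http://main-service/entities/" + entityId, upstream.body());
                post("http://main-service/entities/" + entityId, updateJson);
            }
        }

        private HttpResponse<String> post(String url, String body) throws Exception {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
            return http.send(request, HttpResponse.BodyHandlers.ofString());
        }
    }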

I think there are 2 main forces at play here:

  • decoupling - that's why you have microservices in the first place and want a shared-nothing approach to data persistence
  • consistency requirement - if I understood correctly you're already fine with eventual consistency

The diagram makes perfect sense to me, but I don't know of any framework that does it out of the box, probably due to the many use-case-specific trade-offs involved. I'd approach the problem as follows:

The upstream service emits events onto the message bus, as you've shown. For serialisation I'd carefully choose a wire format that doesn't couple the producer and consumer too tightly. The ones I know of are Protobuf and Avro. You can then evolve your event model upstream without having to change the downstream if it doesn't care about the newly added fields, and do a rolling upgrade if it does.

The downstream services subscribe to the events, and the message bus must provide fault tolerance. We're using Kafka for this, but since you chose AMQP I'm assuming it gives you what you need.

In case of network failures (e.g. the downstream consumer cannot connect to the broker), if you favour (eventual) consistency over availability you may choose to refuse to serve requests that rely on data you know may be staler than some preconfigured threshold.
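A small sketch of that guard (the threshold, and tracking staleness as the time since the consumer last caught up with the broker, are just one possible choice):

    import java.time.Duration;
    import java.time.Instant;
    import java.util.function.Function;

    final class StalenessGuard {
        private final Duration maxStaleness = Duration.ofSeconds(30);   // preconfigured threshold
        private volatile Instant lastCaughtUp = Instant.EPOCH;

        // Called whenever the consumer successfully polls the broker and applies pending events.
        void markCaughtUp() {
            lastCaughtUp = Instant.now();
        }

        String handleRead(String entityId, Function<String, String> localStore) {
            if (Duration.between(lastCaughtUp, Instant.now()).compareTo(maxStaleness) > 0) {
                // Favour consistency: refuse to answer (e.g. an HTTP 503) rather than serve possibly stale data.
                throw new IllegalStateException("local replica may be stale, refusing to serve " + entityId);
            }
            return localStore.apply(entityId);
        }
    }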

I think you can approach this issue from two angles: service collaboration and data modelling.

Service collaboration

Here you can choose between service orchestration and service choreography. You already mentioned the exchange of messages or events between services. This would be the choreography approach, which, as you said, might work but involves writing code in each service that deals with the messaging part - though I'm sure there are libraries for that. Alternatively, you can choose service orchestration, where you introduce a new composite service - the orchestrator - which can be responsible for managing the data updates between the services. Because the data consistency management is now extracted into a separate component, you can switch between eventual consistency and strong consistency without touching the downstream services.
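A sketch of what such an orchestrator could look like (everything here is made up for illustration); the point is that the consistency choice lives in one place instead of inside every service:

    import java.util.List;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    interface DownstreamService {
        void applyUpdate(String entityId, String payload);
    }

    final class UpdateOrchestrator {
        private final List<DownstreamService> services;
        private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();
        private final boolean strongConsistency;

        UpdateOrchestrator(List<DownstreamService> services, boolean strongConsistency) {
            this.services = services;
            this.strongConsistency = strongConsistency;
        }

        void update(String entityId, String payload) {
            if (strongConsistency) {
                // Apply to every participant before acknowledging the caller.
                services.forEach(s -> s.applyUpdate(entityId, payload));
            } else {
                // Acknowledge immediately; a background worker drains the queue later (eventual consistency).
                services.forEach(s -> pending.add(() -> s.applyUpdate(entityId, payload)));
            }
        }

        void drainPending() {
            Runnable task;
            while ((task = pending.poll()) != null) {
                task.run();
            }
        }
    }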

Data modelling

You can also choose to redesign the data models behind the participating microservices and extract the entities that need to be consistent across multiple services into relationships managed by a dedicated relationship microservice. Such a microservice would be somewhat similar to the orchestrator, but the coupling would be reduced because the relationships can be modelled in a generic way.
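For example, the relationship service could store nothing but generic links between entities owned elsewhere (illustrative names again):

    // An entity reference only names the owning service and the entity's id there.
    record EntityRef(String service, String entityId) {}

    // The relationship service stores generic, typed links between such references.
    record Relationship(EntityRef from, EntityRef to, String type) {}

    // e.g. new Relationship(new EntityRef("order-service", "order-42"),
    //                       new EntityRef("customer-service", "customer-7"),
    //                       "PLACED_BY")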

"accordingly update the linked entities in their respective databases" -> data duplication -> FAIL.

Using events to update other databases is essentially caching, which brings the cache consistency problem - the very problem you raise in your question.

Keep your local databases as separate as possible and use pull semantics instead of push, i.e. make RPC calls when you need some data and be prepared to gracefully handle possible errors like timeouts, missing data or service unavailability. Akka or Finagle give you enough tools to do this right.
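Even without those libraries, the shape of such a pull with graceful degradation is simple; here is a plain JDK sketch (the service URL and fallback behaviour are just examples):

    import java.io.IOException;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.net.http.HttpTimeoutException;
    import java.time.Duration;
    import java.util.Optional;

    final class CustomerClient {
        private final HttpClient http = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(1))
            .build();

        Optional<String> fetchCustomer(String id) {
            HttpRequest request = HttpRequest.newBuilder(URI.create("http://customer-service/customers/" + id))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
            try {
                HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
                return response.statusCode() == 200 ? Optional.of(response.body()) : Optional.empty();
            } catch (HttpTimeoutException e) {
                return Optional.empty();            // timeout: caller falls back to a default or cached value
            } catch (IOException e) {
                return Optional.empty();            // service unavailable
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }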

This approach might hurt performance but at least you can choose what to trade and where. Possible ways to decrease latency and increase throughput are:

  • scale data provider services so they can handle more req/sec with lower latency
  • use local caches with a short expiration time. This introduces eventual consistency but really helps with performance (see the sketch after this list).
  • use a distributed cache and face the cache consistency problem directly
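For the second option, a local cache with a short TTL can be as small as this JDK-only sketch (a library like Caffeine or Guava would normally handle this for you); it trades a bounded amount of staleness for far fewer remote calls:

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    final class ShortLivedCache<K, V> {
        private record Entry<T>(T value, Instant loadedAt) {}

        private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
        private final Duration ttl;

        ShortLivedCache(Duration ttl) {
            this.ttl = ttl;
        }

        V get(K key, Function<K, V> loader) {
            Entry<V> entry = entries.get(key);
            if (entry == null || entry.loadedAt().plus(ttl).isBefore(Instant.now())) {
                entry = new Entry<>(loader.apply(key), Instant.now());   // refresh from the data provider
                entries.put(key, entry);
            }
            return entry.value();
        }
    }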