How does data denormalization work with the Microservice Pattern?

后悔当初 2021-01-29 19:16

I just read an article on Microservices and PaaS Architecture. In that article, about a third of the way down, the author states (under "Denormalize like Crazy"):


死守一世寂寞 (original poster)

2021-01-29 19:32

I realise this is possibly not a good answer, but what the heck. Your question was:

Given a database that consists entirely of related tables, how does one denormalize this into smaller fragments (groups of tables)?

WRT the database design, I'd say "you can't, not without removing foreign keys".

That is, people pushing Microservices with the strict "no shared DB" rule are asking database designers to give up foreign keys (whether they say so implicitly or explicitly). When they don't explicitly state the loss of FKs, it makes you wonder whether they actually know and recognise the value of foreign keys (because it is frequently not mentioned at all).

I have seen big systems broken into groups of tables. In these cases there can be either A) no FKs allowed between the groups, or B) one special group that holds "core" tables that can be referenced by FKs from tables in other groups.

... but in these systems a "group of tables" is often 50+ tables, so it is not small enough for strict compliance with microservices.
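The two options above can be checked mechanically. The sketch below is a hypothetical helper: the table names, the group map and `cross_group_violations` are all made up for illustration. Given the group each table belongs to and the list of FKs, it reports the FKs that would have to be dropped under option A, or under option B with a designated core group.

```python
from typing import Dict, List, Optional, Tuple

# (referencing table, referenced table); all names below are hypothetical.
ForeignKey = Tuple[str, str]


def cross_group_violations(
    table_group: Dict[str, str],
    fks: List[ForeignKey],
    core_group: Optional[str] = None,
) -> List[ForeignKey]:
    """Return the FKs that cross a group boundary and are not allowed.

    core_group=None models option A: no FKs between groups at all.
    core_group="<name>" models option B: FKs may point into that core group.
    """
    violations = []
    for src, dst in fks:
        src_group, dst_group = table_group[src], table_group[dst]
        if src_group == dst_group:
            continue  # an FK inside one group is always fine
        if core_group is not None and dst_group == core_group:
            continue  # option B: other groups may reference the core tables
        violations.append((src, dst))
    return violations


table_group = {
    "customer": "core",
    "product": "core",
    "orders": "ordering",
    "order_line": "ordering",
    "invoice": "billing",
}
fks = [
    ("order_line", "orders"),
    ("orders", "customer"),
    ("order_line", "product"),
    ("invoice", "orders"),
]

print(cross_group_violations(table_group, fks))
# [('orders', 'customer'), ('order_line', 'product'), ('invoice', 'orders')]
print(cross_group_violations(table_group, fks, core_group="core"))
# [('invoice', 'orders')]
```

Under a strict microservices reading, each group becomes its own service. Even the references into the core group then count as violations, which is exactly the loss of FKs described above.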

To me, the other related issue to consider with the Microservice approach to splitting the DB is the impact it has on reporting: how is all the data brought together for reporting and/or loading into a data warehouse?

Also somewhat related is the tendency to ignore built-in DB replication features in favor of messaging, and how DB-based replication of the core tables (the DDD shared kernel) impacts the design.
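For contrast with DB-based replication, the messaging approach usually means each service keeps its own denormalized copy of the core data it needs, updated from events. This is a minimal sketch assuming a hypothetical `CustomerEvent` message published by a customer service. The event shape, the event kinds and the class names are invented for illustration, and event ordering and redelivery are deliberately ignored.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CustomerEvent:
    """Hypothetical message published by the customer service."""

    kind: str  # "upserted" or "deleted"
    customer_id: int
    name: str = ""


class CustomerNameReplica:
    """Denormalized copy of customer names kept inside the orders service."""

    def __init__(self) -> None:
        self._names: Dict[int, str] = {}

    def apply(self, event: CustomerEvent) -> None:
        if event.kind == "upserted":
            self._names[event.customer_id] = event.name
        elif event.kind == "deleted":
            # No FK blocks this delete; orders may still hold the old id.
            self._names.pop(event.customer_id, None)
        else:
            raise ValueError(f"unknown event kind: {event.kind}")

    def name_for(self, customer_id: int) -> str:
        # Read locally instead of calling the customer service or joining.
        return self._names.get(customer_id, "<unknown customer>")


replica = CustomerNameReplica()
for event in [
    CustomerEvent("upserted", 1, "Alice"),
    CustomerEvent("upserted", 2, "Bob"),
    CustomerEvent("upserted", 1, "Alice Smith"),
    CustomerEvent("deleted", 2),
]:
    replica.apply(event)

print(replica.name_for(1))  # Alice Smith
print(replica.name_for(2))  # <unknown customer>
```

Nothing takes the place of the FK here. Because the customer can be deleted freely, the orders service has to cope with ids it can no longer resolve.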

EDIT: (the cost of JOIN via REST calls)

When we split up the DB as suggested by microservices and remove FKs, we not only lose the enforced declarative business rule (of the FK) but also lose the ability of the DB to perform the join(s) across those boundaries.

In OLTP, FK values are generally not "UX friendly", so we often want to join on them.

For example, if we fetch the last 100 orders, we probably don't want to show customer id values in the UX. Instead we need to make a second call to the customer service to get the customer names. If we also want the order lines, we need yet another call to the products service to show product name, SKU, etc. rather than product id.

In general, when we break up the DB design in this way, we find we need to make a lot of "JOIN via REST" calls. So what is the relative cost of doing this?
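Here is a sketch of what this "JOIN via REST" looks like in application code. It assumes hypothetical customer and product services that each expose a get-by-id call. `FakeService` and the sample data are invented and only count calls; in a real system each `get()` would be an HTTP round trip.

```python
from typing import Dict, List


class FakeService:
    """Stand-in for a remote REST service; each get() represents one HTTP round trip."""

    def __init__(self, records: Dict[int, dict]) -> None:
        self._records = records
        self.calls = 0

    def get(self, record_id: int) -> dict:
        self.calls += 1  # count round trips instead of doing real HTTP
        return self._records[record_id]


def orders_for_display(
    orders: List[dict], customers: FakeService, products: FakeService
) -> List[dict]:
    """'JOIN via REST': swap ids for UX-friendly fields, one remote call per id."""
    result = []
    for order in orders:
        customer = customers.get(order["customer_id"])  # one call per order
        lines = []
        for line in order["lines"]:
            product = products.get(line["product_id"])  # one call per order line
            lines.append({"sku": product["sku"], "name": product["name"], "qty": line["qty"]})
        result.append({"order_id": order["order_id"], "customer": customer["name"], "lines": lines})
    return result


# The "last 100 orders", each with 3 lines (made-up data).
customers = FakeService({i: {"name": f"Customer {i}"} for i in range(10)})
products = FakeService({i: {"sku": f"SKU-{i}", "name": f"Product {i}"} for i in range(5)})
orders = [
    {
        "order_id": n,
        "customer_id": n % 10,
        "lines": [{"product_id": (n + k) % 5, "qty": 1} for k in range(3)],
    }
    for n in range(100)
]

view = orders_for_display(orders, customers, products)
print(customers.calls, products.calls)  # 100 300
```

For 100 orders with 3 lines each, this makes 400 remote calls on top of the call that fetched the orders. The shared-DB version is a single SQL query with joins.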

Actual Story: Example costs for 'JOIN via REST' vs DB Joins

There are 4 microservices that involve a lot of "JOIN via REST". A benchmark load for these 4 services comes to ~15 minutes. When those 4 microservices are converted into 1 service with 4 modules against a shared DB (that allows joins), the same load executes in ~20 seconds, roughly 45 times faster.

Unfortunately, this is not a direct apples-to-apples comparison of DB joins vs "JOIN via REST", because in this case we also changed from a NoSQL DB to Postgres.

Is it a surprise that "JOIN via REST" performs relatively poorly compared to a DB that has a cost-based optimiser etc.?

To some extent, when we break up the DB like this we are also walking away from the cost-based optimiser and all that it does for us with query execution planning, in favor of writing our own join logic. In effect, we are writing our own relatively unsophisticated query execution plan.
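Writing our own plan does not have to mean the naive per-row version. A somewhat better hand-written plan, sketched below, works in three steps. It collects the distinct ids first, fetches each service's rows in one batched call, and then does an in-memory hash join. The `get_many` batch endpoint is a hypothetical assumption (many services don't offer one, and real batches need size limits), and `FakeBatchService` and the sample data are invented for illustration.

```python
from typing import Dict, Iterable, List


class FakeBatchService:
    """Stand-in for a REST service with a hypothetical 'get many by id' endpoint."""

    def __init__(self, records: Dict[int, dict]) -> None:
        self._records = records
        self.calls = 0

    def get_many(self, ids: Iterable[int]) -> Dict[int, dict]:
        self.calls += 1  # one round trip for the whole batch
        return {i: self._records[i] for i in ids}


def orders_for_display(
    orders: List[dict], customers: FakeBatchService, products: FakeBatchService
) -> List[dict]:
    # Step 1: collect the distinct join keys.
    customer_ids = {order["customer_id"] for order in orders}
    product_ids = {line["product_id"] for order in orders for line in order["lines"]}

    # Step 2: one batched call per service builds the lookup (hash) tables.
    customer_by_id = customers.get_many(customer_ids)
    product_by_id = products.get_many(product_ids)

    # Step 3: probe the hash tables locally, i.e. an in-memory hash join.
    result = []
    for order in orders:
        lines = [
            {
                "sku": product_by_id[line["product_id"]]["sku"],
                "name": product_by_id[line["product_id"]]["name"],
                "qty": line["qty"],
            }
            for line in order["lines"]
        ]
        result.append(
            {
                "order_id": order["order_id"],
                "customer": customer_by_id[order["customer_id"]]["name"],
                "lines": lines,
            }
        )
    return result


customers = FakeBatchService({i: {"name": f"Customer {i}"} for i in range(10)})
products = FakeBatchService({i: {"sku": f"SKU-{i}", "name": f"Product {i}"} for i in range(5)})
orders = [
    {
        "order_id": n,
        "customer_id": n % 10,
        "lines": [{"product_id": (n + k) % 5, "qty": 1} for k in range(3)],
    }
    for n in range(100)
]

view = orders_for_display(orders, customers, products)
print(customers.calls, products.calls)  # 1 1
```

A cost-based optimiser would otherwise make each of these choices for us from table statistics. They include batching versus per-row calls, a hash join versus a nested loop, and how large a batch may be.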
