Rebuild queries from domain events by multiple aggregates

Submitted by ε祈祈猫儿з on 2020-12-30 02:29:46

Question


I'm using a DDD/CQRS/ES approach and I have some questions about modeling my aggregate(s) and queries. As an example consider the following scenario:

A User can create a WorkItem, change its title and associate other users to it. A WorkItem has participants (associated users) and a participant can add Actions to a WorkItem. Participants can execute Actions.

Let's just assume that Users are already created and I only need userIds.

I have the following WorkItem commands:

  • CreateWorkItem
  • ChangeTitle
  • AddParticipant
  • AddAction
  • ExecuteAction

These commands must be idempotent, so I can't add the same user or action twice.
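To make this concrete, here's a rough sketch of the commands as plain data objects (TypeScript; the field names are just assumptions for illustration, not my actual code):

```typescript
// Hypothetical command shapes; field names are assumptions for illustration only.
interface CreateWorkItem { type: "CreateWorkItem"; workItemId: string; title: string; creatorId: string; }
interface ChangeTitle    { type: "ChangeTitle";    workItemId: string; title: string; }
interface AddParticipant { type: "AddParticipant"; workItemId: string; userId: string; }
interface AddAction      { type: "AddAction";      workItemId: string; participantId: string; description: string; }
interface ExecuteAction  { type: "ExecuteAction";  workItemId: string; actionId: string; participantId: string; }

type WorkItemCommand =
  | CreateWorkItem
  | ChangeTitle
  | AddParticipant
  | AddAction
  | ExecuteAction;
```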

And the following query:

  • WorkItemDetails (all info for a work item)

Queries are updated by handlers that handle domain events raised by WorkItem aggregate(s) (after they're persisted in the EventStore). All these events contain the WorkItemId. I would like to be able to rebuild the queries on the fly, if needed, by loading all the relevant events and processing them in sequence. This is because my users usually won't access WorkItems created one year ago, so I don't need to have these queries processed. So when I fetch a query that doesn't exist, I could rebuild it and store it in a key/value store with a TTL.

Domain events have an aggregateId (used as the event streamId and shard key) and a sequenceId (used as the eventId within an event stream).
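To illustrate the rebuild-on-demand idea, here's a rough sketch of what I have in mind, assuming a generic event store client and a key/value cache (all interface and event names below are hypothetical, not my actual code):

```typescript
// Hypothetical event envelope: aggregateId is the stream id / shard key,
// sequenceId is the event's position inside that stream.
interface DomainEvent {
  aggregateId: string;
  sequenceId: number;
  workItemId: string; // every WorkItem-related event carries this
  type: string;
  payload: Record<string, unknown>;
}

interface WorkItemDetails {
  workItemId: string;
  title: string;
  participants: string[]; // userIds
  actions: { actionId: string; executed: boolean }[];
}

// Assumed infrastructure interfaces, not a specific library.
interface EventStore { readStream(streamId: string): Promise<DomainEvent[]>; }
interface Cache {
  get(key: string): Promise<WorkItemDetails | null>;
  set(key: string, value: WorkItemDetails, ttlSeconds: number): Promise<void>;
}

async function getWorkItemDetails(
  workItemId: string, store: EventStore, cache: Cache,
): Promise<WorkItemDetails> {
  const cached = await cache.get(workItemId);
  if (cached) return cached;

  // Only works while WorkItemId == aggregateId (the single-aggregate design below).
  const events = await store.readStream(workItemId);
  const details = events.reduce(applyToDetails, emptyDetails(workItemId));

  await cache.set(workItemId, details, 60 * 60 * 24); // e.g. 24h TTL
  return details;
}

function emptyDetails(workItemId: string): WorkItemDetails {
  return { workItemId, title: "", participants: [], actions: [] };
}

function applyToDetails(d: WorkItemDetails, e: DomainEvent): WorkItemDetails {
  switch (e.type) {
    case "WorkItemCreated":  return { ...d, title: String(e.payload.title) };
    case "TitleChanged":     return { ...d, title: String(e.payload.title) };
    case "ParticipantAdded": return { ...d, participants: [...d.participants, String(e.payload.userId)] };
    case "ActionAdded":      return { ...d, actions: [...d.actions, { actionId: String(e.payload.actionId), executed: false }] };
    case "ActionExecuted":   return { ...d, actions: d.actions.map(a =>
                               a.actionId === String(e.payload.actionId) ? { ...a, executed: true } : a) };
    default:                 return d;
  }
}
```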

So my first attempt was to create a large Aggregate called WorkItem that had a collection of participants and a collection of actions. Participants and Actions are entities that live only within a WorkItem. A participant references a userId and an action references a participantId. They can have more information, but it's not relevant for this exercise. With this solution my large WorkItem aggregate can ensure that the commands are idempotent, because I can validate that I don't add duplicate participants or actions, and if I want to rebuild the WorkItemDetails query, I just load and process all the events for a given WorkItemId.
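As a sketch, the large aggregate could enforce idempotency roughly like this (event and method names are made up for illustration):

```typescript
// Hypothetical single large aggregate; enforces idempotency by checking
// its own state before raising an event.
class WorkItem {
  private participants = new Set<string>(); // userIds
  private actions = new Set<string>();      // actionIds
  private uncommitted: { type: string; payload: object }[] = [];

  constructor(public readonly workItemId: string) {}

  addParticipant(userId: string): void {
    if (this.participants.has(userId)) return; // idempotent: duplicate command is a no-op
    this.raise({ type: "ParticipantAdded", payload: { workItemId: this.workItemId, userId } });
  }

  addAction(actionId: string, participantId: string): void {
    if (!this.participants.has(participantId)) throw new Error("unknown participant");
    if (this.actions.has(actionId)) return; // idempotent
    this.raise({ type: "ActionAdded", payload: { workItemId: this.workItemId, actionId, participantId } });
  }

  private raise(event: { type: string; payload: object }): void {
    this.apply(event);
    this.uncommitted.push(event);
  }

  // Also used when rehydrating the aggregate from its event stream.
  apply(event: { type: string; payload: any }): void {
    if (event.type === "ParticipantAdded") this.participants.add(event.payload.userId);
    if (event.type === "ActionAdded") this.actions.add(event.payload.actionId);
  }
}
```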

This works fine: since I only have one aggregate, the WorkItemId can be the aggregateId, so when I rebuild the query I just load all events for a given WorkItemId. However, this solution has the performance issues of a large Aggregate (why load all participants and actions just to process a ChangeTitle command?).

So my next attempt is to have different aggregates, all with the same WorkItemId as a property, but only the WorkItem aggregate has it as an aggregateId. This fixes the performance issues, and I can still update the query because all events contain the WorkItemId, but now my problem is that I can't rebuild it from scratch: I don't know the aggregateIds of the other aggregates, so I can't load their event streams and process them. They have a WorkItemId property, but that's not their real aggregateId. I also can't guarantee that I process events sequentially, because each aggregate has its own event stream, but I'm not sure whether that's a real problem.

Another solution I can think of is to have a dedicated event stream to consolidate all WorkItem events raised by the multiple aggregates. So I could have event handlers that simply append the events fired by the Participant and Action aggregates to an event stream whose id would be something like "{workItemId}:allevents". This would be used only to rebuild the WorkItemDetails query. It sounds like a hack, though: basically I'm creating an "aggregate" that has no business operations.
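As a sketch, such a consolidation handler could look roughly like this, assuming the event store lets me append to an arbitrary stream id (names are hypothetical):

```typescript
// Hypothetical handler that copies every WorkItem-related event into a
// per-WorkItem consolidation stream used only by the read side.
interface StoredEvent {
  aggregateId: string;
  sequenceId: number;
  workItemId: string;
  type: string;
  payload: Record<string, unknown>;
}

interface AppendOnlyStore {
  append(streamId: string, event: StoredEvent): Promise<void>;
}

async function consolidateWorkItemEvent(
  event: StoredEvent, store: AppendOnlyStore,
): Promise<void> {
  // Events from the WorkItem, Participant and Action aggregates all end up here,
  // so the WorkItemDetails query can be rebuilt from a single stream.
  await store.append(`${event.workItemId}:allevents`, event);
}
```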

What other solutions do I have? Is it uncommon to rebuild queries on the fly? Can it be done when events from multiple aggregates (multiple event streams) are used to build the same query? I've searched for this scenario and haven't found anything useful. I feel like I'm missing something that should be very obvious, but I haven't figured out what.

Any help on this is very much appreciated.

Thanks


Answer 1:


I don't think you should design your aggregates with querying concerns in mind. The Read side is here for that.

On the domain side, focus on consistency (how small can the aggregate be while the domain still remains consistent within a single transaction?), concurrency (how big can it be without suffering concurrent access problems / race conditions?) and performance (would we load thousands of objects into memory just to perform a simple command? Exactly what you were asking).

I don't see anything wrong with on-demand read models. It's basically the same as reading from a live stream, except you re-create the stream when you need it. However, this might be quite a lot of work for not much gain, because most of the time entities are queried just after they are modified. If on-demand becomes "basically every time the entity changes", you might as well subscribe to live changes. As for "old" views, the definition of "old" is that they are not modified any more, so they don't need to be recalculated anyway, regardless of whether you have an on-demand or continuous system.

If you go the multiple small aggregates route and your Read Model needs information from several sources to update itself, you have a couple of options:

  • Enrich emitted events with additional data

  • Read from multiple event streams and consolidate their data to build the read model (see the sketch after the link below). No magic here: the Read side needs to know which aggregates are involved in a particular projection. You could also query other Read Models if you know they are up-to-date and will give you just the data you need.

See CQRS events do not contain details needed for updating read model
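As a rough sketch of the second option, the read side could pull the streams of every aggregate involved and merge them before folding them into the read model, assuming it keeps some small index of which aggregate ids contribute to a given WorkItem (everything below is hypothetical, not a specific library):

```typescript
// Hypothetical rebuild that pulls the streams of every aggregate involved
// in a WorkItem, merges them, and returns them in a single ordered list.
interface StoredEvent {
  aggregateId: string;
  sequenceId: number;
  workItemId: string;
  type: string;
  payload: Record<string, unknown>;
  timestamp: number;
}

interface EventStore { readStream(streamId: string): Promise<StoredEvent[]>; }

// Assumed: the read side maintains a tiny index mapping a workItemId to the
// aggregate ids (WorkItem, Participants, Actions) that contribute to it.
interface AggregateIndex { aggregateIdsFor(workItemId: string): Promise<string[]>; }

async function rebuildFromMultipleStreams(
  workItemId: string, store: EventStore, index: AggregateIndex,
): Promise<StoredEvent[]> {
  const streamIds = await index.aggregateIdsFor(workItemId);
  const streams = await Promise.all(streamIds.map(id => store.readStream(id)));

  // Merge and order events globally, e.g. by timestamp; within a single
  // stream the sequenceId already gives the order.
  return streams.flat().sort(
    (a, b) => a.timestamp - b.timestamp || a.sequenceId - b.sequenceId,
  );
  // The resulting ordered list can then be folded into the WorkItemDetails read model.
}
```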



Source: https://stackoverflow.com/questions/27325304/rebuild-queries-from-domain-events-by-multiple-aggregates
