Infinite Loop Event Sourcing

Greg Young talks about using an identity map to help avoid hitting the persistence engine and making it the bottleneck for the entire system. In a fully distributed scenario, the event pipeline coming out of the repository would publish any events committed to an aggregate to all interested parties. These interested parties would include a persistence engine subscriber, other bounded contexts, and other instances of the same bounded context. Other instances of the same bounded context would use an identity map and listen to messages for aggregates that they already hold in memory--other messages would be ignored.

When using an identity map there are several things to keep in mind:

  1. The identity map sits between the repository and the persistence engine.
  2. The identity map must only hand out transactionally consistent aggregate roots.
  3. Applying events within the identity map must not cause events to be published.

Do I Need To Sit Between You Two?

The identity map sits "between" the repository and the persistence engine so that it can service requests for a particular aggregate root. If it's able to service the request because it already has the aggregate in memory, it hands the in-memory instance back. If not, the repository goes to the persistence engine, loads the values, and stores the reconstituted aggregate root in the identity map.

This step alone takes a huge burden off of the persistence engine.
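
As a rough illustration of that lookup, here is a minimal sketch in Python. The names `IdentityMap`, `Repository`, `try_get`, `store`, and `load_from_store` are hypothetical, chosen only for this example, and the loader stands in for whatever the persistence engine actually does to reconstitute an aggregate. The sketch also hands back the cached instance directly, which sidesteps the consistency concerns discussed in the next section.

```python
from typing import Callable, Dict, Generic, Optional, TypeVar

T = TypeVar("T")


class IdentityMap(Generic[T]):
    """Process-level cache of reconstituted aggregate roots, keyed by id."""

    def __init__(self) -> None:
        self._items: Dict[str, T] = {}

    def try_get(self, aggregate_id: str) -> Optional[T]:
        return self._items.get(aggregate_id)

    def store(self, aggregate_id: str, aggregate: T) -> None:
        self._items[aggregate_id] = aggregate


class Repository(Generic[T]):
    """Asks the identity map first and falls back to the persistence engine."""

    def __init__(self, identity_map: IdentityMap, load_from_store: Callable[[str], T]) -> None:
        self._map = identity_map
        # Hypothetical loader: assumed to rebuild the aggregate from stored events.
        self._load = load_from_store

    def get_by_id(self, aggregate_id: str) -> T:
        cached = self._map.try_get(aggregate_id)
        if cached is not None:
            return cached  # served from memory; the persistence engine is not touched
        aggregate = self._load(aggregate_id)
        self._map.store(aggregate_id, aggregate)
        return aggregate
```

With this shape, only the first request for a given aggregate reaches the persistence engine; later requests in the same process are answered from memory.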

Transactionally Consistent

Normally an identity map is unique to a unit of work, such that each unit of work has its own identity map holding a non-shared reference to an aggregate. While this statement is true, it's incomplete when dealing with distributed-messaging scenarios.

Specifically, one of the things that the message-based identity map does is listen as a subscriber to all events published by other processes or instances of the same bounded context. If an event comes in and the aggregate to which the event pertains is in memory, it processes the event against the aggregate. If the aggregate is not in memory, it simply drops the message.

We have to be very careful to ensure that, when a series of events comes in as part of a single transaction, we apply all of those events before handing back a reference to the aggregate. If we don't, we risk passing back the aggregate in an inconsistent--or even invalid--state.
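
One way to picture the subscriber behaviour described above is the sketch below. It assumes that all events from one remote transaction arrive together as a single list, and it uses a lock so that a reader can never observe a half-applied batch. `Account`, `ProcessIdentityMap`, `handle_commit`, and the event shapes are all invented for illustration.

```python
import copy
import threading
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Account:
    """Toy aggregate: debits and credits are expected to change together."""
    id: str
    version: int = 0
    debits: int = 0
    credits: int = 0

    def apply(self, event: dict) -> None:
        if event["type"] == "Debited":
            self.debits += event["amount"]
        elif event["type"] == "Credited":
            self.credits += event["amount"]
        self.version += 1


class ProcessIdentityMap:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._aggregates: Dict[str, Account] = {}

    def store(self, aggregate: Account) -> None:
        with self._lock:
            self._aggregates[aggregate.id] = aggregate

    def handle_commit(self, aggregate_id: str, events: List[dict]) -> bool:
        """Apply every event of one remote transaction, or drop the message."""
        with self._lock:
            aggregate = self._aggregates.get(aggregate_id)
            if aggregate is None:
                return False  # aggregate not held in memory: ignore the message
            for event in events:
                aggregate.apply(event)
            return True

    def get(self, aggregate_id: str) -> Optional[Account]:
        # Same lock as handle_commit, so the snapshot is taken either before
        # or after a whole batch, never in the middle of one.
        with self._lock:
            return copy.deepcopy(self._aggregates.get(aggregate_id))
```

The single coarse lock is only the simplest option; a per-aggregate lock or a single-threaded message pump would give the same guarantee with different trade-offs.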

Furthermore, we have to be careful not to apply messages to the aggregate root reference that is currently being processed as part of the unit of work. It would seem as though this message-based identity map is two layers deep. There would be a session-level identity map which holds session-level references and a process-level identity map which subscribes to messages from other instances of the same bounded context and applies them when they arrive.

For optimistic concurrency, it would appear that when a unit of work completes, the aggregate version at the start of the unit of work would be compared with the aggregate version in the process-level identity map. If the two were the same, the unit of work would commit; otherwise, the transaction would roll back and the unit of work would be re-attempted.
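
The version comparison could look something like the following sketch. It tracks only version numbers, not aggregates, and every name in it (`ProcessMap`, `ConcurrencyError`, `run_with_retry`, `max_attempts`) is an assumption made for the example rather than part of any real library.

```python
import threading
from typing import Callable, Dict


class ConcurrencyError(Exception):
    """Raised when the aggregate changed while a unit of work was running."""


class ProcessMap:
    """Tracks the latest known version of each aggregate in this process."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._versions: Dict[str, int] = {}

    def current_version(self, aggregate_id: str) -> int:
        with self._lock:
            return self._versions.get(aggregate_id, 0)

    def commit(self, aggregate_id: str, started_at: int, event_count: int) -> None:
        with self._lock:
            # Compare the version seen at the start of the unit of work
            # with the version the process-level map holds right now.
            if self._versions.get(aggregate_id, 0) != started_at:
                raise ConcurrencyError(aggregate_id)
            self._versions[aggregate_id] = started_at + event_count


def run_with_retry(process_map: ProcessMap, aggregate_id: str,
                   work: Callable[[int], int], max_attempts: int = 3) -> int:
    """Run `work` (returns the number of events produced) until it commits.

    Returns how many attempts were needed.
    """
    for attempt in range(1, max_attempts + 1):
        started_at = process_map.current_version(aggregate_id)
        event_count = work(started_at)
        try:
            process_map.commit(aggregate_id, started_at, event_count)
            return attempt
        except ConcurrencyError:
            continue  # discard this attempt and start over from fresh state
    raise ConcurrencyError(aggregate_id)
```

A bounded retry count is a design choice here; an unbounded loop would also match the description but could spin indefinitely under heavy contention.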

It would also appear that the process-level identity map doesn't hand out the same reference to a specific aggregate root twice. Instead, it would appear that it hands out copies. The main reason for this is so that multiple units of work could potentially be executing against the same aggregate root simultaneously, with optimistic concurrency protecting each instance from the other. The other reason is so that, if a unit of work failed, we wouldn't have to figure out how to roll back the aggregate. Instead we could just toss out the reference and get a new one.
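
A small sketch of the copy-on-checkout idea follows, assuming aggregates are plain objects that `copy.deepcopy` can duplicate. `User`, `CopyingIdentityMap`, and `checkout` are hypothetical names used only here.

```python
import copy
from dataclasses import dataclass
from typing import Dict


@dataclass
class User:
    id: str
    name: str
    version: int = 0


class CopyingIdentityMap:
    """Keeps one private master per aggregate and hands out independent copies."""

    def __init__(self) -> None:
        self._items: Dict[str, User] = {}

    def store(self, aggregate: User) -> None:
        # Copy on the way in so the caller cannot mutate the master later.
        self._items[aggregate.id] = copy.deepcopy(aggregate)

    def checkout(self, aggregate_id: str) -> User:
        # Each unit of work gets its own instance; a failed unit of work
        # can simply discard it instead of undoing its changes.
        return copy.deepcopy(self._items[aggregate_id])
```

Deep copying is the bluntest way to get isolation. Rebuilding an instance from a snapshot or from the event stream would serve the same purpose.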

Infinite Loop

Lastly, when applying events received from other instances of the same bounded context to the aggregate, we want to ensure that we ignore the events that come out the other side. By definition, when I send an aggregate a "UserNameChanged" event, the aggregate produces a "UserNameChanged" event as a result, in a 1-1 consume/produce fashion.

We don't want to publish these events; otherwise, we might inadvertently create an infinite loop between all the other instances of the same bounded context.
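
To make the loop-avoidance concrete, here is a minimal sketch. It assumes the aggregate collects the events it produces in an `uncommitted` list and that the repository publishes whatever is in that list on save; both are assumptions for illustration, as are the names `User`, `Repository`, and `apply_remote`.

```python
from typing import Callable, List


class User:
    def __init__(self, user_id: str, name: str) -> None:
        self.id = user_id
        self.name = name
        self.uncommitted: List[dict] = []

    def apply(self, event: dict) -> None:
        if event["type"] == "UserNameChanged":
            self.name = event["name"]
        # 1-1 consume/produce: every applied event also comes out the other side.
        self.uncommitted.append(event)


class Repository:
    def __init__(self, publish: Callable[[dict], None]) -> None:
        self._publish = publish

    def save(self, user: User) -> None:
        for event in user.uncommitted:
            self._publish(event)
        user.uncommitted.clear()


def apply_remote(user: User, events: List[dict]) -> None:
    """Apply events that another instance already published."""
    for event in events:
        user.apply(event)
    # Discard the produced events instead of publishing them; republishing
    # would bounce them back to every other instance.
    user.uncommitted.clear()
```

Events applied through `apply_remote` change the in-memory state but never reach the publisher, while events produced by local work are still published on save.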