Serializing (Queuing) Messages To Aggregates

I responded to a message on the Yahoo! Groups DDD forum regarding locking an aggregate root and serializing access to it while it processes a message. The concern was that "pessimistic locking" makes your system unable to scale, which, per the formal definition, is absolutely correct. But there's a slightly different way to lock that isn't truly pessimistic.

Below is my response to some of the concerns posted in that message.

> Well I don't agree with this - at least in our domain. There are many
> commands that could be applied simultaneously to the one Aggregate.
> That's what an IConflictWith interface would be for - for those that
> do actually conflict. For large aggregates in a shared-nothing
> architecture, it just wouldn't scale to only allow one message at a
> time.

From what I understand and have read, IConflictWith is used for conflicts between your eventually consistent reporting store and your aggregates. The idea is that because you're issuing commands built off of eventually consistent (potentially outdated) data, you need a way to determine whether the incoming command might conflict with other changes that have already occurred within the aggregate. No amount of locking could prevent this when using an eventually consistent model. Yet we still need to ensure that each aggregate is 100% consistent at all times.
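
To make that concrete, here's a minimal Java sketch of what such a check might look like. Only the IConflictWith name comes from the thread; the command and event types are hypothetical, and this is one plausible shape for the idea, not Greg's actual implementation.

```java
// Hypothetical: an event that was committed after the command's data was read
// can say whether it actually conflicts with that incoming command.
interface IConflictWith<TCommand> {
    boolean conflictsWith(TCommand incoming);
}

// Illustrative command, not from the original discussion.
record WithdrawFunds(long amountCents) { }

// Example: a withdrawal that already happened only conflicts with a new
// withdrawal if the remaining balance can no longer cover it.
record FundsWithdrawn(long amountCents, long balanceAfterCents)
        implements IConflictWith<WithdrawFunds> {
    public boolean conflictsWith(WithdrawFunds incoming) {
        return incoming.amountCents() > balanceAfterCents;
    }
}
```

The payoff is that two commands issued against the same stale view can both succeed as long as the events committed in between don't actually conflict with them, so you don't have to reject everything that arrives with an old version number.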

Going back to the locking strategy, one of the critical elements in DDDD is that because an aggregate root must be 100% consistent at all times, it can only be actively receiving commands in a single process/address space at a time. In other words, you wouldn't want the same aggregate running on multiple machines at the same time, each actively receiving and processing commands. [Greg has an active/passive aggregate mechanism to handle fast, graceful failover scenarios.]

If an aggregate can only be active in one address space at a time, how do you scale? Don't you create a bottleneck? Yes and no. You scale through partitioning: you distribute your bounded context across multiple machines, where Machine A handles aggregates with IDs A-C, Machine B handles aggregates with IDs D-F, and so on.
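
Here's a tiny sketch of that routing, with hypothetical node endpoints. The only point being made is that a given aggregate ID always maps to the same machine, so each aggregate is only ever active in one process.

```java
import java.util.List;

// Hypothetical router: every command for a given aggregate ID always lands
// on the same node, so that aggregate is only ever active in one process.
final class PartitionRouter {
    private final List<String> nodes; // e.g. endpoints for Machine A, B, ...

    PartitionRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    // Consistent mapping from aggregate ID to owning node. A real system
    // would likely use consistent hashing or explicit range assignment
    // (the A-C, D-F split above) so partitions can be rebalanced.
    String nodeFor(String aggregateId) {
        int bucket = Math.floorMod(aggregateId.hashCode(), nodes.size());
        return nodes.get(bucket);
    }
}
```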

Okay, but we're still locking the aggregate, and a single message to the aggregate could take a really long time to process. The answer, then, is: don't lock for a long time. But what if you have to? If some message takes a long time to process, the processing for that message probably needs to be broken out into a service. When you need the service, you send it a message asking it to perform the work, then release the lock on the aggregate. When the service completes, it sends a message back letting you know the work is done, along with the status of the work (or whatever you need); you lock the aggregate, process that message, and release.
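
A minimal sketch of that interaction, with hypothetical message types and a bare-bones bus (none of these names come from the original discussion). The lock protects the aggregate's quick state change and the send, and nothing more:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical message types and a bare-bones bus, for illustration only.
record StartWork(String aggregateId) { }                    // command to the aggregate
record PerformWork(String aggregateId) { }                  // request handed to the service
record WorkCompleted(String aggregateId, String result) { } // the service's reply
interface Bus { void send(Object message); }

final class AggregateHandler {
    // One lock per aggregate: it serializes messages, but is only ever held
    // long enough to make a decision and send a message.
    private final ReentrantLock lock = new ReentrantLock();
    private final Bus bus;
    private String state = "idle";

    AggregateHandler(Bus bus) { this.bus = bus; }

    void handle(Object message) {
        lock.lock();
        try {
            if (message instanceof StartWork sw) {
                state = "work-pending";                      // quick, consistent change
                bus.send(new PerformWork(sw.aggregateId())); // hand off; don't wait
            } else if (message instanceof WorkCompleted wc) {
                state = "done: " + wc.result();              // apply the reply later
            }
        } finally {
            lock.unlock(); // released immediately, never held across the service call
        }
    }
}
```

Notice that nothing here blocks while the service runs: the reply arrives as just another message queued to the aggregate, which is exactly the "serializing (queuing)" in the title.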

In other words, the lock on the aggregate is always held for an extremely short time--never more than a few milliseconds. The aggregate receives the message, makes a quick decision based on the information it already has, and sends a message. If the aggregate needs more information to make a decision, it sends a message asking for it from a service, the user, etc. You never *ever* lock the aggregate while it's "waiting" for a "response" from a service; doing so would kill your scalability.