Concurrency in a DDDD World

Introduction

One of the tough things about DDDD right now is that there isn't a mountain of information available, as there is for other programming topics. I guess that's a blessing in disguise, because there isn't a ton of misinformation either.

In traditional N-tier systems and in repository-based DDD systems, we typically let the database manage all of our state for us. Then we use patterns such as the optimistic offline lock (AKA optimistic concurrency) to keep data consistent when multiple processes try to modify the same information at roughly the same time.
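To make the optimistic approach concrete, here is a minimal sketch of a version check on write. Everything in it (the `VersionedStore` class, the `StaleVersionError` exception, the "account-1" key) is made up for illustration, and the in-memory dictionary only stands in for a database table with a version column.

```python
class StaleVersionError(Exception):
    """Raised when a write is based on an out-of-date version."""


class VersionedStore:
    """Hypothetical in-memory stand-in for a table with a version column."""

    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys start at version 0 with no value.
        return self._rows.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self._rows.get(key, (0, None))
        # Reject the write if someone else changed the row since we read it.
        if current_version != expected_version:
            raise StaleVersionError(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._rows[key] = (current_version + 1, value)
        return current_version + 1


store = VersionedStore()

# Two processes read the same row at roughly the same time.
version_a, _ = store.read("account-1")
version_b, _ = store.read("account-1")

# Process A writes first and wins.
store.write("account-1", version_a, "written by A")

# Process B's write is based on a stale version and is rejected.
try:
    store.write("account-1", version_b, "written by B")
    second_write_succeeded = True
except StaleVersionError:
    second_write_succeeded = False
```

The losing process would normally reload the row and retry or report the conflict. Note that this only works because a single store arbitrates every write.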

In DDDD, things work a little bit differently. Once you start to scale a bounded context out beyond one server, you have to worry about concurrency, meaning that we must ensure that the understanding of the known world is consistent across all instances of the same bounded context. This is a very challenging problem, but it is one that has been solved in several different ways. What we need to decide is how best to solve the problem in a distributed, message-based environment.

The Identity Map

As I have blogged about previously, Greg mentions that each bounded context has an identity map and that we can use pub/sub to notify all other instances of the same bounded context of changes in state as a result of processing commands. This works great, but once we have multiple instances running, we need to be extremely careful lest two instances of the same aggregate process commands that produce conflicting results.

An Example

Let's suppose there's a user on eBay who wants to sign up with a certain username. Let's say it's "bob" and, for the sake of argument, that it's available. Because of the number of users on eBay, it's quite likely that he could attempt to register with that username at the same time as someone else.

If the same aggregate were on multiple servers and a request for the username "bob" came in to both instances simultaneously (one from Bob Martin and the other from Bob Jones), all of a sudden we would have a problem. Each aggregate has approved the username "bob" and passed a message back to some workflow saying that it has reserved the name for its respective user. We've got a concurrency issue.
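The conflict is easy to reproduce in a toy model. The sketch below assumes a hypothetical `UsernameRegistry` aggregate whose state lives only in the memory of the server it runs on; the class and variable names are mine, not part of any real system.

```python
class UsernameRegistry:
    """Hypothetical aggregate: knows only the usernames reserved on this instance."""

    def __init__(self):
        self.reserved = {}  # username -> user who holds it

    def reserve(self, username, user):
        # Approve only if this instance has not seen the username before.
        if username in self.reserved:
            return False
        self.reserved[username] = user
        return True


# The same aggregate loaded independently on two servers.
server_1 = UsernameRegistry()
server_2 = UsernameRegistry()

# Both requests arrive before either instance hears about the other's change.
approved_on_1 = server_1.reserve("bob", "Bob Martin")
approved_on_2 = server_2.reserve("bob", "Bob Jones")

# Each instance approved "bob" for a different user.
conflict = (
    approved_on_1
    and approved_on_2
    and server_1.reserved["bob"] != server_2.reserved["bob"]
)
```

Neither instance did anything wrong by its own rules, which is exactly why this kind of bug is so hard to spot.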

If not addressed properly, we will run into some very nasty temporal bugs that may go undiscovered for a long time. What's worse, the longer the bug exists, the more damage it causes to all related systems. These types of bugs usually have a way of corrupting data not just in their own bounded context, but throughout all parts of the extended domain.

The Solution

Rather than having a random instance of the bounded context receive each command, we instead want to forward commands to a single instance. You might say, "Wait a second! Why run multiple instances if all commands are forwarded to only one instance?!" The answer is simple: partitioning. We can use a locator (devTeach @ 46:20) to forward the command to the appropriate instance of the bounded context based upon some selection criteria. In the case of our eBay example, we could partition based upon the first letter of the username, such that all requests for usernames starting with the letter "b" are forwarded to a specific instance of the bounded context for the aggregate root to process.
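Here is a rough sketch of what such a locator might look like. The `Locator` class, the instance names, and the modulo mapping are all my own assumptions; the only idea taken from above is that the first letter of the username decides which instance gets the command.

```python
class Locator:
    """Hypothetical locator: picks an instance from the first letter of a username."""

    def __init__(self, instance_names):
        if not instance_names:
            raise ValueError("at least one instance is required")
        self._instances = list(instance_names)

    def locate(self, username):
        if not username:
            raise ValueError("username must not be empty")
        # Normalise case so "Bob" and "bob" land on the same instance.
        first_letter = username[0].lower()
        # Assumed mapping: spread letters over instances by character code.
        index = ord(first_letter) % len(self._instances)
        return self._instances[index]


locator = Locator(["instance-0", "instance-1", "instance-2"])

# Both competing requests for "bob" are routed to the same instance.
target_for_martin = locator.locate("bob")
target_for_jones = locator.locate("bob")

# Usernames starting with another letter may be handled elsewhere.
target_for_alice = locator.locate("alice")
```

The important property is that the mapping is deterministic: every sender that asks about the same username gets the same answer, so one instance owns the decision.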

Continuing with our example, the same aggregate would now service both requests, one command after the other. The first would be approved and the second would be denied. Usernames starting with other letters would be handled by other instances. Because we have different instances handling non-overlapping portions of work, we have effectively eliminated concurrency issues for this particular context. Do you see how we can distribute and scale?

The only caveat, per my understanding, is that we need to ensure that each aggregate only performs one unit of work at a time, effectively serializing all work. Serializing (queuing up work) makes it seem like commands would back up very quickly. Fortunately, because of the ways that we can architect our DDDD infrastructure, each unit of work does not take long to complete, especially when we don't have to communicate with a persistence engine directly and virtually all communication is in-process.
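One simple way to get "one unit of work at a time" is to give each partition a single worker that drains an inbox queue. The sketch below is only an illustration under that assumption; `UsernamePartition`, `submit`, and the reply-queue mechanism are hypothetical names and choices of mine.

```python
import queue
import threading


class UsernamePartition:
    """Hypothetical partition: one worker thread handles commands one at a time."""

    def __init__(self):
        self._reserved = {}          # touched only by the worker thread
        self._inbox = queue.Queue()  # commands wait here in arrival order
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, username, user):
        # Each command carries its own one-slot reply queue.
        reply = queue.Queue(maxsize=1)
        self._inbox.put((username, user, reply))
        return reply

    def _run(self):
        while True:
            username, user, reply = self._inbox.get()
            # No locks needed: only this thread reads or writes the state.
            approved = username not in self._reserved
            if approved:
                self._reserved[username] = user
            reply.put(approved)


partition = UsernamePartition()

# Two competing commands are queued back to back.
first_reply = partition.submit("bob", "Bob Martin")
second_reply = partition.submit("bob", "Bob Jones")

# The first is approved and the second is denied.
results = [first_reply.get(timeout=1), second_reply.get(timeout=1)]
```

Because the state is in memory and each command is a dictionary lookup, the queue drains quickly, which is the same reason serializing work is less scary than it first sounds.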