CQRS: Sagas with Event Sourcing (Part I of II)

For starters, what is a saga? A saga is a "long-lived business transaction or process". Okay, so what does that mean? Well, first of all, the "long-lived" part doesn't have to mean hours, days, or even weeks—it could literally mean something as short as a few seconds. The amount of time is not the important part. It's the fact that the "transaction", business process, or activity spans more than one message. Udi has written several times on sagas. His articles are always worth reading. The foundational theory behind sagas is to avoid the use of blocking (and locking) transactions across lots of resources. Locks are one of the primary enemies of scalability.

Late last week I was communicating with Rinat Abdullin on the CQRS Google Group about sagas and explaining how they might be implemented using event sourcing. I have spoken of this to others in the past as well. In the aforementioned thread, I didn't go into much detail about the "how". Hence this post.

Perhaps we should explain why a saga is even necessary when we have aggregates and domain objects that manage state. Aggregates are great business objects that wrap a lot of business complexity. The problem with aggregates is that they only care about their little part of the universe. Sagas, on the other hand, are coordinating objects. They listen for what's happening and they tell other objects to take appropriate action. Picture them as choreographers (let's try to avoid the words orchestra and orchestration altogether). In essence, sagas listen for events and instruct other parts of the system to perform tasks based upon those events. Aggregates work the other way around: they are told to do something and then alert the world that they performed some action. This could be generalized into the following: sagas listen to events and dispatch commands, while aggregates receive commands and publish events.

Sagas manage process. They contain business behavior, but only in the form of process. This is a critical point. Sagas, in their purest form, don't contain business logic. During Greg Young's workshop, he hammered this one pretty hard. Most programmers think "logic" and aren't used to thinking "process". Let me put it this way. If you have a saga with "if else" statements, you've got logic. Process is best implemented using a state machine. The state machine we use to manage process within each saga is called Stateless by Nicholas Blumhardt, the creator of Autofac.
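To make "process, not logic" a little more concrete, here is a minimal sketch in plain Python. It is not the Stateless API, just a hand-rolled transition table, and the state names and the "OrderShipped" event are hypothetical. The point is that the process lives in data (which event is permitted in which state) rather than in branching business rules.

```python
from enum import Enum, auto


class State(Enum):
    AWAITING_PAYMENT = auto()
    READY_TO_SHIP = auto()
    SHIPPED = auto()


# (current state, event name) -> next state.
# Any pair that is not listed here is simply not permitted.
TRANSITIONS = {
    (State.AWAITING_PAYMENT, "PaymentReceived"): State.READY_TO_SHIP,
    (State.READY_TO_SHIP, "OrderShipped"): State.SHIPPED,
}


def next_state(current, event):
    """Return the state after `event`, or `current` unchanged if the
    event is not permitted in the current state."""
    return TRANSITIONS.get((current, event), current)
```

Adding a step to the process means adding a row to the table; the lookup itself never changes.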

In a message-oriented world, there are two fundamental problems: ordering and duplicates. These are related to message infrastructure and its corresponding guarantees. Sagas are able to act as a kind of firewall to the outside. They encapsulate the mess and help shield the domain from many of the nasty details of reality.

With message ordering, let's consider how a typical shipping or fulfillment department might utilize sagas to help carry out their responsibilities. Let's imagine that fulfillment is not to ship the goods until payment has been received. As messages can arrive in any order, what would happen if "PaymentReceived" was the first message to arrive? In that situation, fulfillment wouldn't even have a clue what to ship because they haven't even been made aware of a corresponding order. With sagas we can easily outline the various state transitions and then dispatch commands only when the appropriate conditions are met, such as the receipt of both the OrderReceived and PaymentReceived events, regardless of the sequence in which they arrive.
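The fulfillment scenario might be sketched roughly as follows. This is an illustration in plain Python, not code from a real saga framework: the state names, the "ShipOrder" command, and the `commands` list standing in for a command bus are all assumptions. Only the OrderReceived and PaymentReceived event names come from the text above.

```python
from enum import Enum, auto


class State(Enum):
    WAITING_FOR_BOTH = auto()
    WAITING_FOR_PAYMENT = auto()
    WAITING_FOR_ORDER = auto()
    READY_TO_SHIP = auto()


# Both arrival orders lead to READY_TO_SHIP.
TRANSITIONS = {
    (State.WAITING_FOR_BOTH, "OrderReceived"): State.WAITING_FOR_PAYMENT,
    (State.WAITING_FOR_BOTH, "PaymentReceived"): State.WAITING_FOR_ORDER,
    (State.WAITING_FOR_PAYMENT, "PaymentReceived"): State.READY_TO_SHIP,
    (State.WAITING_FOR_ORDER, "OrderReceived"): State.READY_TO_SHIP,
}

# Command to dispatch when a state is entered (hypothetical command name).
ON_ENTRY = {State.READY_TO_SHIP: "ShipOrder"}


class FulfillmentSaga:
    def __init__(self):
        self.state = State.WAITING_FOR_BOTH
        self.commands = []  # stand-in for a real command bus

    def handle(self, event):
        target = TRANSITIONS.get((self.state, event), self.state)
        if target is not self.state:
            self.state = target
            command = ON_ENTRY.get(target)
            if command is not None:
                self.commands.append(command)
```

Whether the payment or the order shows up first, the saga ends in the same state and dispatches the shipping command exactly once, and only after both events have been seen.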

Looking at duplicates and idempotency with a state machine-based saga, it's not hard to see that regardless of the number of times a message is received, it will only cause a state transition once. Think about it like a DVD player. You hit stop and then stop again. No matter how many times you press stop, the state transition only occurs once. So it is with sagas. A message can be received multiple times but will only cause the desired state transition once. This makes duplicate messages naturally idempotent. Sweet.
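The DVD player analogy translates almost directly into code. The sketch below is a toy in plain Python with made-up state and button names; it only shows that a repeated message finds no permitted transition and therefore changes nothing.

```python
class DvdPlayer:
    # (current state, button) -> next state
    TRANSITIONS = {
        ("playing", "stop"): "stopped",
        ("stopped", "play"): "playing",
    }

    def __init__(self):
        self.state = "playing"
        self.transition_count = 0

    def press(self, button):
        """Apply the button press. Return True if it caused a transition,
        False if it was ignored because the state does not permit it."""
        target = self.TRANSITIONS.get((self.state, button))
        if target is None:
            return False
        self.state = target
        self.transition_count += 1
        return True


player = DvdPlayer()
results = [player.press("stop") for _ in range(3)]
```

Here `results` is `[True, False, False]` and `player.transition_count` is 1: the first "stop" moves the player to "stopped", and the two duplicates are ignored because "stopped" has no transition for "stop".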

The other really great thing about sagas is that they understand the concept of time. For example, let's suppose we wanted to notify first-time customers of a discount if they didn't complete checkout within 48 hours. The saga could implement a timeout by scheduling a timeout message to be sent back to the saga after 48 hours. When the timeout message was received 48 hours later, the saga would attempt to perform the state transition related to the timeout. If the customer did something in the last 48 hours, the timeout message wouldn't cause a state transition. But if the person had not yet completed checkout, the saga could dispatch a message to the appropriate component to take corresponding action. This component would decide the "who" and the "how". All the saga did was to tell the component: "There hasn't been any activity, notify the customer." The component would then decide whether or not to notify based upon business logic, e.g. is this a first-time customer, etc.
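A timeout can be treated as just another message handled by the same transition table. The following is a sketch under assumptions: the message names, the "NotifyCustomerOfInactivity" command, and the `scheduled` list (standing in for whatever infrastructure actually delivers a delayed message) are all hypothetical.

```python
from datetime import timedelta

# (current state, message) -> (next state, command to dispatch or None)
TRANSITIONS = {
    ("shopping", "CheckoutCompleted"): ("checked_out", None),
    ("shopping", "CheckoutTimedOut"): ("inactive", "NotifyCustomerOfInactivity"),
}


class CheckoutSaga:
    def __init__(self):
        self.state = "shopping"
        self.commands = []  # stand-in for a real command bus
        # Ask the (hypothetical) infrastructure to send this message
        # back to the saga after 48 hours.
        self.scheduled = [("CheckoutTimedOut", timedelta(hours=48))]

    def handle(self, message):
        # Messages with no permitted transition leave the saga untouched.
        target, command = TRANSITIONS.get((self.state, message), (self.state, None))
        self.state = target
        if command is not None:
            self.commands.append(command)
```

If the customer checks out before the timeout arrives, the saga is already in "checked_out" and the late timeout message is ignored. If not, the saga dispatches a single notification command and leaves the decision about who to notify, and how, to the receiving component.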

To Be Continued…

So far we've explained why sagas are important and how state machines can help us implement a saga that deals with the complexity of duplicate messages, out-of-order messages, and timeout messages. In Part II of this article, we'll look at some of the details of how to use event sourcing to create testable, robust, yet fluid sagas.
