CQRS: Out of Sequence Messages and Read Models

One of the tricks to message-based solutions is handling messages that arrive out of order. In some instances the application code must be made resilient to this situation. In other cases, it can be easier to re-sequence the messages and then apply each message in sequence. In one of my current systems this is the route we took: re-sequencing messages when they arrive out of order.

One of the easier possibilities is to have the read model poll the event store for new commits. I dislike this solution because it doesn't leverage the innate capabilities of push-based notifications and feels like a resource drain. Nonetheless, it's probably one of the easier ones to handle until you start dealing with scaling out the read model. At that point, a different approach must be taken.
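For comparison, a minimal sketch of the polling approach might look like the following. It assumes a hypothetical `fetch_since` function that returns every commit numbered after a given checkpoint; the names and the in-memory store are illustrative only and do not correspond to any particular event store API.

```python
from typing import Callable, Iterable, List, Tuple

Commit = Tuple[int, str]  # (commit_number, payload); hypothetical shape


def poll_once(
    fetch: Callable[[int], Iterable[Commit]],
    apply: Callable[[Commit], None],
    checkpoint: int,
) -> int:
    """Apply every commit after `checkpoint` in order; return the new checkpoint."""
    for commit in sorted(fetch(checkpoint)):
        apply(commit)
        checkpoint = commit[0]
    return checkpoint


# In-memory stand-in for an event store, for illustration only.
store: List[Commit] = [(1, "created"), (2, "renamed"), (3, "closed")]
applied: List[str] = []


def fetch_since(since: int) -> List[Commit]:
    return [c for c in store if c[0] > since]


def apply_to_read_model(commit: Commit) -> None:
    applied.append(commit[1])


checkpoint = poll_once(fetch_since, apply_to_read_model, checkpoint=0)
```

Because the read model pulls commits in order, nothing ever arrives out of sequence. The cost is that every read model instance has to keep querying the store, which is the resource drain mentioned above.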

The solution that we use is to dequeue a message and place it in a "holding table" until all messages with a lower sequence number have been received. Once they have, we read the held messages and run them in sequence through the appropriate handlers. When all handlers have executed successfully, we delete the messages from the holding table and commit the updates to the read models.
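The flow can be sketched roughly as follows, using SQLite from Python purely for illustration. The table names (`holding`, `applied`), the column names, the `receive` function, and the assumption that each aggregate's sequence starts at 1 are all hypothetical and are not taken from our actual schema.

```python
import sqlite3
from typing import Callable, List, Tuple

Handler = Callable[[str, int, str], None]


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE holding ("
        " aggregate_id TEXT NOT NULL,"
        " sequence INTEGER NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (aggregate_id, sequence))"
    )
    conn.execute(
        "CREATE TABLE applied ("
        " aggregate_id TEXT PRIMARY KEY,"
        " last_sequence INTEGER NOT NULL)"
    )
    return conn


def receive(conn: sqlite3.Connection, aggregate_id: str, sequence: int,
            payload: str, handler: Handler) -> List[int]:
    """Park the message, then apply whatever contiguous run is now available."""
    with conn:  # one transaction: commits on success, rolls back if a handler raises
        conn.execute("INSERT OR IGNORE INTO holding VALUES (?, ?, ?)",
                     (aggregate_id, sequence, payload))
        row = conn.execute("SELECT last_sequence FROM applied WHERE aggregate_id = ?",
                           (aggregate_id,)).fetchone()
        last = row[0] if row else 0  # assumption: sequences start at 1
        held = conn.execute(
            "SELECT sequence, payload FROM holding"
            " WHERE aggregate_id = ? AND sequence > ? ORDER BY sequence",
            (aggregate_id, last)).fetchall()
        done: List[int] = []
        for seq, body in held:
            if seq != last + 1:
                break  # gap: keep the rest in the holding table
            handler(aggregate_id, seq, body)
            last = seq
            done.append(seq)
        # Delete applied messages (and any stale duplicates) from the holding table.
        conn.execute("DELETE FROM holding WHERE aggregate_id = ? AND sequence <= ?",
                     (aggregate_id, last))
        if done:
            conn.execute("INSERT OR REPLACE INTO applied VALUES (?, ?)",
                         (aggregate_id, last))
        return done


conn = open_store()
seen: List[Tuple[int, str]] = []


def record(aggregate_id: str, sequence: int, payload: str) -> None:
    seen.append((sequence, payload))


first = receive(conn, "order-1", 2, "b", record)   # held: 1 has not arrived
second = receive(conn, "order-1", 3, "c", record)  # still held
third = receive(conn, "order-1", 1, "a", record)   # releases 1, 2 and 3 in order
```

In this sketch the handlers, the holding-table cleanup, and the bookkeeping share one transaction, so a failing handler leaves the held messages in place to be retried later.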

This works for us because the domain publishes events and marks each one with the appropriate sequence number. Without this, the solution would be much more difficult, if not impossible.

This solution uses a relational database as its persistence mechanism, but we're not using any of the relational aspects of the storage engine. At the same time, there's a caveat in all of this. If messages 2, 3, and 4 arrive but message 1 never does, we don't apply any of them. That scenario should only happen if there's an error processing message 1 or if message 1 somehow gets lost. Fortunately, it's easy enough to correct any errors in our message handlers and re-run the messages or, in the case of a lost message, to rebuild the read models from the event store directly.
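One way to notice a stalled aggregate is to look for gaps between the last applied sequence and what is sitting in the holding table. The helper below is a small sketch of that check; `missing_sequences` is a hypothetical name and not something from our codebase.

```python
from typing import Iterable, List


def missing_sequences(last_applied: int, held: Iterable[int]) -> List[int]:
    """Sequences after `last_applied` and below the highest held one that have not arrived."""
    held_set = set(held)
    if not held_set:
        return []
    return [s for s in range(last_applied + 1, max(held_set)) if s not in held_set]


# Messages 2, 3 and 4 are held, nothing has been applied yet: message 1 is the gap.
stalled_on = missing_sequences(0, [2, 3, 4])
```

A non-empty result that persists for a while suggests the missing message either failed in a handler or was lost, which is the cue to re-run it or rebuild from the event store.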

One interesting and very positive side effect that we discovered when we implemented this solution was full message idempotency. Basically we have a small "Sequence" table that gives us the most recent min/max message sequence received for a given aggregate. Only when we have received a full sequence do we run the messages through the handlers. If a message arrives more than once and it has already been handled, the sequence table's "min" value will be greater than the current message's "sequence", and we know to drop the message.
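The duplicate check itself is tiny. The sketch below assumes one reading of the "Sequence" table: "min" is the lowest sequence not yet handled and "max" is the highest sequence received so far. The `SequenceRow` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRow:
    """Hypothetical row of the "Sequence" table for one aggregate."""
    min_sequence: int  # assumed: lowest sequence not yet handled
    max_sequence: int  # assumed: highest sequence received so far


def should_drop(row: SequenceRow, sequence: int) -> bool:
    """A message below the row's min has already been handled, so it is a duplicate."""
    return sequence < row.min_sequence


def record_handled(row: SequenceRow, through: int) -> SequenceRow:
    """Advance min past the last handled sequence."""
    return SequenceRow(min_sequence=through + 1,
                       max_sequence=max(row.max_sequence, through))


row = record_handled(SequenceRow(min_sequence=1, max_sequence=3), through=3)
duplicate = should_drop(row, 2)  # message 2 was already handled
fresh = should_drop(row, 4)      # message 4 has not been seen yet
```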

The other caveat that we have yet to address is how to apply this pattern to a NoSQL solution. It definitely would involve some additional overhead that we don't need at this point. The solution was quite straightforward to implement using pure SQL and would have been much more difficult using something like NHibernate.