CQRS: Event Sourcing and Immutable Data

Event sourcing, and messaging in general, offers a number of interesting and unique advantages. One is the ability to perform merge-based, business-level concurrency, as compared to simple optimistic or pessimistic concurrency. Another is the ability to replay all stored messages into new or alternate models, which is invaluable and is only possible because business context and intent have been captured. One interesting advantage that may not seem like much at first is the immutability of events once they are committed to a persistent medium.

This idea is incredibly powerful because it solves two of the primary challenges in computing: concurrency and the need for a single, authoritative source of truth. Once an event has been accepted and committed, it becomes an established fact, as unalterable as a decree from Pharaoh, and it can be copied everywhere. The only way to "undo" an event is to add a compensating event on top, like a negative transaction in accounting.
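The accounting analogy can be sketched in a few lines. This is a minimal illustration rather than a real event store: the names `Event`, `EventLog`, `compensate`, and the "Reversed" naming convention are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Event:
    sequence: int
    kind: str
    amount: int  # amount in cents; negative for a reversal


class EventLog:
    """Hypothetical append-only log: no update or delete operations exist."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, kind: str, amount: int) -> Event:
        event = Event(len(self._events) + 1, kind, amount)
        self._events.append(event)
        return event

    def compensate(self, original: Event) -> Event:
        # "Undo" by appending an offsetting event; the original stays in place.
        return self.append(f"{original.kind}Reversed", -original.amount)

    def events(self) -> Tuple[Event, ...]:
        return tuple(self._events)


def balance(log: EventLog) -> int:
    return sum(event.amount for event in log.events())


log = EventLog()
deposit = log.append("Deposited", 5000)
mistake = log.append("Deposited", 1200)
log.compensate(mistake)  # the mistaken deposit is offset, not erased
print(balance(log), [event.kind for event in log.events()])
```

Both the mistake and its correction remain in the history, which is what keeps every copy of the log consistent.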

So what does that mean for my application? How can I take advantage of the immutability of events? Well, for starters, by following the ideas found below you can eliminate almost any chance of the high-profile data loss issues that companies face, because your views and reports can always be rebuilt by replaying all recorded events.
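As a rough sketch of what "rebuilding by replaying" looks like, the example below folds the same recorded events into two different read models. The event shapes (`ItemAdded`, `ItemRemoved`) and the function names are hypothetical, chosen only for illustration.

```python
from typing import Any, Callable, Dict, Iterable

Event = Dict[str, Any]
View = Dict[str, int]

# Hypothetical recorded events, in the order they were committed.
EVENTS = [
    {"type": "ItemAdded", "order": "A", "sku": "book", "qty": 2},
    {"type": "ItemAdded", "order": "B", "sku": "pen", "qty": 5},
    {"type": "ItemRemoved", "order": "A", "sku": "book", "qty": 1},
    {"type": "ItemAdded", "order": "A", "sku": "pen", "qty": 3},
]


def replay(events: Iterable[Event], apply: Callable[[View, Event], View]) -> View:
    """Rebuild a view from scratch by applying every event in order."""
    state: View = {}
    for event in events:
        state = apply(state, event)
    return state


def signed_qty(event: Event) -> int:
    return event["qty"] if event["type"] == "ItemAdded" else -event["qty"]


def items_per_order(state: View, event: Event) -> View:
    updated = dict(state)
    updated[event["order"]] = updated.get(event["order"], 0) + signed_qty(event)
    return updated


def units_per_sku(state: View, event: Event) -> View:
    updated = dict(state)
    updated[event["sku"]] = updated.get(event["sku"], 0) + signed_qty(event)
    return updated


order_view = replay(EVENTS, items_per_order)
sku_view = replay(EVENTS, units_per_sku)
print(order_view, sku_view)
```

If a view is lost or a new report is needed, it is simply another function passed to `replay`; the events themselves are never touched.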

One of the biggest and most obvious ways to take advantage of immutable events is to increase your application's performance by caching like crazy. Why not replicate the events into some kind of disk-based or in-memory cache across a bunch of nodes? Because an event never changes, there is only ever one version of it, so a cached copy can never be stale regardless of which cache node we talk to. In this way, we can read from almost anywhere when we populate our read models or create other reports from our events.
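A minimal sketch of such a cache node follows, assuming events are addressed by a sequence number and that `fetch` is some function that reads from the origin store (both are assumptions for this example). Note what is missing: there is no expiry and no invalidation logic, because a cached event can never become stale.

```python
from typing import Callable, Dict


class ImmutableEventCache:
    """Hypothetical cache node: entries are filled once and never invalidated."""

    def __init__(self, fetch: Callable[[int], dict]) -> None:
        self._fetch = fetch
        self._local: Dict[int, dict] = {}
        self.misses = 0

    def get(self, sequence: int) -> dict:
        if sequence not in self._local:
            self.misses += 1  # only the first read goes to the origin
            self._local[sequence] = self._fetch(sequence)
        return self._local[sequence]


# A plain dict stands in for the origin event store.
ORIGIN = {
    1: {"type": "CustomerRegistered", "name": "Ada"},
    2: {"type": "CustomerRenamed", "name": "Ada L."},
}

node_a = ImmutableEventCache(ORIGIN.__getitem__)
node_b = ImmutableEventCache(ORIGIN.__getitem__)

first = node_a.get(1)
again = node_a.get(1)   # served locally, no second trip to the origin
other = node_b.get(1)   # a different node returns the same fact
print(first == other, node_a.misses, node_b.misses)
```

One caveat: the cache answers "what is event N?" safely, but discovering that newer events exist still requires asking a node that has received them.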

Furthermore, backups become ultra simple—just get all events that have been added since the last backup. Disaster recovery involves reading data from somewhere else.
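An incremental backup along these lines might look like the sketch below. It assumes every event carries a monotonically increasing `sequence` number, which the original text does not specify, and uses plain lists as stand-ins for the live store and the backup.

```python
from typing import Dict, List

Event = Dict[str, object]


def incremental_backup(store: List[Event], backup: List[Event]) -> int:
    """Copy only events newer than the last one already in the backup."""
    last_sequence = backup[-1]["sequence"] if backup else 0
    new_events = [event for event in store if event["sequence"] > last_sequence]
    backup.extend(new_events)  # existing backup entries are never rewritten
    return len(new_events)


store: List[Event] = [{"sequence": n, "type": "Tick"} for n in range(1, 4)]
backup: List[Event] = []

copied_first = incremental_backup(store, backup)   # copies events 1..3
store.append({"sequence": 4, "type": "Tick"})
store.append({"sequence": 5, "type": "Tick"})
copied_second = incremental_backup(store, backup)  # copies only 4 and 5
print(copied_first, copied_second, backup == store)
```

Because old events cannot change, there is no need to diff or re-verify what was copied in earlier runs; restoring is just reading the backup back.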

From time to time the question comes up: won't this take up a lot of space? Yes, it might. What if I want to clear out some events to make room for new ones? Can we snapshot and delete/archive events from before the snapshot? The answer is…why? Disk space is cheap and data is valuable. Why not replicate those events to another kind of ultra-cheap, yet highly available storage such as S3 or Azure Blobs? In this way we have the most immediate and recent events available locally, and all of the older events available to us with a few simple queries, at a slightly higher cost in terms of latency. How much does 1 GB of storage cost on S3? How many business events can you store in 1 GB of space, especially when compressed?
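The "recent events local, older events in cheap storage" arrangement can be sketched as follows. Everything here is an assumption for illustration: `TieredEventStore` is a made-up class, a dict stands in for a blob service such as S3 or Azure Blobs, and `zlib` stands in for whatever compression is actually used.

```python
import json
import zlib
from typing import Dict


class TieredEventStore:
    """Hypothetical store: the newest events stay local, older ones are archived."""

    def __init__(self, keep_local: int) -> None:
        self.keep_local = keep_local
        self.local: Dict[int, dict] = {}     # fast, limited space
        self.archive: Dict[int, bytes] = {}  # stand-in for cheap blob storage
        self._next_sequence = 1

    def append(self, event: dict) -> int:
        sequence = self._next_sequence
        self._next_sequence += 1
        self.local[sequence] = event
        self._archive_old_events()
        return sequence

    def _archive_old_events(self) -> None:
        cutoff = self._next_sequence - 1 - self.keep_local
        for sequence in [s for s in self.local if s <= cutoff]:
            raw = json.dumps(self.local.pop(sequence)).encode("utf-8")
            self.archive[sequence] = zlib.compress(raw)  # moved, not deleted

    def read(self, sequence: int) -> dict:
        if sequence in self.local:
            return self.local[sequence]
        # Slower path: fetch and decompress from the archive tier.
        return json.loads(zlib.decompress(self.archive[sequence]).decode("utf-8"))


tiered = TieredEventStore(keep_local=2)
for n in range(1, 6):
    tiered.append({"type": "OrderPlaced", "order": n})

print(sorted(tiered.local), sorted(tiered.archive), tiered.read(1))
```

Nothing is ever discarded; an old event simply costs a little more latency to read.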

If somehow events could change, all of the above advantages would disappear and we would have to keep going back to a single source of truth—a single point of failure—to be sure we had the latest version. How much of your data is held hostage inside of a legacy database?

If you have a lot of systems that listen to those events, each system could maintain its own copy, thus decreasing the load on your primary or mission-critical systems.

The last advantage is almost unnoticed in the way it compensates for a nefarious and silent killer: media decay, or "bit rot". Ever had a hard drive slowly and silently corrupt your data? When media goes bad, it takes your data with it. But because our data is immutable, a checksum recorded when an event is written stays valid forever, so it becomes very easy to detect that the data has been altered by tampering or by media decay. In our world, this isn't a problem because we don't need a single source of truth. Much like distributed source control, e.g. Git, Mercurial, etc., any repository can be the authoritative source if we detect problems in our most readily available copy. Bye bye silent media corruption.
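Here is a minimal sketch of checksum-based detection and repair from another copy. The record layout, the choice of SHA-256, and the function names are assumptions for the example; lists stand in for the local copy and a replica.

```python
import hashlib
from typing import Dict, List

Record = Dict[str, object]


def checksum(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


def store(payload: bytes) -> Record:
    # The checksum is computed once at write time and never needs updating.
    return {"payload": payload, "checksum": checksum(payload)}


def is_intact(record: Record) -> bool:
    return checksum(record["payload"]) == record["checksum"]


def read_with_repair(index: int, primary: List[Record],
                     replicas: List[List[Record]]) -> bytes:
    """Return the payload, healing the primary from a replica if it is corrupt."""
    if is_intact(primary[index]):
        return primary[index]["payload"]
    for replica in replicas:
        candidate = replica[index]
        if is_intact(candidate):
            primary[index] = dict(candidate)  # any intact copy is authoritative
            return candidate["payload"]
    raise ValueError(f"no intact copy of event {index} found")


payloads = [b'{"type":"Opened"}', b'{"type":"Deposited","amount":50}']
primary = [store(p) for p in payloads]
replica = [dict(record) for record in primary]

primary[1]["payload"] = b'{"type":"Deposited","amount":5000}'  # simulated bit rot
detected = not is_intact(primary[1])
recovered = read_with_repair(1, primary, [replica])
print(detected, recovered == payloads[1], is_intact(primary[1]))
```

With mutable data this check would be far less useful, since a mismatch could simply mean the record had been legitimately updated.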

If we wanted, we could even write our events to a solid state drive (SSD) so that we can accept writes more quickly. SSDs generally don't like lots of writes to the same physical sectors and have a tendency to wear out over time as more and more writes occur to the same areas, but our events have a "solid" state, which means that we write them once and then read forevermore. Thus, the wearing out of an SSD is much less of a problem.

Immutability is a very simple property, but it has profound implications, and it makes a truly distributed system much easier to build.