In a previous post, I talked about a possible relational database schema for event storage. I also discussed the idea behind aggregate versioning in a relational store. In this post, I'd like to talk about storage in a non-relational persistence medium, such as CouchDB, HyperTable, or even the file system.
One of the reasons behind the "last_snapshot_seq" column in the relational schema is that it helps determine the point at which to start loading events from the database. Because, by definition, a relational set is unordered, we needed a way to retrieve all events that have occurred since (and including) the last snapshot event in one fell swoop.
Using a non-relational model changes things slightly. We can start to treat the events stored for an aggregate as a collection accessible by an index. Using this mechanism, we evaluate the events one by one, beginning with the one that occurred most recently. For example, if the most recent event is not a snapshot message, we push it onto a temporary stack. We then read the next most recent event. If that event is not a snapshot message either, we push it onto the stack on top of the previously retrieved message. We continue reading backward in this way until we reach a snapshot. When we do, we run the snapshot through the aggregate, then pop the stack and run each popped event through the aggregate in turn until the stack is empty. Because the stack reverses the order in which we read them, the events are applied oldest first, which brings our aggregate to its last known state.
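To make the stack-based approach concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Event` and `Aggregate` classes, the `"snapshot"` kind marker, and the idea that a snapshot replaces state while any other event merges into it. The indexed collection is stood in for by a plain list ordered oldest to newest.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical event record; 'kind' marks snapshots."""
    seq: int
    kind: str
    payload: dict = field(default_factory=dict)


class Aggregate:
    """Hypothetical aggregate that tracks state and the last applied sequence."""

    def __init__(self):
        self.state = {}
        self.version = 0

    def apply(self, event):
        if event.kind == "snapshot":
            # Assumption: a snapshot replaces the state wholesale.
            self.state = dict(event.payload)
        else:
            # Assumption: any other event merges its payload into the state.
            self.state.update(event.payload)
        self.version = event.seq


def load_from_end(events, aggregate):
    """Walk the indexed collection backward until a snapshot is found."""
    stack = []
    index = len(events) - 1
    while index >= 0:
        event = events[index]
        if event.kind == "snapshot":
            aggregate.apply(event)
            break
        stack.append(event)  # newer events end up deeper in the stack
        index -= 1
    # Popping reverses the read order, so events are applied oldest first.
    while stack:
        aggregate.apply(stack.pop())
    return aggregate


events = [
    Event(1, "set", {"a": 1}),
    Event(2, "snapshot", {"a": 1, "b": 2}),
    Event(3, "set", {"b": 3}),
    Event(4, "set", {"c": 4}),
]
aggregate = load_from_end(events, Aggregate())
```

In this sketch, event 1 is never read because the backward walk stops at the snapshot. If no snapshot exists at all, the loop simply pushes every event and replays the whole history from the start.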
As you can see, we're working with events one at a time from the persistence engine. Because of this, we need to know the sequence number of the most recent event to be processed by the aggregate, which allows us to know the point at which to start loading events. This is where the storage for the aggregate version comes into play. In Greg's devTeach video (at 39:45), he mentions persisting the version to the persistence mechanism.
It's important to store this value so that we know the sequence number of the event at which to start retrieving event messages from the store.
Granted, the event store could be implemented any number of ways. In thinking about it, you could still store the last snapshot message sequence and load events forward (using a queue), or you could read from the end and move toward the beginning of the storage (using a stack). The important thing is that you know the point at which to begin loading events, that they are loaded in order, AND that you know the version number of the aggregate for the next message when you're pushing events to the persistence engine.
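The forward-loading alternative, along with the version bookkeeping needed when pushing new events, might look something like the sketch below. The `InMemoryEventStore` class, its method names, and the tuple layout are all hypothetical. Rejecting an append whose expected version is stale is also an assumption on my part about how the stored version would be used, not something the store has to do.

```python
from collections import deque


class InMemoryEventStore:
    """Hypothetical single-aggregate store; sequence numbers start at 1."""

    def __init__(self):
        self._events = []           # (seq, kind, payload) tuples, oldest first
        self.last_snapshot_seq = 0  # 0 means no snapshot has been stored yet

    def append(self, expected_version, kind, payload):
        current = len(self._events)
        # Assumption: reject a write made against a stale aggregate version.
        if expected_version != current:
            raise ValueError(f"expected version {expected_version}, store is at {current}")
        seq = current + 1
        self._events.append((seq, kind, payload))
        if kind == "snapshot":
            self.last_snapshot_seq = seq
        return seq

    def load_forward(self):
        """Return a queue of events from the last snapshot (inclusive) onward."""
        start = max(self.last_snapshot_seq, 1)
        return deque(self._events[start - 1:])


def replay(queue):
    """Drain the queue front to back, rebuilding state and version."""
    state, version = {}, 0
    while queue:
        seq, kind, payload = queue.popleft()
        # Assumption: snapshots replace state, other events merge into it.
        state = dict(payload) if kind == "snapshot" else {**state, **payload}
        version = seq
    return state, version


store = InMemoryEventStore()
store.append(0, "set", {"a": 1})
store.append(1, "snapshot", {"a": 1})
store.append(2, "set", {"b": 2})

state, version = replay(store.load_forward())
next_seq = store.append(version, "set", {"c": 3})
```

Here the version that comes out of `replay` is exactly what the next `append` needs, which is the point of keeping it around: the caller knows both where loading started and what sequence number the next message will receive.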