Cassandra/Riak/Dynamo Optimistic Concurrency Control

There are a number of great NoSQL implementation each with their own set of unique advantages and quirks including maturity of the project, size of the community, and technical merit.  While not discounting the unique ways in which each NoSQL project fills a specific set of requirements, I consider one group of NoSQL projects to be a cut above the rest—the Dynamo clones.

While HBase, BigTable, CouchDB, and MongoDB are wonderful and have various levels of acceptance along with installed user base, the Dynamo clones seem to have something else.  They are truly web scale.  I say this because, unlike virtually all other NoSQL implementations, the Dynamo-based projects have no master node.  They are more of a peer-to-peer model with each node acting as a master in its own right.  While CouchDB and some others can act like a cluster of masterless nodes, the Dynamo clones have been architected around this very principle.

Why is this important?  Well, beyond the immediate benefit of no single point of failure and a shared-nothing architecture, the Dynamo clones are design to scale up and down dynamically by simply adding and removing nodes from the cluster.  The “ring” handles everything related to rebalancing or resharding based using mechanisms like gossiping, consistent hashing, hinted handoff, and Merkle trees.  In addition, they each allow tunable or configurable levels of CAP.  It’s not just all CA or AP, but how much of each.

Optimistic Concurrency Problems

While having no master node is one of the strongest and best attributes of Dynamo it is also simultaneously the biggest shortcoming.  The reason is that implementing optimistic concurrency controls against Dynamo is extremely difficult, if not impossible using Dynamo by itself.

Sure, we can get a consistent or quorum read by querying more than half the nodes that store the parituclar data element: “R > N / 2”.  In other words, we can “read our writes”.  The problem is that we can’t reject an update that’s not based upon the latest stored version.  Be design, we can’t prevent multiple writers from writing concurrent and conflicting values.

RDBMS is Dead.  Long Live RDBMS!

When certain elements within our application require strong consistency, we must implement OCC ourselves.  My team kicked this problem around for a bit but we came up with something rather unique and creative.

Why not leverage the locking (and blocking) capabilities of an RDBMS to facilitate OCC within our Dynamo cluster?  If we simply used the RDBMS to implement locking and nothing else, that wouldn’t be very taxing and require lots of memory or disk and should facilitate a significant user load.  Furthermore, the only time any client would experience blocking is when another writer was operating against the same data element within our storage, e.g. the same bank account or same user account or whatever.  In other words, we use the key of the data element as the key for the lock inside our RDBMS.

While in the strictest sense this may feel more like pessimistic locking, let me allay any concerns.  First, we’re only taking a lock on writing.  Every time we attempt to write to our Dynamo cluster, we first acquire an RDBMS lock.  Then we proceed to write using a quorum write (W > N / 2).  Once the write is complete we release the RDBMS lock.  If our process fails or dies the lock is released and other writers working against the same data element are allowed to proceed in serial fashion.

Here is a sample workflow:

  1. Instruction received that requires mutating application state.
  2. Get the key of the data element to be manipulated, e.g. user or account ID.
  3. Open up a DB transaction against our RDBMS using an isolation level of repeatable read.
  4. Insert the key into the RDMBS “locks” table.
  5. Perform a quorum write to the Dynamo cluster.
  6. Rollback the RDBMS transaction to release the lock.

All in all its pretty simple.

Too Much Effort?

The question surrounding this entire implementation is: why even utilize Dynamo at all in this case?  Why not use an RDBMS?  Isn’t this a lot of extra work?

It is true that we are creating some additional overhead and that we are introducing additional latency into the process.  The entire reason behind this approach is that we want to have strict consistency boundaries surrounding a particular kind of element.  While we try to design our application to handle eventual consistency, there are some cases where strict consistency is the easiest and perhaps best approach.

The unique advantage that Dynamo has over a traditional RDBMS, even considering this approach for strict consistency, is scalability.  That’s right, scalability.  While an RDBMS is more rigid in its ability to scale writes that must funnel through a single node (or synchronous cluster), with Dynamo we can easily scale writes by merely adding additional nodes to the ring.  In other words, we can accommodate millions of users in our Dynamo cluster by adding nodes.

One Caveat

Ironically the only difficulty we encounter with this approach other than additional latency is the availability of our lock mechanism itself—the RDBMS.  If client A obtains a lock against DB #1, and DB #1 goes down, we have a chance of client B not being able to obtain a lock or attempting to lock against a different RDBMS, e.g. DB #2.  This means client A is not blocking client B.  Problem.

The easy solution is to have multiple “lock'” servers.  Then, when the application attempts to obtain a lock, it does so against at least N lock servers where N increases based upon your requirement for strict, absolute data consistency.  Each client would attempt to connect to N lock servers in order, e.g. obtain lock against DB #1 and against DB #2.  Only when DB #1 becomes unavailable do we obtain a lock against DB #2 and #3.  Because DB #2 is still available and online, it’s still holding the lock for client A such that client B remains blocked.

Conclusion

This approach may seem like a lot of extra work and overhead but it is actually quite simple while only introducing a small amount of latency into the process.  I have bundled the above principles into a small, open-source project which is call NoSqlMutex.

Maintaining strict consistency is a difficult problem that cannot always be avoided therefore primitives such as the above are necessary for maintaining that consistency against a distributed system.


comments powered by Disqus
comments powered by Disqus