Monthly Archives: May 2010

Optimistic Concurrency in Riak: Take 2

A few weeks ago I blogged about implementing optimistic concurrency control using the Dynamo clone storage engines.  I need to give an update to this technique that is specific to Riak.

Riak supports an “if-match” header that allows you to implement OCC out of the box.  The only quirk with this technique—per my understanding—is that it is single-node based.  In other words, if I have two concurrent writers that are updating a particular value and they are operating against two different nodes, the “if-match” header doesn’t offer true OCC protection.  With two concurrent writers against two different nodes we will still end up with two versions of the same value.  I need to confirm this 100% with the Basho guys.  Specifically, I need to verify if this still holds true in a quorum setting: W > N / 2.

Supposing the above statement to be true, the easiest way to implement OCC would be to direct writes to a single “master” node.  But that would severely affect our write performance and would effectively simulate a MySQL master/slave configuration, which we are trying to avoid because we want write availability and write performance.

Instead, what if we still adopted the “write to any, read from any” style?  But instead of directing the writes to any random node, we used a deterministic formula to elect a master on a per-request basis, e.g. a modulo on the primary key.  In this way, we could easily scale out the writes through this multi-master topology while still maintaining the ability to read from any node.  Furthermore, we would still realize all of the benefits of optimistic concurrency.

Somewhere Werner is smiling.  [Well, at least he’s not crying.]

Greg Young’s CQRS Workshop

This week I went to Montreal where I was able to participate in a three-day workshop.  There were about nine people there including Greg.

The workshop was great.  I really enjoyed listening to Greg go into significant depth regarding both the technological and business perspectives behind CQRS.  I had never fully considered the business advantages of the architecture, but they are very compelling.

Here are a few pictures from the workshop.  (If you look closely at the first one, you’ll see Greg wrote a goto statement on the board.)

2010-05-27 Jonathan's trip to Montreal 003

2010-05-27 Jonathan's trip to Montreal 004

2010-05-27 Jonathan's trip to Montreal 005   (I’m on the left)

Somewhere Werner is Crying

My last blog post generated a fair amount of activity in the twittersphere.  In many ways I was expecting that because it’s fairly controversial.  To think that someone would take a nice scalable storage solution and put it back under the thumb of an RDBMS is nuts.  But that’s exactly what my previous post talks about.

If it’s so crazy, why blog about it?  I agree, it’s pretty crazy to want to put a Dynamo instance under the thumb of an RDBMS.  Yet, in a narrow set of circumstances there are advantages to doing so.  When we require full consistency for a particular element or set of elements and we happen to be using Dynamo we need a way to prevent multiple instances from stepping on each others toes and performing conflicting operations creating irreconcilable differences.

On example that comes to mind is that of a true, DDD aggregate root.  Aggregate roots by definition are boundaries of consistency.  If we had two instances of the aggregate root loaded simultaneously and they were both allowed to perform conflicting operations concurrently, it would create a lot of headaches to solve that problem.

In the end, the techniques outlined in the aforementioned post are to be considered a small tool in the toolbox rather than a silver bullet to be applied wholesale to an application persistence strategy.

Cassandra/Riak/Dynamo Optimistic Concurrency Control

There are a number of great NoSQL implementation each with their own set of unique advantages and quirks including maturity of the project, size of the community, and technical merit.  While not discounting the unique ways in which each NoSQL project fills a specific set of requirements, I consider one group of NoSQL projects to be a cut above the rest—the Dynamo clones.

While HBase, BigTable, CouchDB, and MongoDB are wonderful and have various levels of acceptance along with installed user base, the Dynamo clones seem to have something else.  They are truly web scale.  I say this because, unlike virtually all other NoSQL implementations, the Dynamo-based projects have no master node.  They are more of a peer-to-peer model with each node acting as a master in its own right.  While CouchDB and some others can act like a cluster of masterless nodes, the Dynamo clones have been architected around this very principle.

Why is this important?  Well, beyond the immediate benefit of no single point of failure and a shared-nothing architecture, the Dynamo clones are design to scale up and down dynamically by simply adding and removing nodes from the cluster.  The “ring” handles everything related to rebalancing or resharding based using mechanisms like gossiping, consistent hashing, hinted handoff, and Merkle trees.  In addition, they each allow tunable or configurable levels of CAP.  It’s not just all CA or AP, but how much of each.

Optimistic Concurrency Problems

While having no master node is one of the strongest and best attributes of Dynamo it is also simultaneously the biggest shortcoming.  The reason is that implementing optimistic concurrency controls against Dynamo is extremely difficult, if not impossible using Dynamo by itself.

Sure, we can get a consistent or quorum read by querying more than half the nodes that store the parituclar data element: “R > N / 2”.  In other words, we can “read our writes”.  The problem is that we can’t reject an update that’s not based upon the latest stored version.  Be design, we can’t prevent multiple writers from writing concurrent and conflicting values.

RDBMS is Dead.  Long Live RDBMS!

When certain elements within our application require strong consistency, we must implement OCC ourselves.  My team kicked this problem around for a bit but we came up with something rather unique and creative.

Why not leverage the locking (and blocking) capabilities of an RDBMS to facilitate OCC within our Dynamo cluster?  If we simply used the RDBMS to implement locking and nothing else, that wouldn’t be very taxing and require lots of memory or disk and should facilitate a significant user load.  Furthermore, the only time any client would experience blocking is when another writer was operating against the same data element within our storage, e.g. the same bank account or same user account or whatever.  In other words, we use the key of the data element as the key for the lock inside our RDBMS.

While in the strictest sense this may feel more like pessimistic locking, let me allay any concerns.  First, we’re only taking a lock on writing.  Every time we attempt to write to our Dynamo cluster, we first acquire an RDBMS lock.  Then we proceed to write using a quorum write (W > N / 2).  Once the write is complete we release the RDBMS lock.  If our process fails or dies the lock is released and other writers working against the same data element are allowed to proceed in serial fashion.

Here is a sample workflow:

  1. Instruction received that requires mutating application state.
  2. Get the key of the data element to be manipulated, e.g. user or account ID.
  3. Open up a DB transaction against our RDBMS using an isolation level of repeatable read.
  4. Insert the key into the RDMBS “locks” table.
  5. Perform a quorum write to the Dynamo cluster.
  6. Rollback the RDBMS transaction to release the lock.

All in all its pretty simple.

Too Much Effort?

The question surrounding this entire implementation is: why even utilize Dynamo at all in this case?  Why not use an RDBMS?  Isn’t this a lot of extra work?

It is true that we are creating some additional overhead and that we are introducing additional latency into the process.  The entire reason behind this approach is that we want to have strict consistency boundaries surrounding a particular kind of element.  While we try to design our application to handle eventual consistency, there are some cases where strict consistency is the easiest and perhaps best approach.

The unique advantage that Dynamo has over a traditional RDBMS, even considering this approach for strict consistency, is scalability.  That’s right, scalability.  While an RDBMS is more rigid in its ability to scale writes that must funnel through a single node (or synchronous cluster), with Dynamo we can easily scale writes by merely adding additional nodes to the ring.  In other words, we can accommodate millions of users in our Dynamo cluster by adding nodes.

One Caveat

Ironically the only difficulty we encounter with this approach other than additional latency is the availability of our lock mechanism itself—the RDBMS.  If client A obtains a lock against DB #1, and DB #1 goes down, we have a chance of client B not being able to obtain a lock or attempting to lock against a different RDBMS, e.g. DB #2.  This means client A is not blocking client B.  Problem.

The easy solution is to have multiple “lock’” servers.  Then, when the application attempts to obtain a lock, it does so against at least N lock servers where N increases based upon your requirement for strict, absolute data consistency.  Each client would attempt to connect to N lock servers in order, e.g. obtain lock against DB #1 and against DB #2.  Only when DB #1 becomes unavailable do we obtain a lock against DB #2 and #3.  Because DB #2 is still available and online, it’s still holding the lock for client A such that client B remains blocked.

Conclusion

This approach may seem like a lot of extra work and overhead but it is actually quite simple while only introducing a small amount of latency into the process.  I have bundled the above principles into a small, open-source project which is call NoSqlMutex.

Maintaining strict consistency is a difficult problem that cannot always be avoided therefore primitives such as the above are necessary for maintaining that consistency against a distributed system.