Archive for March, 2011

Removing 2PC (Two Phase Commit)

I received the following email today and I thought I’d answer it as a blog post so that all can benefit:

If you remove the 2PC from the system, how do you deal with ensuring that published events are:

* truly published,

* not lost,

* that there is confidence that the interested parties (subscribers) are truly receiving and processing the events in a ‘timely’ manner

* and that there is confidence and a method that the system can gracefully recover from unexpected situations?

This is an area that seems to be glossed over quite a bit in the talks and sample code.

Most of the time, the ‘talks’ say, "Throw it in a durable queue, and then it’s there (easy-peasy)" and sample code uses an in-memory synchronous stream of registered methods calls to have a lightweight bus.

How do dev’s handle this in the ‘real-world’?

Perhaps I can ask you this question – how do you normally implement your event processing logic and ensuring that, for example, read models are updated properly even when, for example, db connections go down without using 2PC?

Here is my answer:

In the EventStore, here’s how I handle publishing without 2PC:

  1. When a batch of events is received, it is durably stored to disk with a “Dispatched” flag of false, meaning the events haven’t been published yet.
  2. The batch of events, which I call a “commit”, is then pushed to a “dispatcher” (like NServiceBus) on a different thread.
  3. The dispatcher publishes all of the events in the commit and commits its own transaction against the queue.
  4. The dispatcher marks the batch of events/commit as dispatched.

If the dispatch fails at any point, the commit is still marked as undispatched, and when the system comes back online it will immediately publish those events.  Yes, this introduces a slight possibility that a message might be published more than once, which is why we need to de-duplicate when handling messages in the read models.
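The store-then-dispatch steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the EventStore's actual code: it uses SQLite for durable storage, and the table name `commits`, the functions `open_store`, `append_commit` and `dispatch_pending`, and the `publish` callback are all hypothetical names chosen for the example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the hypothetical commits table used by this sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commits ("
        " commit_id TEXT PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " dispatched INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def append_commit(conn, commit_id, payload):
    # Step 1: durably store the commit, flagged as not yet dispatched.
    with conn:
        conn.execute(
            "INSERT INTO commits (commit_id, payload, dispatched) VALUES (?, ?, 0)",
            (commit_id, payload),
        )


def dispatch_pending(conn, publish):
    # Steps 2-4: publish every undispatched commit, then mark it dispatched.
    # If publish() raises, or the process dies before the UPDATE runs, the
    # flag stays 0 and the commit is published again on the next call.
    rows = conn.execute(
        "SELECT commit_id, payload FROM commits WHERE dispatched = 0 ORDER BY rowid"
    ).fetchall()
    for commit_id, payload in rows:
        publish(commit_id, payload)
        with conn:
            conn.execute(
                "UPDATE commits SET dispatched = 1 WHERE commit_id = ?",
                (commit_id,),
            )
```

Calling `dispatch_pending` once at startup and again after each `append_commit` gives at-least-once delivery: a crash between the publish and the flag update results in a duplicate publish, never a lost one.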

As far as updating a read model without 2PC, here are the general steps:

  1. Receive the event and update your view models accordingly.
  2. As part of the same database transaction, record the unique identifier for the message into some kind of “I’ve handled this message already” table.
  3. Commit the database transaction.

If that message is ever received again, your system will try to insert its id into the table of handled message ids, and the insert will fail because you’ve already handled the message. Simply catch this duplicate insert exception, let the transaction roll back, and drop the message.  Or you could be more proactive and find out if the message id is already in the table before you update anything.  But duplicates should be rare enough that I would recommend the more reactive approach.
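Here is a minimal sketch of the reactive approach, assuming a relational store (SQLite here) with a primary key on the message id. The table names `handled_messages` and `order_totals`, and the handler `handle_item_added`, are made up for illustration.

```python
import sqlite3


def open_read_model(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS handled_messages (message_id TEXT PRIMARY KEY)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_totals ("
        " order_id TEXT PRIMARY KEY, total INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def handle_item_added(conn, message_id, order_id, amount):
    """Apply the event once; return False if it was a duplicate."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            # Step 1: update the view model.
            cur = conn.execute(
                "UPDATE order_totals SET total = total + ? WHERE order_id = ?",
                (amount, order_id),
            )
            if cur.rowcount == 0:
                conn.execute(
                    "INSERT INTO order_totals (order_id, total) VALUES (?, ?)",
                    (order_id, amount),
                )
            # Step 2: record the message id in the same transaction.
            # A duplicate violates the primary key and undoes step 1.
            conn.execute(
                "INSERT INTO handled_messages (message_id) VALUES (?)",
                (message_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # already handled: drop the message
```

Because the view update and the id insert share one transaction, a duplicate message leaves the view model exactly as it was.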

One question that is too often overlooked is how long you should keep those identifiers around before removing them.  The answer depends upon the nature of your system and how long the possibility of duplicate messages exists.  You could easily set up an automated/scheduled task to clean out old identifiers that have been in the table for perhaps a few days or even a week or more.
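A cleanup task along those lines might look like the following sketch. It assumes the handled-ids table also stores when each id was recorded; the `handled_at` column and the function names are hypothetical, and the retention period is whatever suits your system.

```python
import sqlite3
import time


def open_handled_table(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS handled_messages ("
        " message_id TEXT PRIMARY KEY,"
        " handled_at REAL NOT NULL)"  # seconds since the epoch
    )
    conn.commit()
    return conn


def record_handled(conn, message_id, now=None):
    handled_at = time.time() if now is None else now
    with conn:
        conn.execute(
            "INSERT INTO handled_messages (message_id, handled_at) VALUES (?, ?)",
            (message_id, handled_at),
        )


def purge_old_ids(conn, max_age_seconds, now=None):
    """Delete ids older than max_age_seconds; return how many were removed."""
    cutoff = (time.time() if now is None else now) - max_age_seconds
    with conn:
        cur = conn.execute(
            "DELETE FROM handled_messages WHERE handled_at < ?", (cutoff,)
        )
    return cur.rowcount
```

Running `purge_old_ids(conn, 7 * 86400)` from a scheduled job would keep roughly a week of identifiers.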

If you’re using a storage engine for your view models that doesn’t support traditional transactions, e.g. a NoSQL store, then you have to be a bit more creative about how you de-duplicate messages, but that’s another topic for another day.

CQRS: Event Sourcing and Immutable Data

There are a number of interesting and unique advantages offered by event sourcing as well as messaging in general.  Some of these advantages include the ability to perform merge-based, business-level concurrency, as compared to simple optimistic or pessimistic concurrency.  Further, the ability to replay all stored messages into new or alternate models, because business context and intent have been captured, is invaluable.  One interesting advantage that may not seem like much at first is the immutability of the events once they are committed to a persistent medium.

This idea is incredibly powerful because it addresses two of the primary challenges in computing: concurrency and the need for a single, authoritative source of truth.  Once an event has been accepted and committed, it becomes an established fact, as unalterable as a decree from Pharaoh, and it can be copied everywhere.  The only way to “undo” an event is to add a compensating event on top, like a negative transaction in accounting.
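To make the accounting analogy concrete, here is a small sketch of an append-only log in which a mistake is corrected by a compensating event rather than by editing history. The `Event` type and the event names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Event:
    event_id: int
    kind: str
    amount: int


def append(log, kind, amount):
    """Return a new log with one more event; existing events are never edited."""
    return log + (Event(len(log) + 1, kind, amount),)


def balance(log):
    # Current state is derived by folding over every recorded event.
    return sum(e.amount for e in log)


log = ()
log = append(log, "Deposited", 100)
log = append(log, "Deposited", 40)         # entered by mistake
log = append(log, "DepositReversed", -40)  # compensating event, not a delete
```

The mistaken deposit stays in the log as part of the record; the balance comes out right because the reversal is applied on top of it.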

So what does that mean for my application?  How can I take advantage of the immutability of events?  Well, for starters, by following the ideas found below you can all but eliminate the chance of the high-profile data loss issues that companies face, because you can rebuild your views and reports by replaying all recorded events.

One of the biggest and most obvious ways to take advantage of immutable events is to increase your application’s performance by caching like crazy.  Why not replicate the events into some kind of disk-based or in-memory cache across a bunch of nodes?  Because the events never change, any copy of a particular event is as good as any other, regardless of which cache node we talk to.  In this way, we can read from almost anywhere when we populate our read models or create other reports from our events.

Furthermore, backups become ultra simple: just get all events that have been added since the last backup.  Disaster recovery involves reading data from somewhere else.
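As a rough sketch of that kind of incremental backup, assume each event carries a monotonically increasing sequence number (an assumption of this example, along with the function name `incremental_backup`). Since stored events never change, the backup only ever needs to append.

```python
def incremental_backup(events, backup):
    """Copy events appended since the last backup run into `backup`.

    Both arguments are lists of (sequence, payload) tuples ordered by
    sequence. Returns the number of events copied.
    """
    last_seq = backup[-1][0] if backup else 0
    new_events = [e for e in events if e[0] > last_seq]
    backup.extend(new_events)
    return len(new_events)
```

Running it twice in a row copies nothing the second time, because nothing already backed up can have been modified.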

From time to time the question comes up, won’t this take up a lot of space?  Yes, it might.  What if I want to clear out some events to make room for new ones?  Can we snapshot and delete/archive events from before the snapshot?  The answer is…why?  Disk space is cheap and data is valuable.  Why not replicate those events to another kind of ultra-cheap, yet highly available storage such as S3 or Azure Blobs?  In this way we have the most immediate and recent events available locally, with all of the older events still available to us through a few simple queries, at a slightly higher cost in terms of latency.  How much does 1 GB of storage cost on S3?  How many business events can you store in 1 GB of space, especially when compressed?

If somehow events could change, all of the above advantages would disappear and we would have to keep going back to a single source of truth—a single point of failure—to be sure we had the latest version.  How much of your data is held hostage inside of a legacy database?

If you have a lot of systems that listen to those events, each system could maintain its own copy, thus decreasing the load on your primary or mission-critical systems.

The last advantage is almost unnoticed in the way it compensates for a nefarious and silent killer—media decay or “bit rot”.  Ever had a hard drive slowly and silently corrupt your data?  When media goes bad, it takes your data with it.  But because our data is immutable, it becomes very easy to detect via checksums that the data has been altered by tampering or by media decay.  In our world, this isn’t a problem because we don’t need a single source of truth.  Much like distributed source control, e.g. git, mercurial, etc., any repository can be the authoritative source if we detect problems in our most readily available copy.  Bye bye silent media corruption.
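A minimal sketch of checksum-based detection, assuming each event is stored alongside a SHA-256 digest of its serialized bytes and that several replicas exist. The record layout and the names `seal`, `is_intact` and `read_event` are hypothetical.

```python
import hashlib
import json


def seal(event):
    """Serialize an event and store a digest of the bytes alongside it."""
    body = json.dumps(event, sort_keys=True).encode("utf-8")
    return {"body": body, "sha256": hashlib.sha256(body).hexdigest()}


def is_intact(record):
    # Immutable data means the digest computed at write time must still match.
    return hashlib.sha256(record["body"]).hexdigest() == record["sha256"]


def read_event(record_id, replicas):
    """Return the first intact copy found across replicas (dicts of id -> record)."""
    for replica in replicas:
        record = replica.get(record_id)
        if record is not None and is_intact(record):
            return json.loads(record["body"])
    raise LookupError(f"no intact copy of event {record_id}")
```

If the nearest copy fails its check, the reader simply moves on to the next replica, and the damaged copy can be replaced from any intact one.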

If we wanted, we could even write our events to a solid state drive (SSD) so that we can accept writes more quickly.  SSDs generally don’t like lots of writes to the same physical sectors and have a tendency to wear out over time as more and more writes occur to the same areas, but our events have a “solid” state, which means that we write them once and then read forevermore.  Thus, the wearing out of an SSD is much less of a problem.

Immutability is a very simple property, but it has profound implications and we can more easily build a truly distributed system.

IIS 7 “500” Errors

I paid my “Windows tithing” recently and did a complete reinstall. Fortunately Windows is now a guest VM inside of a Linux host. A settings change I had made a long time ago but forgot to reapply during my reinstall was for IIS. Whenever I was developing, even locally, I would get “500” errors from IIS which would then display a generic and very unhelpful error page.

The solution is to go into IIS and disable generic error messages:

http://mvolo.com/blogs/serverside/archive/2007/07/26/Troubleshoot-IIS7-errors-like-a-pro.aspx

New NServiceBus Feature: 32-bit (x86) Host Process

NServiceBus is an “Any CPU” framework.  It doesn’t have any 32-bit or 64-bit specific code.  This makes it very easy to transition between 32-bit and 64-bit operating systems.  Unfortunately, not all assemblies are or even can be compiled using the default “Any CPU” architecture. In many, if not most cases, this is related to legacy systems that have 32-bit specific code for platform interop with native C libraries, etc.

If you use the default host (NServiceBus.Host), your application will always load in 64-bit (x64) mode if you’re on a 64-bit OS or in 32-bit (x86) mode on a 32-bit OS.  Again, this is typically not a problem.  But if there are assemblies or other libraries containing 32-bit code that must be loaded into the process and invoked, we’ve got a problem: a BadImageFormatException.

I recently pushed a commit to the master branch of NServiceBus on GitHub that compiles two specific versions of the NServiceBus Host.  It compiles the default “Any CPU” version as usual.  But now it also compiles one called NServiceBus.Host32.exe.  This allows users running a 64-bit OS to run a 32-bit NServiceBus process, and thus to execute 32-bit binaries/code, without having to resort to workarounds such as using corflags.exe to flag the host executable so that the .NET Framework runs it in 32-bit mode.

Installing the VirtualBox Extension Pack on Ubuntu 10.10 x64

There have been quite a few posts related to issues installing the VirtualBox Extension Pack for both Windows and Linux hosts.

  • http://forums.virtualbox.org/viewtopic.php?p=11262&sid=334fb962995ae00d32bb8988192f701c
  • http://www.virtualbox.org/ticket/7899
  • http://www.virtualbox.org/ticket/7972
  • http://blogs.oracle.com/wim/2010/12/oracle_vm_virtualbox_40_extens.html

The error message given is very cryptic:

“Failed to install the Extension Pack” NS_ERROR_FAILURE (0x80004005)

Weird.

In digging through the above posts I found tidbits of the solution that I was able to put together. I’m currently running Ubuntu 10.10 x64, so here’s how I solved the problem and installed the extension pack. Give the VBoxExtPackHelperApp execute permissions and then run the install from the command line.

  1. sudo chmod 744 /usr/lib/virtualbox/VBoxExtPackHelperApp
  2. sudo /usr/lib/virtualbox/VBoxManage extpack install Oracle_VM_VirtualBox_Extension_Pack-4.0.4-70112.vbox-extpack

In-Memory Messaging (Actors) in C#

I just came across a really cool library that facilitates in-memory messaging—very similar to the way this is handled in Scala and Erlang using the Actor model.  It’s called retlang.  Check it out.

CQRS EventStore Podcast

For the latest episode of the Distributed Podcast, we looked a little more closely at my CQRS EventStore project.  If you’ve wanted to get some background and understanding on where, when, and why to use the library, now would be a good time.