My Beef with MSDTC and Two-Phase Commits
I’m just not a fan of the Microsoft Distributed Transaction Coordinator. I’ve tweeted and blogged a few times about it, but I’ve never really gone into why. I should preface my remarks by saying that MSDTC does work: it will facilitate and coordinate distributed transactions using a two-phase commit (2PC) protocol. I should also say that much of what follows is a rant against 2PC, and MSDTC is a casualty in the argument. But like most things, it’s about tradeoffs. Do you really know how much 2PC using MSDTC costs you? Do you know what you’re giving up when you rely upon it and allow it to become an integral part of your system?
Now the details. The biggest “beef” I have with MSDTC is the lack of real support among the different products that exist. To be fair, this isn’t really the fault of MSDTC. For example, how many queuing solutions (beyond MSMQ) have you found that support not only transaction enlistment (when using TransactionScope), but also promotion to a two-phase commit? There are literally fewer than a handful of solutions. If you want to leverage cloud-based queuing systems such as Microsoft Azure Queues or Amazon SQS, sorry, there isn’t any support. You’ve gotta build it yourself, and the model is fundamentally and diametrically opposed to the one used by MSDTC.
But what about database vendors? Surely there are a number of good RDBMS solutions that support 2PC. Sure, there are some that support 2PC, but even fewer that have good driver implementations of transaction enlistment and promotion. In the RDBMS world, you’ve got a few “real” choices for MSDTC support: Microsoft SQL Server, Oracle, IBM DB2, and perhaps a few others. MySQL? Nope. PostgreSQL? Nope. Microsoft SQL Azure??? Not a chance.
Another beef that I have is a weird “edge case” that shouldn’t really ever happen, except that it can and does, and you need to be prepared for it. There is a small window in a two-phase commit, after a cohort has voted to commit but before it learns the final outcome, during which one of the “cohorts” may go offline, leaving the transaction “in doubt”. I don’t know about you, but I like my transactions to be atomic: it either succeeded or it failed. In doubt? Seriously? That’s like giving a lock to a thread on some data and then having it corrupt the data when the thread aborts halfway through.
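To make that window concrete, here is a minimal toy simulation in Python. It is only a sketch of textbook two-phase commit, not of how MSDTC is implemented, and every name in it (Cohort, State, two_phase_commit, fail_after_prepare) is made up for illustration.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PREPARED = "prepared"    # voted yes, now waiting for the coordinator's decision
    COMMITTED = "committed"
    ABORTED = "aborted"


class Cohort:
    """A toy resource manager taking part in a two-phase commit (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.state = State.ACTIVE
        self.online = True

    def prepare(self):
        # Phase 1: promise to commit if asked. Once prepared, the cohort
        # can no longer decide the outcome on its own.
        if not self.online:
            return False
        self.state = State.PREPARED
        return True

    def commit(self):
        # Phase 2: returns False when the decision cannot be delivered.
        if not self.online:
            return False
        self.state = State.COMMITTED
        return True

    def abort(self):
        if not self.online:
            return False
        self.state = State.ABORTED
        return True


def two_phase_commit(cohorts, fail_after_prepare=None):
    """Run a simplified 2PC and return the cohorts left in doubt."""
    # Phase 1: collect votes; any "no" (or unreachable cohort) aborts everyone.
    if not all(c.prepare() for c in cohorts):
        for c in cohorts:
            c.abort()
        return []

    # Simulate one cohort dropping off the network in the window between
    # voting yes and hearing the coordinator's decision.
    if fail_after_prepare is not None:
        fail_after_prepare.online = False

    # Phase 2: the decision is "commit", but not everyone may hear it.
    return [c for c in cohorts if not c.commit()]


db = Cohort("database")
queue = Cohort("queue")
in_doubt = two_phase_commit([db, queue], fail_after_prepare=queue)
print(db.state, queue.state, [c.name for c in in_doubt])
```

In this run the database ends up committed while the queue is stuck in the prepared state: it has promised to commit, it never heard the decision, and it cannot safely roll back by itself. Real coordinators persist the decision and retry once the cohort returns, but until then the transaction is neither succeeded nor failed.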
Even among software that does support 2PC on MSDTC there are weird bugs and issues that arise. NHibernate would leak connections; MySQL would forget it was a cohort in a transaction during a server restart and roll back the transaction, even when other cohorts had already committed; there have even been a few small but critical bugs in RavenDB when participating in a distributed transaction. 2PC is hard. There are lots of weird conditions and edge cases. Do you really want to encounter an issue like this? Making a single resource transactional in its own right isn’t terribly difficult, but when it has to participate and collaborate with other resources things become exponentially more difficult.
So far, most of the issues that we’ve discussed apply to distributed transactions in general, and MSDTC is a helpless victim. So let’s talk about MSDTC quirks specifically. First and foremost, when working on a single machine, MSDTC configuration and setup is a piece of cake. Just make sure the service is running and you’re good to go. But once you choose to *distribute* across multiple machines, it’s another story altogether. There are a number of steps that you must follow, and missing one results in cryptic error messages. Microsoft has even released a few tools to help diagnose issues related to MSDTC configuration. You also must be sure that each MSDTC instance can authenticate with the others, above and beyond the authentication required to connect to each durable resource (database, message queue, etc.). This means either running in a Windows domain environment or synchronizing user accounts across machines.
The next issue is that of interoperability. One of the fallacies of distributed computing is that the network is homogeneous. Do all of your servers run Windows? Do you have database servers running on Linux or some other operating system? Technically speaking, MSDTC does support XA transactions, but I’m not convinced that all parties involved in a distributed transaction are on speaking terms with MSDTC. If you avoided 2PC altogether, this would be a non-issue. You could execute transactions against whatever resource using whatever operating system you choose.
What if your software is written in .NET and you want to run on Mono? Good luck. Distributed transactions are not supported there, because MSDTC is Windows only. Without rewriting portions of your software, you can’t move over to Mono. In other words, reliance upon MSDTC restricts our ability to choose the technologies that best meet our needs.
The last issue is about weighing the costs involved. How much more “expensive” is a distributed transaction as compared with a simple lightweight transaction? MSDTC transaction “escalation” can be very expensive in terms of the latency involved because latency *isn’t* zero and because of the number of round trips involved in synchronizing all of the cohorts. Furthermore, MSDTC incurs significant overhead by merely acting as the intermediary, middleman, or broker required to coordinate everything.
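A back-of-the-envelope model shows where those round trips add up. This is a rough sketch under loud assumptions: the coordinator contacts all cohorts in parallel, each phase waits for the slowest cohort, and the coordinator forces its decision to a log between phases. The function names and the numbers are invented for illustration and are not measurements of MSDTC.

```python
def local_commit_ms(rtt_ms):
    """A lightweight transaction: one commit round trip to a single resource."""
    return rtt_ms


def two_phase_commit_ms(cohort_rtts_ms, log_write_ms=0.0):
    """Rough commit latency for 2PC with cohorts contacted in parallel.

    Each phase is as slow as the slowest cohort, and the coordinator
    writes its decision to a log between the two phases. Work done
    inside each cohort (its own log writes, locking) is ignored.
    """
    slowest = max(cohort_rtts_ms)
    prepare_phase = slowest   # prepare request out, vote back
    commit_phase = slowest    # decision out, acknowledgement back
    return prepare_phase + log_write_ms + commit_phase


def messages_sent(cohort_count):
    """Prepare, vote, decision, and acknowledgement for every cohort."""
    return 4 * cohort_count


# Hypothetical numbers: a nearby database (0.5 ms) and a remote queue (2 ms).
print(local_commit_ms(0.5))
print(two_phase_commit_ms([0.5, 2.0], log_write_ms=1.0))
print(messages_sent(2))
```

With these made-up figures the local commit costs 0.5 ms, while the distributed one costs 5 ms and eight messages, and the fast database spends most of that time holding locks while it waits on the slow queue. Even this simple model leaves out any overhead the coordinator itself adds.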
For me, the cost of using MSDTC is unacceptably high, and I have found ways to achieve the same end result without paying for the overhead incurred by MSDTC. As a result, I have a significant amount of flexibility and freedom in all of my technology choices. This freedom gives me a “swapability” and portability not achievable by someone who is shackled with the awful chains of MSDTC. By default, most people use MSDTC because it’s “easier”, but be aware of what you’re giving up, because it’s a lot.
In my next post I will go over the various techniques and methods I have used to avoid a reliance upon distributed transactions. Most of the techniques are very simple and can be implemented without a lot of overhead, which adds transparency to your system while avoiding the magic black box of MSDTC.