Wednesday, December 06, 2006

Transactional File System in Windows Vista

In November 1998, Microsoft Transaction Server was about a year old and SQL Server 7 was just arriving. I had been given the task of coding a small CRM-like application in Visual Basic 5. Among other features, it had to upload unstructured documents and keep them linked to rows in a database.

I had one important decision to make: Should those documents be stored in the database itself or in the server file system?

SQL Server 6.5 had many limitations, including its lack of row-level locking and some performance issues with BLOB columns.

On the other hand, the file system lacked transactional capabilities, and I lacked the ability to create a Compensating Resource Manager.

A transactional file system would have been super useful.

Now, in November 2006, eight years later, Windows Vista is available, and it introduces transactions as a new feature of NTFS called TxF. The Windows Registry is also getting transaction support in Vista, under the name TxR.

Before TxF, you could get ACID-like behavior across multiple file system operations, but only by juggling temporary files, renames, and so on. With TxF you just issue something like a "begin transaction", do your work in NTFS, and finally commit or roll back the whole thing.
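The basic building block of that pre-TxF juggling is "write to a temporary file, then rename it over the original". Here is a minimal sketch in C, assuming POSIX rename() semantics, where renaming onto an existing file replaces it in one step. The function name atomic_replace and the ".tmp" suffix are hypothetical. Two caveats: Microsoft's C runtime rename() fails when the target already exists, so on Windows you would reach for MoveFileEx with MOVEFILE_REPLACE_EXISTING instead; and the sketch never calls fsync, so it protects against half-written files but not against power loss.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replace the contents of `path` with `data` so that readers see either
   the old file or the complete new one, never a half-written file.
   Assumes POSIX rename() semantics (atomic replace of an existing file). */
int atomic_replace(const char *path, const char *data)
{
    assert(path != NULL && data != NULL);

    /* Hypothetical naming: "<path>.tmp" in the same directory, so the
       final rename never crosses file systems. */
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    FILE *f = fopen(tmp, "wb");
    if (f == NULL)
        return -1;

    size_t len = strlen(data);
    int ok = fwrite(data, 1, len, f) == len;
    ok = (fflush(f) == 0) && ok;   /* flush stdio buffers to the OS */
    ok = (fclose(f) == 0) && ok;   /* always close, even after an error */
    if (!ok) {
        remove(tmp);               /* "roll back": drop the partial copy */
        return -1;
    }

    /* "Commit": the original stays intact until this single rename. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

This covers one file. Keeping several files consistent this way means inventing your own journal of pending renames and a recovery step, which is exactly the bookkeeping TxF takes off your hands.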

In this way, TxF hides those best practices under the hood and moves the developer one level of abstraction up when dealing with files.

Surendra Verma, Developer Manager in the CFS group, explained how TxF/TxR works in a Channel 9 video some months ago. It is interesting to note, though, that after the video was recorded there were major design changes to TxF/TxR.

As Jim Johnson explains in his first, second, and third posts on the subject, between Beta 2 and RC1 the TxF API changed from an "implicit transaction enlistment" model, which worked with the existing Win32 file APIs, to a more explicit model in which new "Transacted" versions of some APIs were added.

In the first version, you just did something like:

EnterTransactionScope();
// do whatever file work with your favorite file APIs
ExitTransactionScope();

Everything you did in between was automatically enlisted in a thread-specific ambient transaction.
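To make the implicit model concrete, here is a toy model in C. It is neither TxF nor the actual Beta 2 API: an in-memory table stands in for NTFS, every name (write_file, log_event, enter_transaction_scope, exit_transaction_scope) is hypothetical, and the "ambient transaction" is simply a C11 _Thread_local pointer that every write checks before recording an undo entry.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: a tiny in-memory "file system" and an undo log. */
#define MAX_FILES 8
#define MAX_UNDO 16

struct file { int used; char name[32]; char data[64]; };
static struct file fs[MAX_FILES];

struct undo_entry { int slot; struct file before; };
struct txn { struct undo_entry log[MAX_UNDO]; int count; };

/* The ambient transaction: one per thread (C11 _Thread_local). */
static _Thread_local struct txn *ambient;

/* Existing file's slot, else the first free slot, else -1. */
static int lookup(const char *name)
{
    int free_slot = -1;
    for (int i = 0; i < MAX_FILES; i++) {
        if (fs[i].used && strcmp(fs[i].name, name) == 0)
            return i;
        if (!fs[i].used && free_slot < 0)
            free_slot = i;
    }
    return free_slot;
}

/* An ordinary write call with no transaction parameter. If the calling
   thread has an ambient transaction, the write silently joins it. */
int write_file(const char *name, const char *data)
{
    int slot = lookup(name);
    if (slot < 0)
        return -1;
    if (ambient != NULL) {                    /* implicit enlistment */
        assert(ambient->count < MAX_UNDO);
        ambient->log[ambient->count].slot = slot;
        ambient->log[ambient->count].before = fs[slot];
        ambient->count++;
    }
    fs[slot].used = 1;
    snprintf(fs[slot].name, sizeof fs[slot].name, "%s", name);
    snprintf(fs[slot].data, sizeof fs[slot].data, "%s", data);
    return 0;
}

const char *read_file(const char *name)
{
    int slot = lookup(name);
    return (slot >= 0 && fs[slot].used) ? fs[slot].data : NULL;
}

/* Stand-in for library code written long before transactions existed. */
void log_event(const char *msg) { write_file("app.log", msg); }

void enter_transaction_scope(struct txn *t)
{
    t->count = 0;
    ambient = t;
}

/* Commit keeps every change; rollback restores old states, newest first. */
void exit_transaction_scope(int commit)
{
    if (!commit)
        for (int i = ambient->count - 1; i >= 0; i--)
            fs[ambient->log[i].slot] = ambient->log[i].before;
    ambient = NULL;
}
```

If you write doc.txt, call log_event, and then exit the scope with a rollback, app.log disappears along with doc.txt: log_event never asked to be transactional, yet it was enlisted anyway.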

In the new model, you have to do something like this (arguments elided):

hTransaction = CreateTransaction(...);
hFile = CreateFileTransacted(... hTransaction ...);
// do whatever, but now using the new *Transacted APIs
CloseHandle(hFile);
CommitTransaction(hTransaction); // or RollbackTransaction(hTransaction)
CloseHandle(hTransaction);

The complete list of APIs affected by TxF is available in the TxF documentation.

If you look at it, all the APIs for which a new Transacted version was created are file name-based. In addition, some existing APIs were updated and are now transaction-aware, meaning that they acquire transactional behavior when given a file handle associated with a transaction (for file handle-based APIs) or when a thread-level ambient transaction is present (for yet another group of file name-based APIs).
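The split between name-based and handle-based APIs can be sketched with the same kind of toy model. Again, this is a hypothetical illustration in C, not how TxF is implemented: open_file_transacted plays the role of a name-based *Transacted variant, while write_handle stands in for a handle-based API whose signature never changed but which behaves transactionally because the handle remembers the transaction it was opened under.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: in-memory files plus an undo log. */
#define MAX_FILES 8
#define MAX_UNDO 16

struct file { int used; char name[32]; char data[64]; };
static struct file fs[MAX_FILES];

struct undo_entry { int slot; struct file before; };
struct txn { struct undo_entry log[MAX_UNDO]; int count; };

/* A handle remembers the transaction it was opened under (or NULL). */
struct handle { int slot; struct txn *txn; };

/* Existing file's slot, else the first free slot, else -1. */
static int lookup(const char *name)
{
    int free_slot = -1;
    for (int i = 0; i < MAX_FILES; i++) {
        if (fs[i].used && strcmp(fs[i].name, name) == 0)
            return i;
        if (!fs[i].used && free_slot < 0)
            free_slot = i;
    }
    return free_slot;
}

/* Save a slot's old state so that a rollback can restore it. */
static void remember(struct txn *t, int slot)
{
    if (t == NULL)
        return;                      /* not transacted: nothing to undo */
    assert(t->count < MAX_UNDO);
    t->log[t->count].slot = slot;
    t->log[t->count].before = fs[slot];
    t->count++;
}

static struct handle open_common(const char *name, struct txn *t)
{
    struct handle h = { lookup(name), t };
    if (h.slot >= 0 && !fs[h.slot].used) {   /* create on open */
        remember(t, h.slot);
        fs[h.slot].used = 1;
        snprintf(fs[h.slot].name, sizeof fs[h.slot].name, "%s", name);
        fs[h.slot].data[0] = '\0';
    }
    return h;
}

/* Name-based calls: the transaction has to be passed in explicitly,
   hence a separate "Transacted" variant. */
struct handle open_file(const char *name) { return open_common(name, NULL); }
struct handle open_file_transacted(const char *name, struct txn *t)
{
    return open_common(name, t);
}

/* Handle-based call: one signature for both cases; it is transaction-
   aware only because the handle carries the transaction. */
int write_handle(struct handle h, const char *data)
{
    if (h.slot < 0)
        return -1;
    remember(h.txn, h.slot);
    snprintf(fs[h.slot].data, sizeof fs[h.slot].data, "%s", data);
    return 0;
}

const char *read_file(const char *name)
{
    int slot = lookup(name);
    return (slot >= 0 && fs[slot].used) ? fs[slot].data : NULL;
}

void begin_txn(struct txn *t) { t->count = 0; }
void commit_txn(struct txn *t) { t->count = 0; }   /* keep all changes */
void rollback_txn(struct txn *t)                   /* undo, newest first */
{
    for (int i = t->count - 1; i >= 0; i--)
        fs[t->log[i].slot] = t->log[i].before;
    t->count = 0;
}
```

In this model, rolling back removes a doc.txt that was opened with open_file_transacted, while an app.log opened with the plain open_file keeps whatever was written to it.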

The reason Microsoft changed models, as Surendra explained in the discussion of the video on Channel 9, is that the simpler original version had a major drawback:

Between any pair of EnterTransactionScope()/ExitTransactionScope() calls, every single file or registry operation made by any code, even code buried deep in the call stack, was automatically and forcibly enlisted in the ambient transaction, acquiring behavior that was often not intended when that code was written.

You could not opt-out.

So, if implicit transactions mean that existing code will break or misbehave, it is good that they abandoned this path.

On the other hand, the main tradeoff of the new version, in my opinion, is that it is "too explicit":

Only the new APIs and the ones that have been updated get transactional behavior. So the hundreds, if not thousands, of higher-level APIs that somehow touch the file system won't be able to behave transactionally until the whole stack is updated.
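A last toy sketch in C shows why code sitting above the file APIs cannot simply opt in. save_report is a hypothetical higher-level API written against the plain write_file, so its writes take effect immediately no matter what transaction the caller holds; the only way in is a new save_report_transacted that threads the transaction all the way down. For brevity this model buffers writes until commit (a redo log), and reads do not see a transaction's pending writes; both are simplifications, not TxF behavior.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: file name -> contents, fixed capacity. */
#define MAX_FILES 8
#define MAX_PENDING 8

struct entry { char name[32]; char data[64]; };
static struct entry store[MAX_FILES];
static int nfiles;

/* A transaction buffers its writes and applies them on commit. */
struct txn { struct entry pending[MAX_PENDING]; int count; };

/* Write straight into the store, creating the file if needed. */
static void apply(const char *name, const char *data)
{
    int i = 0;
    while (i < nfiles && strcmp(store[i].name, name) != 0)
        i++;
    if (i == nfiles) {
        assert(nfiles < MAX_FILES);
        nfiles++;
    }
    snprintf(store[i].name, sizeof store[i].name, "%s", name);
    snprintf(store[i].data, sizeof store[i].data, "%s", data);
}

const char *read_file(const char *name)
{
    for (int i = 0; i < nfiles; i++)
        if (strcmp(store[i].name, name) == 0)
            return store[i].data;
    return NULL;
}

/* Plain API: takes effect immediately. */
void write_file(const char *name, const char *data) { apply(name, data); }

/* Transacted API: the caller must pass the transaction explicitly. */
void write_file_transacted(struct txn *t, const char *name, const char *data)
{
    assert(t->count < MAX_PENDING);
    struct entry *e = &t->pending[t->count++];
    snprintf(e->name, sizeof e->name, "%s", name);
    snprintf(e->data, sizeof e->data, "%s", data);
}

void commit_txn(struct txn *t)
{
    for (int i = 0; i < t->count; i++)
        apply(t->pending[i].name, t->pending[i].data);
    t->count = 0;
}

void rollback_txn(struct txn *t) { t->count = 0; }  /* drop pending writes */

/* Higher-level API from the pre-transaction era: it cannot join. */
void save_report(const char *text)
{
    write_file("report.txt", text);
    write_file("report.log", "report saved");
}

/* Opting in requires a new variant that threads the transaction down. */
void save_report_transacted(struct txn *t, const char *text)
{
    write_file_transacted(t, "report.txt", text);
    write_file_transacted(t, "report.log", "report saved");
}
```

Multiply that extra variant by every higher-level API that touches files, and you get the scale of the update the whole stack would need.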

Now, you cannot opt-in.

For a .NET developer like me, this means that I have to use a lot of interop, or wait until System.IO.FileStream, methods like System.IO.File.Delete, and even the brand-new APIs in System.IO.Packaging are revised to include the option of using transactions.

I have been thinking of a deceptively simple change they could make to the existing file name-based APIs that could solve this issue. I must be missing something, or they would have implemented it in Vista.

I tried to discuss my idea with Surendra, but he is probably on vacation after shipping Vista :)

I will try to explain the idea in my next post...
