Aquarium Blog

Friday, November 09, 2007

Moving to MSDN

I haven't decided yet, but it is very likely that I will stop blogging here for some time.

Some background: I have moved to the state of Washington and I am now working for Microsoft.

I have set up my new blog on MSDN. I was glad to find that no other Diego was blogging at the company :)

Monday, September 10, 2007

So much to learn!

I only followed the rabbit hole of "composable data (entity) services" halfway down, and found that much has been written and debated about the topic. It seems, for instance, that part of the SOA intelligentsia has been discussing whether the concept of entity services could be some kind of anti-pattern.

On the other hand, I think Astoria's value proposition is very solid, at least for the scenarios currently targeted (which I think are more oriented to mashups that do all data aggregation on the client side). Also, shortly after MIX07, Pablo Castro addressed on his own blog some of the concerns that could apply to Astoria.

The Entity Framework is also such a distinct beast that it could shift the balance on what is considered good practice (for instance, by making entity services very easy and inexpensive to own).

I would really like to get the whole picture, but this will have to wait for now...

Sunday, September 09, 2007

Entity Coupling Service: another name for Composable Data Service

Mats Helander apparently explores the same subject of "composable data services" in his post, although he seems to arrive at it from a different direction than Alex.

To me it smells more and more like Astoria + "Aggregation" (not that Astoria doesn't plan to support composition or aggregation; I am not sure about that).

I noticed that in some comments on that post, Udi expresses his opposition. I am not sure whether he opposes the entity services Mats refers to or the idea of aggregating multiple data sources. I wish he had explained his stance in more detail. Mats' explanation of entity services makes me believe he refers to services that each handle only a single kind of entity, which intuitively sounds excessively granular to me.

Tuesday, September 04, 2007

Updated visited countries

I just needed a little distraction, and found that someone had visited my old post:

[Image: updated map of visited countries]

Note to self: Always use Windows Live Writer

I admit that I do a lot of editing on my posts after publishing them. This is especially true for a post like the last one, in which I try to explain a fairly complex idea in my rudimentary English.

But I usually need to see the finished post in the actual blog layout to detect most errors and readability problems. So, the process usually goes like this: I open my blog in the browser and start reading. When I find something I want to change, I open the Blogger page in another window and start correcting it. Unfortunately, Blogger's editing pane won't grow enough to give me a good view of the text I am editing. Another annoyance is that spell checking doesn't always work because of pop-up blockers.

I had tried Windows Live Writer before and I was somewhat impressed, but it was today that I really began to appreciate the difference.

What I like the most about it:

  1. WYSIWYG, full-screen editing.
  2. Managing multiple blogs (it supports most blogging platforms).
  3. Integrated spell checker.
  4. Paste special/Thinned HTML.

I see no reason to ever use Blogger's user interface to post a new entry again.

Monday, September 03, 2007

Around the globe with composable data services

Ayende and Alex have been having an interesting conversation on the subject of data access layer componentization, in the light of some new features that are appearing in Microsoft's Entity Framework and some previous work by Alex on Base4.NET.

You can find some of the most relevant posts of the conversation here, here and here.

I read Ayende's answer last night and Alex's answer this morning. I was about to write a comment, but it grew too long. Alex lives in New Zealand and Ayende, I think, lives in Israel. I am writing this at almost 3:00 PM (GMT-4, Caribbean time). I hope they are sleeping right now, so I will have time to do the usual editing after publishing!

I agree with Alex that most of Ayende's concerns could be addressed by the composable EFx data services Alex envisions (he actually prefers to use the term "dataservers", but I think it is opportune to borrow some jargon from Astoria).

Note: In this case we have used the terms "composable" and "componentization" in the sense that the service can aggregate information from multiple backends under a single conceptual model. Maybe we should find a more explicit term to avoid overlaps with the use of "composability" elsewhere.

Looking at Ayende's diagram, I agree that he did not get the complete picture Alex was painting. In his favor, one must admit that the composable data services Alex talks about are not even “vaporware” yet.

I really just want to add two elements to the conversation:

First, if you added caching of read-only data as a feature of the data service, you would get a better substitute for the ETL process that Ayende mentions (Note To Alex: You can consider this a feature request!).
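To make the caching idea a bit more concrete, here is a minimal C# sketch of a read-through cache for read-only data living inside a data service. Everything in it is hypothetical (the class ReadOnlyDataCache, the loader delegate standing in for a backend query, and the simple time-to-live policy); it is not an Entity Framework or Astoria API, only an illustration of consumers reading through the service while repeated reads of slowly changing data are served from memory instead of from an ETL copy.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical read-through cache for read-only (reference) data served by
// a data service. The loader delegate stands in for a query to a backend.
public sealed class ReadOnlyDataCache<TKey, TValue>
{
    private sealed class Entry
    {
        public TValue Value;
        public DateTime ExpiresAt;
    }

    private readonly Func<TKey, TValue> loader;
    private readonly TimeSpan timeToLive;
    private readonly Func<DateTime> clock;
    private readonly Dictionary<TKey, Entry> entries = new Dictionary<TKey, Entry>();
    private readonly object gate = new object();

    public ReadOnlyDataCache(Func<TKey, TValue> loader, TimeSpan timeToLive, Func<DateTime> clock)
    {
        this.loader = loader;
        this.timeToLive = timeToLive;
        this.clock = clock;
    }

    // Number of times the backend was actually queried.
    public int BackendCalls { get; private set; }

    // Serves a fresh cached value if there is one; otherwise queries the
    // backend once and remembers the result until it expires.
    public TValue Get(TKey key)
    {
        lock (gate)
        {
            DateTime now = clock();
            Entry entry;
            if (entries.TryGetValue(key, out entry) && entry.ExpiresAt > now)
                return entry.Value;

            TValue value = loader(key);
            BackendCalls++;
            entries[key] = new Entry { Value = value, ExpiresAt = now + timeToLive };
            return value;
        }
    }
}
```

The clock is injected only so that expiry can be exercised without waiting; a real service would also need an invalidation story for data that is "read-only" only most of the time.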

Second, while Ayende’s preferred solution may look very good and the simplest thing to do in some scenarios, IMHO its main weakness is that it does not scale. Let me try to explain it with an extreme example:

Suppose an enterprise has 5 major systems serving 5 departments, each of them with its own data silo. One day, each department hires a consultant to help them do some data integration with the other systems (not that this should ever happen in real life!).

A few weeks later, each consultant comes up with a solution very much like the one Ayende explains: Each one contains its own schema for the data coming from the other 4 databases, each of these new schemas is fed by a separate ETL process, etc.

Now that the five consultants have taken their money, let’s analyze what the customer actually got:

ONE TIME COST: Each of the original 5 systems contains a subset of the data that needs to be shared. But instead of sharing it, the consultants simultaneously decided that the easiest path for each of them was to duplicate this data. So, in the end, each of these subsets gets copied into the other 4 systems: up to 5 x 4 = 20 duplicate copies of shared data, with every subset now living in 5 places instead of 1. This will not only cost hardware: the schemas for this data have been reinvented up to 20 times too, and 5 different ETL processes had to be designed, implemented and tested.

RELIABILITY: For simplicity's sake, we will only consider uptime, which measures the ability of the system not to "go down", and not its ability to maintain data consistency. If you do the math, I think you will see that in theory the customer's infrastructure is now more tolerant to failure. However, in "reality" the infrastructure is now much more complex, and hence much of this advantage is "lost to entropy" (every time something goes wrong, fixing it is more complex). You could instead have invested the same money in redundancy for each of the 5 original systems. While two-fold redundancy buys less reliability than five-fold redundancy, most failover solutions won’t add nearly as much complexity.
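For the "do the math" step, a rough worked example, under the strong and idealized assumption that each copy of the data is available independently with the same probability a. The probability that at least one of n copies is up is then:

```latex
\begin{aligned}
A_n &= 1 - (1 - a)^n \\
a = 0.99:\quad A_2 &= 1 - 0.01^2 = 0.9999 \\
A_5 &= 1 - 0.01^5 = 0.9999999999
\end{aligned}
```

On paper, five copies beat two by a wide margin, but the model assumes independent failures and ignores exactly the operational complexity that erodes this advantage in practice.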

MAINTAINABILITY: I don’t quite understand Ayende’s points regarding maintainability, because in the event of a single schema modification, he still needs to at least revise his ETL code. Although the system could surely keep running for hours on outdated data (improving uptime, not maintainability), eventually he would need to adjust it. In my extreme example, any single schema change can potentially affect all 5 systems! In contrast, if you could create a single compound EFx data service, you would probably just compensate for the changes by adjusting the mapping, and only once. UPDATE: I see I was assuming here a "static" definition of maintainability, one that is completely orthogonal to uptime. I may reconsider this argument, but it doesn't affect the main point.

SECURITY: I don’t clearly see Ayende’s point regarding security either. I think you need some means to perform authentication and flexible authorization, and to protect critical data, no matter whether you expose it as a data service or make it available to an ETL process and then to users. Anyway, we still don't know exactly what shape security will take in EFx and Astoria.

PERFORMANCE: How the new system will actually perform is impossible to predict (due to too many factors that are not detailed in the example). However, we can easily observe lots of overhead in moving the same data among several servers. Once you have 5 copies of the data, you will probably see some performance improvement because of locality and parallelism. But the same effect could be achieved in a data service by using caching and conventional scale out measures. In such a case, schemas would not be unnecessarily complicated and consistency would be easier to maintain.

My point is that this data duplication approach, while simple at first, is a path that an organization should not take many times. Once you have, say, three of these processes in operation, it will probably be too much pain to add another one.

This is only how things happen in a fictitious example, and Oren only talked about one system doing this. However, my thesis is that this scenario is not far removed from how things would go in real life.

I think the consultants would probably not talk much to each other, and they would probably never come up with an integrated solution. Why?

1. Business reasons: Simply put, each consultant is set to do what is best for his project and revenue in the short term, not what is good for their customer in the long term. They will optimize locally, not globally.

2. Most important of all, a technical reason: Unfortunately, there is currently no simple way of accomplishing the integration that the consultants could agree upon. This is precisely the need that composable EFx data services could address.

To satisfy the data integration needs of a company like the one in the example, a new kind of data access technology is needed: one that allows you to easily build data services that are composable, that can extract data from virtually any source, that expose a very high-level (conceptual) data interface, that support flexible mapping, and that everyone can talk to using standard protocols.
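A minimal C# sketch of the "single conceptual model over several backends" idea, assuming two hypothetical backends (a CRM system and a billing system) reached through plain delegates. The names Customer, CustomerDataService, readCrmNames and readBillingBalance are made up for illustration; this is not the Entity Framework or Astoria API, just the shape of the idea: the mapping from each backend to the conceptual entity is written once, inside the service, instead of once per consumer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The conceptual entity the service exposes (hypothetical shape).
public sealed class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Balance { get; set; }
}

// Hypothetical composable data service: one conceptual model on top of two
// backends. The delegates stand in for queries against each backend; the
// mapping to Customer lives here, once, instead of in every consumer.
public sealed class CustomerDataService
{
    private readonly Func<IEnumerable<KeyValuePair<int, string>>> readCrmNames;
    private readonly Func<int, decimal> readBillingBalance;

    public CustomerDataService(
        Func<IEnumerable<KeyValuePair<int, string>>> readCrmNames,
        Func<int, decimal> readBillingBalance)
    {
        this.readCrmNames = readCrmNames;
        this.readBillingBalance = readBillingBalance;
    }

    // Queries both backends on demand; nothing is copied into a local store.
    public IEnumerable<Customer> Customers()
    {
        return readCrmNames().Select(row => new Customer
        {
            Id = row.Key,
            Name = row.Value,
            Balance = readBillingBalance(row.Key)
        });
    }
}
```

A real service would add the declarative mapping, caching, security and standard protocols mentioned above; the sketch only shows consumers querying one model instead of maintaining their own copies of each backend.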

I think that Alex and I agree that most pieces of this solution are already beginning to appear.

The last paragraphs sound a lot like marketing :D But seriously, if the Data Programmability Team were going to build such a thing, it would be yet another reason for me to be excited.

Wednesday, June 27, 2007

Delphi Roadmap

Today I found the Product Roadmap for Delphi, through a post on Julian Bucknall’s blog (Julian is the CTO at DevExpress).

There seems to be some good news, but it still feels like everyone is avoiding the sad truth: the declining relevance of Delphi in the market.

I happen to be a .NET developer that holds some remote but very nice memories of coding on Object Oriented Turbo Pascal and Delphi.

I respect Delphi. I appreciate the importance of the existing codebase and the skill set of Delphi developers. I admire the people who worked on its design and the people who are working on it now. In the same way, I believe in Delphi as a language and in Delphi as an “ecosystem”. I want those things to remain relevant. Actually, I think those are the core assets CodeGear still holds.

I don't really know if it is possible to build a sustainable business model solely on those essential values, but once you have them, the next step would be to listen to what really makes sense for developers.

My personal take: I believe in managed code and I could not care less about Win32. On one hand, I see all sorts of cool things happening around the CLR. On the other hand, the latest incarnation of the main Win32 vehicle, that is, Windows Vista, now comes with .NET 3.0 installed.

You may not completely love Vista, but there is no doubt that in a couple of years most Windows computers out there will have at least .NET 3.x installed. Win32 is just a necessary evil, and perhaps it is not even so necessary!

The thing I love the most about .NET is the amount of existing code I can use and extend, regardless of the language it was originally written in. I also like the way it has supported Unicode from the beginning. I love the way it helps me move to 64 bits almost seamlessly. These things are mentioned in the Roadmap, and Julian points them out as big issues (breaking changes for existing code).

I believe in the value of Visual Studio. I like using the designers: Windows Forms and WPF, the new Web Designer, WF, DSL, Team System, and the bunch of new things that will come in VS 2008, just like the new features in C# and VB.

Visual Studio may not be perfect, but there are plenty of excellent third-party extensions filling the holes. I would love to be able to use things like Refactor! Pro or ReSharper with Delphi.

Even further, it would be sweet to be able to use Delphi with NAnt, MbUnit, NDepend, Windsor, TDD.NET and the whole ALT.NET stack.

It would be awesome to run Delphi on Linux via Mono.

It would be fantastic to run Delphi code on a Mac via Silverlight.

It would be great to run ASP.NET AJAX applications on any browser, powered by Delphi code on the back end.

So, this is my wish list:

0. I once said I wanted Microsoft to buy Delphi from Borland. I think I cannot count on this anymore, but what the hell...

1. I would like to see CodeGear focus on Delphi the language, basically making it really fly on .NET 3.5 as soon as possible, complete with generics and LINQ.

2. I would like to see someone (CodeGear or someone else) write Visual Studio bindings for Delphi and ship Delphi on the Visual Studio 2008 Shell.

3. I would like to see someone (CodeGear or someone else) take charge of a good compatibility/migration story for the VCL.

4. I am cool with CodeGear wanting to continue the development of Win32 Delphi, to keep their C++ and IDE business, and to make a new Ruby on Rails IDE. I guess they, better than I, can assess if they are contributing anything really new and significant in those areas.

Wednesday, May 30, 2007

IsNullOrEmpty for IEnumerable

This is not extremely relevant, but String.IsNullOrEmpty() has become a very popular time saver, and I think the same concept should be applicable to arrays and collections.

I found on Microsoft Connect that someone had already suggested, in 2005, adding IsNullOrEmpty to arrays.

This is not a big discovery, but I have been playing with extension methods in Orcas and they are so nice!

What if I define this?

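using System.Collections;
using System.Linq;

public static class IEnumerableExtensions
{
    // True when source is null or yields no elements.
    public static bool IsNullOrEmpty(this IEnumerable source)
    {
        if (source == null)
            return true;

        // Cast<object>() adapts the non-generic sequence; Any() stops after
        // the first element and disposes the enumerator it used.
        return !source.Cast<object>().Any();
    }
}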

Once you import the appropriate namespaces, all these things are possible:

// Requires: using System; using System.Collections.Generic;
//           using System.Linq; using System.Reflection;
string a = null;
Console.WriteLine(a.IsNullOrEmpty());      // True

var b = new Dictionary<string, int>();
Console.WriteLine(b.IsNullOrEmpty());      // True

MemberInfo[] d = MethodInfo.GetCurrentMethod().DeclaringType.GetMembers();
Console.WriteLine(d.IsNullOrEmpty());

var g = from f in d
        where f.MemberType == MemberTypes.NestedType
        select f;
Console.WriteLine(g.IsNullOrEmpty());

I would like to see something like this included in System.Linq.Enumerable static class. Then it would be available to everyone, by default.

Update: I added a more complete entry as a suggestion on Microsoft Connect.

Update 2: On the Connect site, Mads explains why he thinks IsNullOrEmpty as an extension method is really a very bad idea. Basically, using the variable.Method() invocation syntax for a method that is meant to work when the variable is null is very inconsistent with the instance method invocation semantics one usually gives to this syntax in languages like C# and VB.

I still think that the method, probably defined as a static method, would be nice to have on Enumerable (because it is already a well-known place to find methods that apply to the IEnumerable interface). Also, I think there is a gap in the definition of extension methods: their designers think that calling them on null instances should generally throw an exception, so why do I need to check for the parameter and throw the exception myself?
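A small C# sketch of both alternatives discussed above. Since System.Linq.Enumerable cannot be extended from outside, the hypothetical class EnumerableHelpers stands in for it. It holds a plain static IsNullOrEmpty, which nobody would read as an instance call, and an extension method (the name IsEmpty is also hypothetical) that keeps instance-call semantics by rejecting a null receiver, a check that, as noted above, has to be written by hand.

```csharp
using System;
using System.Collections;
using System.Linq;

// Hypothetical helper class; the name EnumerableHelpers is illustrative only.
public static class EnumerableHelpers
{
    // Plain static method: called as EnumerableHelpers.IsNullOrEmpty(x),
    // so nobody expects it to behave like an instance call on x.
    public static bool IsNullOrEmpty(IEnumerable source)
    {
        return source == null || !source.Cast<object>().Any();
    }

    // Extension method that mimics instance semantics: on a null receiver
    // it must check and throw explicitly, because the compiler will not.
    public static bool IsEmpty(this IEnumerable source)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        return !source.Cast<object>().Any();
    }
}
```

With these, EnumerableHelpers.IsNullOrEmpty(null) returns true, while calling IsEmpty() on a null array throws ArgumentNullException, just as an instance method call on null would fail.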

Friday, February 09, 2007

Unindexed Foreign Keys

A guy called Jordi Ramot puts it in these words:

To decide if a foreign key needs to be indexed or not, I follow a simple rule:

I always/only create an index on a foreign key whether:

1 - A deletion on the parent table is allowed and it triggers a cascade delete on the child table

2 - There's need to perform JOIN queries from the parent to the child

In the first situation, an unindexed foreign key will force a full table scan for each parent record deleted. In the second situation, a lack of the foreign key index in the child table will slow down join queries.

I rarely find suitable to create indexes on foreign keys in other situations though.
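To see why the first rule matters, here is a toy C# model, not how any real engine stores or reads data: a child table as a list of rows with a ParentId foreign key, and an in-memory ILookup standing in for an index on that column. The names ChildRow and ForeignKeyAccess are hypothetical, and "rows examined" is only a rough proxy for I/O cost.

```csharp
using System.Collections.Generic;
using System.Linq;

// A row of a hypothetical child table; ParentId is the foreign key.
public sealed class ChildRow
{
    public int Id { get; set; }
    public int ParentId { get; set; }
}

public static class ForeignKeyAccess
{
    // No index on ParentId: every child row has to be examined (full scan).
    public static List<ChildRow> FullScan(IList<ChildRow> children, int parentId, out int rowsExamined)
    {
        rowsExamined = children.Count;
        return children.Where(c => c.ParentId == parentId).ToList();
    }

    // With an index on ParentId (an ILookup built once stands in for it),
    // only the matching rows are touched.
    public static List<ChildRow> IndexSeek(ILookup<int, ChildRow> index, int parentId, out int rowsExamined)
    {
        List<ChildRow> matches = index[parentId].ToList();
        rowsExamined = matches.Count;
        return matches;
    }
}
```

With 1,000 child rows spread over 100 parents, finding one parent's 10 children costs 1,000 rows examined by scan versus 10 via the lookup; a cascade delete pays the scan cost once per deleted parent.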

I think the question is not only “to index or not to index” on foreign keys.

I have been debating this subject with my boss (a hardcore Informix believer) all day. We found that Informix creates indexes on foreign keys automatically, while Oracle, DB2, and SQL Server don't.

So, why did some engineers decide to go one way and others the opposite way? I think this is an interesting design issue.

Informix takes all responsibility for optimizing JOINs and CASCADING operations on the foreign key.

Instead, Oracle, DB2 and SQL Server will happily leave the burden of tuning indexes for JOINs and CASCADING operations on your shoulders.

So, even if an Informix DBA fails to tune the indexes, the database will probably show acceptable performance on JOIN operations.

Oracle, IBM and Microsoft/Sybase, on the other hand, apparently decided that tuning was a nonnegotiable duty of the DBA. However, there are many reasons to want the higher level of control those database engines provide:

First, each index you create comes with a cost. Not only does it use storage space, but once you have created it, the database engine has to maintain it on every change to the table.

Also, 98% of all SELECT and UPDATE queries will probably include a WHERE clause or will involve more than one JOIN operation.

There is also an opportunity for index coverage: if the index contains certain columns, some SELECT statements can be resolved entirely by reading the index, without ever touching the actual table.

To get all those benefits at the same time, you need a composite index whose leading column is the foreign key but that also includes other columns relevant to frequent queries.
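As a rough illustration of such a composite, covering index, here is a toy C# model of a hypothetical index on Orders(CustomerId, OrderDate) that also carries Total. Its entries are kept sorted by the composite key, so a query filtering on the foreign key plus a date range can binary-search to its starting point and read forward, answering the query from index entries alone. The table, columns and class names are all assumptions for illustration; real engines use B-trees and pages, not a sorted list.

```csharp
using System;
using System.Collections.Generic;

// One entry of a hypothetical composite index on Orders(CustomerId, OrderDate)
// that also carries Total, so it covers the query answered below.
public sealed class OrderIndexEntry
{
    public int CustomerId { get; set; }      // leading column: the foreign key
    public DateTime OrderDate { get; set; }
    public decimal Total { get; set; }
}

public sealed class CoveringOrderIndex
{
    private readonly List<OrderIndexEntry> entries;

    // Entries are kept sorted by (CustomerId, OrderDate), roughly like the
    // leaf level of a B-tree index.
    public CoveringOrderIndex(IEnumerable<OrderIndexEntry> source)
    {
        entries = new List<OrderIndexEntry>(source);
        entries.Sort(Compare);
    }

    private static int Compare(OrderIndexEntry x, OrderIndexEntry y)
    {
        int byCustomer = x.CustomerId.CompareTo(y.CustomerId);
        return byCustomer != 0 ? byCustomer : x.OrderDate.CompareTo(y.OrderDate);
    }

    // Roughly "SELECT SUM(Total) FROM Orders WHERE CustomerId = @c AND
    // OrderDate >= @since", answered from index entries alone: binary search
    // to the first entry >= (customerId, since), then read forward while the
    // leading column still matches. No table row is ever read.
    public decimal TotalSince(int customerId, DateTime since)
    {
        int lo = 0, hi = entries.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            OrderIndexEntry e = entries[mid];
            bool before = e.CustomerId < customerId
                || (e.CustomerId == customerId && e.OrderDate < since);
            if (before) lo = mid + 1; else hi = mid;
        }

        decimal sum = 0m;
        for (int i = lo; i < entries.Count && entries[i].CustomerId == customerId; i++)
            sum += entries[i].Total;
        return sum;
    }
}
```

Because the foreign key leads the key, the same structure still serves plain JOINs and cascades on CustomerId, while the extra columns let frequent queries skip the table entirely.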

So, the Informix approach is a winner for the most basic cases, but the higher level of control the other engines give you could yield better performance if tuned adequately (obviously, my boss won’t swallow that pill!).

If you want to distill a best practice from this, I think that creating indexes on your foreign keys is a good first approach, but you should later tune your indexes globally, using real profiling data.

Fortunately, for those of us mostly using SQL Server, the Index Tuning Wizard exists.

UPDATE: You also have to consider how foreign keys are actually implemented. My boss found some articles that mention that in some RDBMSs foreign keys are internally implemented as "pointer chains".

Saturday, February 03, 2007

Help Find Jim Gray

I know I don't get much traffic here, but anyway: if you know of Jim Gray and that he has been missing at sea since last Sunday, there is a way you can help find him.

The Coast Guard has already called off its search effort, so friends and colleagues have taken up the challenge.

Amazon set up a job on its Mechanical Turk site. You can go there, log in to your Amazon account and start visually scanning recent satellite images of the search area.

Update: You can read how they do it on Werner Vogels' blog.

Please, join!