Friday, February 09, 2007

Unindexed Foreign Keys

A guy called Jordi Ramot puts it in these words:

To decide if a foreign key needs to be indexed or not, I follow a simple rule:

I create an index on a foreign key when, and only when:

1 - A deletion on the parent table is allowed and it triggers a cascade delete on the child table

2 - There's a need to perform JOIN queries from the parent to the child

In the first situation, an unindexed foreign key will force a full scan of the child table for each parent record deleted. In the second situation, the lack of a foreign key index on the child table will slow down join queries.

I rarely find it suitable to create indexes on foreign keys in other situations, though.
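
To make the two cases in that rule concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in (it is not one of the engines discussed in this post, although it also does not index foreign key columns for you), and the parent/child tables and the index name are hypothetical. The lookup by parent_id is the kind of search an engine has to run against the child table when a parent row is deleted or joined.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id) ON DELETE CASCADE,
        amount INTEGER
    );
""")

# Find the child rows that belong to one parent row.
lookup = "SELECT amount FROM child WHERE parent_id = ?"

# No index on child.parent_id yet: the planner has to scan the whole table.
plan_without_index = plan(conn, lookup, (1,))

# Hypothetical index on the foreign key column.
conn.execute("CREATE INDEX idx_child_parent ON child(parent_id)")
plan_with_index = plan(conn, lookup, (1,))

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a scan of child and the second a search that uses idx_child_parent. Other engines expose the same information through their own plan tools.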

I think the question is not only “to index or not to index” on foreign keys.

I have been debating this subject with my boss (a hardcore Informix believer) all day. We found that Informix creates indexes on foreign keys automatically, while Oracle, DB2, and SQL Server don't.

So, why did some engineers decide to go one way and others the opposite way? I think this is an interesting design issue.

Informix takes full responsibility for optimizing JOINs and CASCADING operations on the foreign key.

Instead, Oracle, DB2 and SQL Server will happily leave the burden of tuning indexes for JOINs and CASCADING operations on your shoulders.

So, even if an Informix DBA fails to tune the indexes, the database will probably show acceptable performance on JOIN operations.

Oracle, IBM and Microsoft/Sybase, on the other hand, apparently decided that tuning was a nonnegotiable duty of the DBA. However, there are many reasons to want the higher level of control those database engines provide:

First, each index you create comes with a cost. Not only will it use storage space, but once you have created it, the database engine has to maintain it every time the table is modified.

Also, probably 98% of all SELECT and UPDATE queries include a WHERE clause or involve more than one JOIN operation, so an index on the foreign key column alone is rarely the best fit for the queries you actually run.

There is also an opportunity for index coverage, meaning that if the index contains all the columns a query needs, some SELECT queries can be resolved entirely by reading the index, without ever touching the real table.

To get all those benefits at the same time, you need a composite index that leads with the foreign key but also includes other columns relevant to frequent queries.
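
As a rough illustration of that idea, here is a small self-contained sketch, again with Python's built-in sqlite3 module standing in for the engines discussed here. The customer/orders schema, the column names and the index name are all made up for the example. The composite index leads with the foreign key (customer_id) and then carries two columns that a frequent query is assumed to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        status TEXT,
        total INTEGER,
        notes TEXT
    );
    -- Composite index: foreign key first, then the frequently read columns.
    CREATE INDEX idx_orders_customer_status_total
        ON orders(customer_id, status, total);
""")


def plan(sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Every column this query needs is in the index, so the table is not read.
covered_plan = plan(
    "SELECT status, total FROM orders WHERE customer_id = ?", (1,)
)

# 'notes' is not in the index: the index finds the rows, the table supplies it.
uncovered_plan = plan(
    "SELECT notes FROM orders WHERE customer_id = ?", (1,)
)

print("covered:  ", covered_plan)
print("uncovered:", uncovered_plan)
```

In SQLite the first plan should mention a COVERING INDEX, while the second uses the same index only to locate rows and then goes back to the table. The same index still serves the foreign key lookups, because customer_id is its leading column.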

So, the Informix approach is a winner for the most basic cases, but the higher level of control the other engines give you can deliver better performance if tuned adequately (obviously, my boss won’t swallow that pill!).

If you want to distill a best practice from this, I think that creating indexes on your foreign keys is a good first approach, but you should later tune your indexes globally, using real profiling data.

Fortunately, for those of us using mostly SQL Server, the Index Tuning Wizard exists.

UPDATE: You also have to consider how foreign keys are actually implemented. My boss found some articles mentioning that in some RDBMSs foreign keys are internally implemented as "pointer chains".

Saturday, February 03, 2007

Help Find Jim Gray

I know I don't get any significant traffic here, but anyway: if you know of Jim Gray and that he has been missing at sea since last Sunday, there is a way you can help find him.

The Coast Guard has already called off its search effort, so friends and colleagues have taken up the challenge.

Amazon set up a job on its Mechanical Turk site, so you can go there, log in to your Amazon account and start visually scanning recent satellite images of the search area.

Update: You can read on Werner Vogels' blog how they do it.

Please, join!

Moving to MSDN

I haven't decided yet, but it is very likely that I will stop blogging here for some time. For some background, I have moved to the sate...