Sunday, August 23, 2009

Examples of bad queries

In my years of teaching MySQL topics, I've noticed that many people write bad queries. It's not necessarily their fault. It could be that they don't understand some aspect of SQL, or that they solved the problem with the first technique that came to mind, or that their query is fine except that MySQL optimizes it poorly. I'll give a few examples.

Correlated vs. Non-Correlated subquery
For those that don't know what a correlated subquery is: it means that the subquery relies on values from the outer query. If the subquery could be executed by itself, then it would be non-correlated. Here's an example of a non-correlated query. Using the `world` database, I want to return all cities that have a population larger than New York's population:

SELECT name FROM City WHERE population > (SELECT population FROM City WHERE name = 'new york');

Since the subquery can be run by itself, it's non-correlated. These queries are generally okay, though if they are slow you should check to see if indexes are being used or consider re-writing as a join.

Now here's an example of a correlated subquery. The query returns the countries that have a city bigger than 9 million people:

SELECT name FROM Country AS co WHERE EXISTS (SELECT * FROM City WHERE population>9000000 AND co.code = countrycode);

Notice that the subquery references the outer query (co.code). This causes the subquery to execute for each row of the outer query. That's a lot of work! This could have been written as a non-correlated subquery:

SELECT name FROM Country WHERE code IN(SELECT countrycode FROM City WHERE population>9000000);

The problem with the above query (as you can verify with EXPLAIN) is that MySQL 5.1 executes this as a correlated subquery even though it shouldn't be. This is fixed in 6.0! In the meantime... run EXPLAIN on your slow subqueries and look for "DEPENDENT SUBQUERY" - that means it's correlated. Try to re-write it. Here's the above query as a join:

SELECT DISTINCT Country.name FROM Country JOIN City ON code = countrycode WHERE City.population > 9000000;

The MySQL optimizer in 5.x has a lot of options when optimizing joins, such as choosing the best table order and considering all indexes.

The point of this post is: the easiest query is not always the best one. If it's slow, use EXPLAIN and think about another strategy. There's more than one way to skin the cat. And remember to upgrade to the latest version of MySQL - the optimizer team is hard at work.

4 comments: