Selecting closest values in MySQL

MySQL Web development February 2nd, 2009

Sometimes the need arises to select several values in the vicinity of a certain value, preferably ordered by proximity. The values might be dates, zip-codes or any other meaningfully ordered values that can be represented as numerical values. How can we pull this off in MySQL?

We can't use a simple ORDER BY, since we want values both larger and smaller than our selected value. We can however order by an aggregate function that calculates the distance from our selected value.

Suppose we want to find the 6 closest numbers to the number 2500 (including) from a numbers table:

SELECT number, ABS( number - 2500 ) AS distance
FROM numbers
ORDER BY distance
LIMIT 6

This returns:

number distance
2502 2
2494 6
2508 8
2489 11
2513 13
2487 13

(6 total, Query took 2.2792 sec)

This works nicely, and is actually relatively performant if the number column is indexed. Despite having to run a full table scan in order to calculate the distance for every number in the table, running this on a ~2 million row table completes in just over 2 seconds.

This kind of performance would be quite enough for smaller tables, but often in real time applications we would like faster response times than 2 seconds for completing a query.

Since we know exactly what we need, we can help MySQL by limiting the range of numbers it has to calculate the distance for. Since we want the 6 closest numbers, we can be sure they'll be at most in a range of 6 numbers lower and 6 numbers higher than our selected value.

If we can do that, then the calculation would run for only 13 numbers, hopefully leading to much improved performance. Selecting those numbers can be done using a union on a couple of SELECT statements:

(
   SELECT number
   FROM `numbers`
   WHERE number >= 2500
   ORDER BY number
   LIMIT 7
) UNION ALL (
   SELECT number
   FROM `numbers`
   WHERE number < 2500
   ORDER BY number DESC
   LIMIT 6
)

Notice the first one includes the actual value (in case it exists) and hence has one more value to select in its limit clause.

Combining this with our previous successful attempt produces the following ungainly query:

SELECT number, ABS( number - 2500 ) AS distance FROM (
	(
		SELECT number
		FROM `numbers`
		WHERE number >=2500
		ORDER BY number
		LIMIT 6
	) UNION ALL (
		SELECT number
		FROM `numbers`
		WHERE number < 2500
		ORDER BY number DESC
		LIMIT 6
	)
) AS n
ORDER BY distance
LIMIT 6

It does return the same results:

number distance
2502 2
2494 6
2508 8
2489 11
2513 13
2487 13

Is much more performant:

(6 total, Query took 0.0011 sec)

And there you have it.

A couple of notes:
- All queries were ran with SQL_NO_CACHE at least 5 times to ensure the timings were indicative of the performance.
- The queries were ran against a ~2 million table filled with randomly generated values.
- An index was created on the number column.

Share this article:
  • Digg
  • DZone
  • del.icio.us
  • Technorati

Tags: ,

3 Responses to “Selecting closest values in MySQL”

  1. #1

    Nice.. Sql is a beautiful language and, as it’s said database are very good at deal with data. This leave us free of implementing complex logic with Php, for example.

  2. #2
    Donald Organ says:

    @^^ Or you could check out the Andromeda Project http://www.andromeda-project.org and implement complex logic in the database as well and just worry about the front end and pulling the data with PHP… :)

  3. #3

Leave a Comment