Sometimes the need arises to select several values in the vicinity of a certain value, preferably ordered by proximity. The values might be dates, zip-codes or any other meaningfully ordered values that can be represented as numerical values. How can we pull this off in MySQL?
We can't use a simple ORDER BY, since we want values both larger and smaller than our selected value. We can however order by an aggregate function that calculates the distance from our selected value.
Suppose we want to find the 6 closest numbers to the number 2500 (including) from a numbers table:
SELECT number, ABS( number - 2500 ) AS distance FROM numbers ORDER BY distance LIMIT 6
(6 total, Query took 2.2792 sec)
This works nicely, and is actually relatively performant if the number column is indexed. Despite having to run a full table scan in order to calculate the distance for every number in the table, running this on a ~2 million row table completes in just over 2 seconds.
This kind of performance would be quite enough for smaller tables, but often in real time applications we would like faster response times than 2 seconds for completing a query.
Since we know exactly what we need, we can help MySQL by limiting the range of numbers it has to calculate the distance for. Since we want the 6 closest numbers, we can be sure they'll be at most in a range of 6 numbers lower and 6 numbers higher than our selected value.
If we can do that, then the calculation would run for only 13 numbers, hopefully leading to much improved performance. Selecting those numbers can be done using a union on a couple of SELECT statements:
( SELECT number FROM `numbers` WHERE number >= 2500 ORDER BY number LIMIT 7 ) UNION ALL ( SELECT number FROM `numbers` WHERE number < 2500 ORDER BY number DESC LIMIT 6 )
Notice the first one includes the actual value (in case it exists) and hence has one more value to select in its limit clause.
Combining this with our previous successful attempt produces the following ungainly query:
SELECT number, ABS( number - 2500 ) AS distance FROM ( ( SELECT number FROM `numbers` WHERE number >=2500 ORDER BY number LIMIT 6 ) UNION ALL ( SELECT number FROM `numbers` WHERE number < 2500 ORDER BY number DESC LIMIT 6 ) ) AS n ORDER BY distance LIMIT 6
It does return the same results:
Is much more performant:
(6 total, Query took 0.0011 sec)
And there you have it.
A couple of notes:
- All queries were ran with SQL_NO_CACHE at least 5 times to ensure the timings were indicative of the performance.
- The queries were ran against a ~2 million table filled with randomly generated values.
- An index was created on the number column.