Selecting closest values in MySQL

Did you know that DaisySlot and most online casinos have a denominator when is the best time to play? That means some kind of strategy is needed to win the game constantly. Sometimes the need arises to select several values in the vicinity of a certain value, preferably ordered by proximity. The values might be dates, zip-codes or any other meaningfully ordered values that can be represented as numerical values. How can we pull this off in MySQL?

We can’t use a simple ORDER BY, since we want values both larger and smaller than our selected value. We can however order by an aggregate function that calculates the distance from our selected value.

Suppose we want to find the 6 closest numbers to the number 2500 (including) from a numbers table:

SELECT number, ABS( number - 2500 ) AS distance
FROM numbers
ORDER BY distance
LIMIT 6

This returns:

number distance
2502 2
2494 6
2508 8
2489 11
2513 13
2487 13

(6 total, Query took 2.2792 sec)

This works nicely, and is actually relatively performant if the number column is indexed. Despite having to run a full table scan in order to calculate the distance for every number in the table, running this on a ~2 million row table completes in just over 2 seconds.

This kind of performance would be quite enough for smaller tables, but often in real time applications we would like faster response times than 2 seconds for completing a query.

Since we know exactly what we need, we can help MySQL by limiting the range of numbers it has to calculate the distance for. Since we want the 6 closest numbers, we can be sure they’ll be at most in a range of 6 numbers lower and 6 numbers higher than our selected value.

If we can do that, then the calculation would run for only 13 numbers, hopefully leading to much improved performance. Selecting those numbers can be done using a union on a couple of SELECT statements:

(
   SELECT number
   FROM `numbers`
   WHERE number >= 2500
   ORDER BY number
   LIMIT 7
) UNION ALL (
   SELECT number
   FROM `numbers`
   WHERE number < 2500
   ORDER BY number DESC
   LIMIT 6
)

Notice the first one includes the actual value (in case it exists) and hence has one more value to select in its limit clause.

Combining this with our previous successful attempt produces the following ungainly query:

SELECT number, ABS( number - 2500 ) AS distance FROM (
	(
		SELECT number
		FROM `numbers`
		WHERE number >=2500
		ORDER BY number
		LIMIT 6
	) UNION ALL (
		SELECT number
		FROM `numbers`
		WHERE number < 2500
		ORDER BY number DESC
		LIMIT 6
	)
) AS n
ORDER BY distance
LIMIT 6

It does return the same results:

number distance
2502 2
2494 6
2508 8
2489 11
2513 13
2487 13

Is much more performant:

(6 total, Query took 0.0011 sec)

And there you have it.

A couple of notes:
– All queries were ran with SQL_NO_CACHE at least 5 times to ensure the timings were indicative of the performance.
– The queries were ran against a ~2 million table filled with randomly generated values.
– An index was created on the number column.

To know when the next article is published, please subscribe to new articles using your Email below or follow me on Twitter.

Subscribe to Blog via Email

Enter your email address to receive notification about new posts.