Selecting closest values in MySQL

Sometimes the need arises to select several values in the vicinity of a certain value, preferably ordered by proximity. The values might be dates, zip-codes or any other meaningfully ordered values that can be represented as numerical values. How can we pull this off in MySQL?

We can’t use a simple ORDER BY, since we want values both larger and smaller than our selected value. We can however order by an aggregate function that calculates the distance from our selected value.

Suppose we want to find the 6 closest numbers to the number 2500 (including) from a numbers table:

SELECT number, ABS( number - 2500 ) AS distance
FROM numbers
ORDER BY distance
LIMIT 6

This returns:

number distance
2502 2
2494 6
2508 8
2489 11
2513 13
2487 13

(6 total, Query took 2.2792 sec)

This works nicely, and is actually relatively performant if the number column is indexed. Despite having to run a full table scan in order to calculate the distance for every number in the table, running this on a ~2 million row table completes in just over 2 seconds.

This kind of performance would be quite enough for smaller tables, but often in real time applications we would like faster response times than 2 seconds for completing a query.

Since we know exactly what we need, we can help MySQL by limiting the range of numbers it has to calculate the distance for. Since we want the 6 closest numbers, we can be sure they’ll be at most in a range of 6 numbers lower and 6 numbers higher than our selected value.

If we can do that, then the calculation would run for only 13 numbers, hopefully leading to much improved performance. Selecting those numbers can be done using a union on a couple of SELECT statements:

(
   SELECT number
   FROM `numbers`
   WHERE number >= 2500
   ORDER BY number
   LIMIT 7
) UNION ALL (
   SELECT number
   FROM `numbers`
   WHERE number < 2500
   ORDER BY number DESC
   LIMIT 6
)

Notice the first one includes the actual value (in case it exists) and hence has one more value to select in its limit clause.

Combining this with our previous successful attempt produces the following ungainly query:

SELECT number, ABS( number - 2500 ) AS distance FROM (
	(
		SELECT number
		FROM `numbers`
		WHERE number >=2500
		ORDER BY number
		LIMIT 6
	) UNION ALL (
		SELECT number
		FROM `numbers`
		WHERE number < 2500
		ORDER BY number DESC
		LIMIT 6
	)
) AS n
ORDER BY distance
LIMIT 6

It does return the same results:

number distance
2502 2
2494 6
2508 8
2489 11
2513 13
2487 13

Is much more performant:

(6 total, Query took 0.0011 sec)

And there you have it.

A couple of notes:
- All queries were ran with SQL_NO_CACHE at least 5 times to ensure the timings were indicative of the performance.
- The queries were ran against a ~2 million table filled with randomly generated values.
- An index was created on the number column.

To know when the next article is published, please subscribe to new articles using your Email below or follow me on Twitter.

Subscribe to Blog via Email

Enter your email address to receive notification about new posts.

  • http://www.ossigenocms.com PiccoloPrincipe

    Nice.. Sql is a beautiful language and, as it’s said database are very good at deal with data. This leave us free of implementing complex logic with Php, for example.

  • http://www.ossigenocms.com PiccoloPrincipe

    Nice.. Sql is a beautiful language and, as it’s said database are very good at deal with data. This leave us free of implementing complex logic with Php, for example.

  • http://blog.donaldorgan.com Donald Organ

    @^^ Or you could check out the Andromeda Project http://www.andromeda-project.org and implement complex logic in the database as well and just worry about the front end and pulling the data with PHP… :)

  • http://blog.donaldorgan.com Donald Organ

    @^^ Or you could check out the Andromeda Project http://www.andromeda-project.org and implement complex logic in the database as well and just worry about the front end and pulling the data with PHP… :)

  • Pingback: Seleccionar cercanos mediante MySQL | dominios, diseƱo web, ecommerce - Mantis Technology Solutions Blog()

  • Kristofer La Cour

    This i perfect, just what i needed, thanks allot! :)

  • Ian Ringrose

    A spatial index is another option, store all values as points with a fixed value for y, then a spatial index will provide a fast result.

  • Pingback: Drupal 7, GPS lat, lng closes value, is it possible? | web technical support()

  • http://www.damianbendersky.com/ Damian Bendersky

    Just what I needed, thanks for posting!

  • Will Abbott

    I am trying to adapt this so I can search for the closest longitude and latitude – so it can handle 2 figures. Can you suggest how I can do this? Thanks

  • Miro Perec

    Today i was looking for a way to get one higher and one lower value than a specific one.
    What i came up with was this:

    SELECT * FROM WHERE id IN ((SELECT id FROM WHERE id > ORDER BY id ASC LIMIT 1),(SELECT id FROM WHERE id < ORDER BY id DESC LIMIT 1));
    Hopefully it helps someone.

    Greetings

  • http://toolongdidntread.com John Russell
  • http://www.binpress.com Eran Galperin

    The BETWEEN operator is just an alternative way to select between two values, that was not the point of this article at all.

  • Brian

    Is it possible to specify a range. I basically need to get the closest value to the one supplied by the user thats also within a certain tolerance either side. For instance + or – 50. I appreciate any insight

  • Abraham Tugalov

    This will use filesort … bad practice. Better to use upper edge technique with compound indexes.