Question
I have a PHP/MySQL geo-IP script that takes a user's IP address, converts it to a long integer, and searches an IP-range table for the single geographical location id whose range contains the user's IP:
$iplong = ip2long($_SERVER['REMOTE_ADDR']);
SELECT id FROM geoip WHERE ".$iplong." BETWEEN range_begin AND range_end ORDER BY range_begin DESC LIMIT 1
The "geoip" table contains 2.5M rows. Both the "range_begin" and "range_end" column are Unique Indexes. IP ranges don't seem to overlap. Sometimes this query takes about 1 second to complete, but I was hoping there was a way to improve it as it is the slowest query on my site.
Thanks
EDIT: I changed my query to:
SELECT * FROM geoip WHERE range_begin <= ".$iplong." AND range_end >= ".$iplong." ORDER BY range_begin DESC LIMIT 1
I now have a UNIQUE Composite Index (range_begin, range_end). I used the "EXPLAIN" function and it looks like it still searches through 1.2M rows:
id: 1
select_type: SIMPLE
table: geoip
type: range
possible_keys: range_begin
key: range_begin
key_len: 8
ref: NULL
rows: 1282026
Extra: Using index condition
Answer 1:
It's a very useful exercise to spend some time thinking about why a conventional index is useless in a scenario like this. Indeed, if you force the query to use the index, you will probably find it is slower than a full table scan.
Explaining why would take more space than is available here. There is a solution, which is to treat the IP-address space as a one-dimensional space and use spatial indexing. But MySQL spatial indexes only work in 2 dimensions, so you need to map the coordinate into a 2-dimensional space as described here.
Note that the greater-than/LIMIT method, although faster than the spatial index, becomes messy once you start dealing with nested sub-nets.
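For reference, the greater-than/LIMIT method mentioned above is typically written along these lines (a sketch against the question's geoip schema; the literal address value is just an example, and the final range_end check is assumed to happen in application code):

```sql
-- Find the candidate range with the greatest range_begin that is not
-- above the address. With an index on range_begin, MySQL can read the
-- index backwards and stop after a single row.
SELECT id, range_begin, range_end
FROM geoip
WHERE range_begin <= 3221225985   -- e.g. ip2long('192.0.2.1')
ORDER BY range_begin DESC
LIMIT 1;
-- The application must then verify that range_end >= the address;
-- otherwise the IP falls into a gap between two ranges.
```

The follow-up check is what makes this approach awkward for nested sub-nets: the single candidate row may belong to an enclosing or a sibling range.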
Answer 2:
I was dealing with a similar issue, where I had to search a database of about 4 million IP ranges, and found a nice solution that brought the number of scanned rows down from 4 million to about ~5 (depending on the IP):
This SQL statement:
SELECT id FROM geoip WHERE $iplong BETWEEN range_begin AND range_end
is transformed to:
SELECT id FROM geoip WHERE range_begin <= $iplong AND range_end >= $iplong
The issue is that MySQL retrieves all rows matching 'range_begin <= $iplong' and then has to check each one against 'range_end >= $iplong'. The first condition alone matched about 2 million rows, every one of which had to be checked against range_end.
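The effect of the extra lower bound can be illustrated with a small Python sketch over synthetic, non-overlapping ranges (the numbers here are made up for the illustration; MAX_SPAN plays the role of the 65535 constant below):

```python
# Synthetic, non-overlapping IP ranges (begin, end), sorted by begin.
# Each range spans at most MAX_SPAN addresses.
MAX_SPAN = 100
ranges = [(i * 1000, i * 1000 + MAX_SPAN) for i in range(10_000)]
ip = 5_000_050  # an address inside the range (5_000_000, 5_000_100)

# Open-ended condition: every range with range_begin <= ip is a candidate,
# which is roughly half the table.
candidates_open = [r for r in ranges if r[0] <= ip]

# With the extra lower bound, only ranges whose begin lies within
# MAX_SPAN of the address remain candidates.
candidates_bounded = [r for r in ranges if ip - MAX_SPAN <= r[0] <= ip]

# The final range_end check now runs over a handful of rows.
match = next((r for r in candidates_bounded if r[1] >= ip), None)
print(len(candidates_open), len(candidates_bounded), match)
```

With these numbers, the open-ended condition leaves 5001 candidate rows, while the bounded one leaves exactly 1, which is the same reduction the extra AND condition achieves inside MySQL.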
This can, however, be sped up dramatically by adding one more AND condition:
SELECT id FROM geoip WHERE range_begin <= $iplong AND range_begin >= $iplong-65535 AND range_end >= $iplong
The statement
range_begin <= $iplong AND range_begin >= $iplong-65535
retrieves only the entries whose range_begin lies between $iplong-65535 and $iplong. In my case, this reduced the number of retrieved rows from 4 million to about 5, and the script runtime went down from several minutes to a few seconds.
Note on 65535: for my table, this is the maximal distance between range_begin and range_end, i.e. (range_end - range_begin) <= 65535 holds for every row. If you have larger IP ranges, you must increase the constant; if you have smaller ranges, you can decrease it. If the constant is too large (for example 4 billion), you will not save any query time.
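A suitable constant for your own table can be looked up directly (a sketch, assuming the geoip schema from the question):

```sql
-- Widest range in the table; use this value (or anything larger)
-- as the constant in the extra AND condition.
SELECT MAX(range_end - range_begin) AS max_span FROM geoip;
```

Note that the constant must be re-checked whenever the range data is reloaded, since a single unusually wide range would silently break lookups inside it.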
For this query, you only need an index on range_begin.
Source: https://stackoverflow.com/questions/41079741/mysql-how-to-make-a-faster-ip-range-query-geoip