Question
I have a PHP/MySQL geo-IP script that takes a user's IP address, converts it to a long integer, and searches an IP-range table for the single geographical location id whose range contains the user's IP:
$iplong = ip2long($_SERVER['REMOTE_ADDR']);
SELECT id FROM geoip WHERE ".$iplong." BETWEEN range_begin AND range_end ORDER BY range_begin DESC LIMIT 1
The "geoip" table contains 2.5M rows. Both the "range_begin" and "range_end" column are Unique Indexes. IP ranges don't seem to overlap. Sometimes this query takes about 1 second to complete, but I was hoping there was a way to improve it as it is the slowest query on my site.
Thanks
EDIT: I changed my query to:
SELECT * FROM geoip WHERE range_begin <= ".$iplong." AND range_end >= ".$iplong." ORDER BY range_begin DESC LIMIT 1
I now have a UNIQUE Composite Index (range_begin, range_end). I used the "EXPLAIN" function and it looks like it still searches through 1.2M rows:
id: 1
select_type: SIMPLE
table: geoip
type: range
possible_keys: range_begin
key: range_begin
key_len: 8
ref: NULL
rows: 1282026
Extra: Using index condition
Answer 1:
It's a very useful exercise to spend some time thinking about why a conventional index is useless in a scenario like this. Indeed, if you force the query to use the index, you will probably find it is slower than a full table scan.
Explaining why would take more space than is available here. There is a solution, which is to treat the IP-address space as a one-dimensional space and use spatial indexing. But MySQL spatial indexes only work in 2 dimensions, so you need to map the coordinate into a 2-dimensional space as described here.
Note that the greater-than/LIMIT method, although faster than the spatial index, becomes messy once you start dealing with nested sub-nets.
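For reference, the greater-than/LIMIT method mentioned above is typically written along these lines (a sketch against the question's geoip schema; the literal address value is just an example, and the final range_end check is assumed to happen in application code):

```sql
-- Find the candidate range with the greatest range_begin that is not
-- above the address. With an index on range_begin, MySQL can read the
-- index backwards and stop after a single row.
SELECT id, range_begin, range_end
FROM geoip
WHERE range_begin <= 3221225985   -- e.g. ip2long('192.0.2.1')
ORDER BY range_begin DESC
LIMIT 1;
-- The application must then verify that range_end >= the address;
-- otherwise the IP falls into a gap between two ranges.
```

The follow-up check is what makes this approach awkward for nested sub-nets: the single candidate row may belong to an enclosing or a sibling range.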
Answer 2:
I was dealing with a similar issue, where I had to search a database of about 4 million IP ranges, and found a nice solution that brought the number of scanned rows down from 4 million to about ~5 (depending on the IP):
This SQL statement:
SELECT id FROM geoip WHERE $iplong BETWEEN range_begin AND range_end
is transformed to:
SELECT id FROM geoip WHERE range_begin <= $iplong AND range_end >= $iplong
The issue is that MySQL retrieves all rows matching 'range_begin <= $iplong' and then has to check each one against 'range_end >= $iplong'. The first condition alone matched about 2 million rows, every one of which had to be checked against range_end.
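The effect of the extra lower bound can be illustrated with a small Python sketch over synthetic, non-overlapping ranges (the numbers here are made up for the illustration; MAX_SPAN plays the role of the 65535 constant below):

```python
# Synthetic, non-overlapping IP ranges (begin, end), sorted by begin.
# Each range spans at most MAX_SPAN addresses.
MAX_SPAN = 100
ranges = [(i * 1000, i * 1000 + MAX_SPAN) for i in range(10_000)]
ip = 5_000_050  # an address inside the range (5_000_000, 5_000_100)

# Open-ended condition: every range with range_begin <= ip is a candidate,
# which is roughly half the table.
candidates_open = [r for r in ranges if r[0] <= ip]

# With the extra lower bound, only ranges whose begin lies within
# MAX_SPAN of the address remain candidates.
candidates_bounded = [r for r in ranges if ip - MAX_SPAN <= r[0] <= ip]

# The final range_end check now runs over a handful of rows.
match = next((r for r in candidates_bounded if r[1] >= ip), None)
print(len(candidates_open), len(candidates_bounded), match)
```

With these numbers, the open-ended condition leaves 5001 candidate rows, while the bounded one leaves exactly 1, which is the same reduction the extra AND condition achieves inside MySQL.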
This can, however, be sped up dramatically by adding one more AND condition:
SELECT id FROM geoip WHERE range_begin <= $iplong AND range_begin >= $iplong-65535 AND range_end >= $iplong
The statement
range_begin <= $iplong AND range_begin >= $iplong-65535
retrieves only the entries whose range_begin lies between $iplong-65535 and $iplong. In my case, this reduced the number of retrieved rows from 4 million to about 5, and the script runtime went down from several minutes to a few seconds.
Note on 65535: for my table, this is the maximal distance between range_begin and range_end, i.e. (range_end - range_begin) <= 65535 holds for every row. If you have larger IP ranges, you must increase the constant; if you have smaller ranges, you can decrease it. If the constant is too large (for example 4 billion), you will not save any query time.
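A suitable constant for your own table can be looked up directly (a sketch, assuming the geoip schema from the question):

```sql
-- Widest range in the table; use this value (or anything larger)
-- as the constant in the extra AND condition.
SELECT MAX(range_end - range_begin) AS max_span FROM geoip;
```

Note that the constant must be re-checked whenever the range data is reloaded, since a single unusually wide range would silently break lookups inside it.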
For this query, you only need an index on range_begin.
Source: https://stackoverflow.com/questions/41079741/mysql-how-to-make-a-faster-ip-range-query-geoip