Need help optimizing a lat/lon geo search for MySQL

You are probably using a 'covering index' in your lat/lon-only query. A covering index is one that contains every column the query selects, so MySQL only needs to visit the index and never the data rows. See this for more info. That would explain why the lat/lon query is so fast.
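
For illustration, a minimal sketch of the difference (table, column, and index names are assumptions, not from the original question):

    -- Hypothetical schema; names are illustrative.
    CREATE TABLE listings (
        id    INT PRIMARY KEY,
        lat   DOUBLE NOT NULL,
        lon   DOUBLE NOT NULL,
        title VARCHAR(255),
        INDEX idx_lat_lon (lat, lon)
    );

    -- Covered: every selected column is in idx_lat_lon, so MySQL can
    -- answer from the index alone without touching the data rows.
    SELECT lat, lon FROM listings WHERE lat BETWEEN 40.0 AND 41.0;

    -- Not covered: title forces a row lookup for every index match.
    SELECT lat, lon, title FROM listings WHERE lat BETWEEN 40.0 AND 41.0;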

I suspect that the calculations and the sheer number of rows returned slow down the longer query (plus any temp table that has to be created for the HAVING clause).

I think you really should consider the use of PostgreSQL (combined with Postgis).

I have given up on MySQL for geospatial data (for now) because of the following reasons:

  • MySQL only supports spatial datatypes / spatial indexes on MyISAM tables, with MyISAM's inherent disadvantages (no transactions, no referential integrity...)
  • MySQL implements some of the OpenGIS specifications only on an MBR basis (minimum bounding rectangle), which is pretty useless for most serious geospatial querying/processing (see this link in the MySQL manual). Chances are you will need some of this functionality sooner or later; the sketch after this list shows the pitfall.
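
A sketch of that MBR pitfall, with assumed table and column names: a point can sit inside a polygon's bounding rectangle while being well outside the polygon itself, and an MBR-based test will still report a match.

    -- MBRContains() compares only minimum bounding rectangles, so an
    -- L-shaped region "contains" a point that lies in its notch.
    SELECT id
    FROM regions
    WHERE MBRContains(
        boundary,                          -- POLYGON column (illustrative)
        GeomFromText('POINT(4.35 50.85)')  -- point to test
    );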

PostgreSQL/Postgis with proper (GIST) spatial indexes and proper queries can be extremely fast.
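
For the lat/lon radius search in the question, a PostGIS setup might look roughly like this (a sketch with assumed names; a geography column lets ST_DWithin take its radius in plain metres):

    -- Illustrative schema: a geography point column plus a GIST index.
    CREATE TABLE listings (
        id   serial PRIMARY KEY,
        geom geography(Point, 4326)
    );
    CREATE INDEX listings_geom_gist ON listings USING GIST (geom);

    -- All listings within 5 km of a point; ST_DWithin can use the index.
    SELECT id
    FROM listings
    WHERE ST_DWithin(geom,
                     ST_MakePoint(-73.99, 40.73)::geography,  -- lon, lat
                     5000);                                   -- metres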

Example: determining the overlapping polygons between a 'small' selection of polygons and a table with over 5 million (!) very complex polygons, calculating the amount of overlap between these results, and sorting. Average runtime: between 30 and 100 milliseconds (this particular machine has a lot of RAM, of course, and don't forget to tune your PostgreSQL install; read the docs).
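
A sketch of that kind of overlap query, under assumed table and column names:

    -- Overlap between a small candidate set and a large polygon table;
    -- ST_Intersects is accelerated by the GIST index on big.geom.
    SELECT big.id,
           ST_Area(ST_Intersection(small.geom, big.geom)) AS overlap_area
    FROM   small_selection AS small
    JOIN   big_polygons    AS big
           ON ST_Intersects(small.geom, big.geom)
    ORDER  BY overlap_area DESC;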

Depending on the number of your listings, could you create a view that contains

Listing1Id, Listing2Id, Distance

Basically just have all of the distances "pre-calculated"
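
A sketch of such a view, assuming a listings table with lat/lon columns and using the haversine formula for the distance in kilometres (all names illustrative). The self-join is O(N²), so for any serious number of listings you would materialize this into an indexed table rather than recompute it per query:

    CREATE VIEW v_Distance AS
    SELECT a.id AS Listing1Id,
           b.id AS Listing2Id,
           -- Haversine distance, Earth radius ~6371 km.
           6371 * 2 * ASIN(SQRT(
               POW(SIN(RADIANS(b.lat - a.lat) / 2), 2) +
               COS(RADIANS(a.lat)) * COS(RADIANS(b.lat)) *
               POW(SIN(RADIANS(b.lon - a.lon) / 2), 2)
           )) AS Distance
    FROM listings AS a
    JOIN listings AS b ON a.id <> b.id;  -- each pair in both orders, so
                                         -- filtering on Listing1Id suffices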

Then you could do something like:

SELECT Listing2Id FROM v_Distance d WHERE Distance < 5 AND Listing1Id = XXX

You really should avoid doing that much math in your select statement. That's probably the source of a lot of your slowdowns. Remember, SQL is a query language; it's really not optimized for trigonometric functions.

SQL will be faster and your overall results will be faster if you do a very naive distance search (which will return some extra candidates) and then winnow those results down afterwards.
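
A sketch of that two-step approach (constants and names are illustrative): a cheap bounding-box predicate lets a lat/lon index throw away almost everything, and the expensive exact check only runs on the survivors.

    -- Step 1: naive bounding box around the search point (index-friendly).
    -- 0.045 degrees of latitude is roughly 5 km; the longitude span is
    -- widened by 1/cos(latitude).
    SELECT id, lat, lon
    FROM listings
    WHERE lat BETWEEN 40.73 - 0.045 AND 40.73 + 0.045
      AND lon BETWEEN -73.99 - 0.059 AND -73.99 + 0.059;

    -- Step 2: winnow the survivors with the exact distance formula,
    -- in application code or in an outer query.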

If you want to use distance in your query, at the very least use a squared-distance calculation; SQRT calls are notoriously slow. In a Cartesian coordinate system, the square of the distance is just the sum of the squares of the coordinate differences (the Pythagorean theorem), so it is much cheaper to compute than the distance itself. All you have to do is compare it against the square of the distance you care about: instead of finding the precise distance and comparing it to your desired distance (say 5), you find the squared distance and compare it to the square of the desired distance (25).
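
A minimal sketch of the squared-distance comparison on planar (projected) coordinates, with illustrative names; for small radii the same trick works on lat/lon once the longitude difference is scaled by cos(latitude):

    -- Compare squared distance against the squared radius: 5 * 5 = 25.
    -- No SQRT anywhere in the query.
    SELECT id
    FROM points
    WHERE (x - 3.0) * (x - 3.0) + (y - 4.0) * (y - 4.0) < 25;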

When I implemented a geo radius search, I just loaded all of the US ZIP codes into memory with their lat/lon, used my starting point and radius to get the list of ZIP codes inside the radius, and then used that list in my db query. I was using Solr to do the searching because the search space was in the 20-million-row range, but the same principles should apply. Apologies for the shallowness of this response; I'm on my phone.
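
On the database side, that approach boils down to an indexed IN lookup over the ZIP codes the application computed in memory (a sketch; names illustrative):

    -- The radius math happened in memory; the database only filters by ZIP.
    SELECT id, title
    FROM listings
    WHERE zip IN ('10001', '10002', '10003');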
