问题
I'm working on the following query but not sure how to proceed with further optimizations:
SELECT u.id AS userId, firstName, profilePhotoId, preferredActivityId, preferredSubActivityId, availabilityType,
3959 * ACOS(COS(radians(requestingUserLat)) * COS(radians(u.latitude)) * COS(radians(u.longitude) - radians(requestingUserLon)) + SIN(radians(requestingUserLat)) * SIN(radians(u.latitude))) AS distanceInMiles
FROM users u
WHERE u.id IN (
SELECT uu.id
FROM users uu
WHERE uu.latitude between lat1 and lat2 -- MySQL 5.7 supports Point data type, but it is not indexed in innoDB. We store latitude and longitude as DOUBLE for now
AND uu.longitude between lon1 and lon2
AND uu.dateOfBirth between maxAge and minAge -- dates are in millis, therefore maxAge will have a smaller value than minAge and so it needs to go first
)
AND IF(gender is null, TRUE, u.gender = gender)
AND IF(activityType is null, TRUE, u.preferredActivityType = activityType)
AND u.accountState = 'A'
AND u.id != userId
HAVING distanceInMiles < searchRadius ORDER BY distanceInMiles LIMIT pagingStart, pagingLength;
CREATE INDEX `findMatches` ON `users` (`latitude` ASC, `longitude` ASC, `dateOfBirth` ASC) USING BTREE;
The idea here is to have an inner query that identifies qualifying rows based on user location and age, using the covering index specified above. In a table with a few million rows, that narrows them down to a few thousand without requiring a full table scan. Then the resulting rows are tested against more fine-grain conditions, such as gender, availability, etc. - this time a full scan of the reduced resultset is inevitable.
This runs "almost" as expected, EXPLAIN
reveals that the inner query does indeed make use of the full key length (3 columns) of the covering index, and the outer query then looks up the rows returned, by PK.

The problem:
The performance is satisfactory while the search range is within a few hundred miles, but when I get to a thousand miles, it starts degrading because the number of users within the specified boundary increases. The problem would also become apparent if the search range remained the same, but the number of users increased by a few orders. Here are the problems that I have identified so far:
- MySQL does not currently support
LIMIT
in inner queries, therefore the inner query will return ALL qualifyinguserIDs
(i.e., thousands), even if the outer query will then limit them to just a dozen or so. - Enabling
optimizer_trace
and looking at the work behind the scenes suggests that only columnlatitude
of my covering index is used as arange
. I'm not sure why that is, especially sinceEXPLAIN
suggests that the full index key length is used.
The question:
How do I solve (1) and (2) above? Before somebody suggests using spatial data types for lat and long, please note that the latest InnoDB engine (MySQL v5.7) does not support spatial indexes, just spatial data types.
回答1:
You can simplify your query to:
SELECT u.id AS userId, firstName, profilePhotoId, preferredActivityId, preferredSubActivityId, availabilityType,
3959 * ACOS(COS(radians(requestingUserLat)) * COS(radians(u.latitude)) * COS(radians(u.longitude) - radians(requestingUserLon)) + SIN(radians(requestingUserLat)) * SIN(radians(u.latitude))) AS distanceInMiles
FROM users u
WHERE u.latitude between lat1 and lat2
AND u.longitude between lon1 and lon2
AND u.dateOfBirth between maxAge and minAge
AND IF(gender is null, TRUE, u.gender = gender)
AND IF(activityType is null, TRUE, u.preferredActivityType = activityType)
AND u.accountState = 'A'
AND u.id != userId
HAVING distanceInMiles < searchRadius
ORDER BY distanceInMiles
LIMIT pagingStart, pagingLength;
and then create index for all columns in where
clause, you can play with order of columns in index, start with columns which has less distinct values (like gender, state)
回答2:
I think this has already been covered in other answers. There is a difference between using the data in the index for a query, and using the data in the index to seek to the right solution. The latter is the most efficient use of an index. The former is helpful, but the efficiency is simply in not reading in the data pages.
I think you can improve your query by using exists
instead of in
. This should allow filtering at the outer level to improve performance of the query:
SELECT u.id AS userId, firstName, profilePhotoId, preferredActivityId, preferredSubActivityId, availabilityType,
3959 * ACOS(COS(radians(requestingUserLat)) * COS(radians(u.latitude)) * COS(radians(u.longitude) - radians(requestingUserLon)) + SIN(radians(requestingUserLat)) * SIN(radians(u.latitude))) AS distanceInMiles
FROM users u
WHERE EXISTS (SELECT 1
FROM users uu
WHERE uu.latitude between lat1 and lat2 AND
uu.longitude between lon1 and lon2 AND
uu.dateOfBirth between maxAge and minAge AND
uu.id = u.id
) AND
IF(gender is null, TRUE, u.gender = gender) AND
IF(activityType is null, TRUE, u.preferredActivityType = activityType) AND
u.accountState = 'A' AND
u.id <> userId
HAVING distanceInMiles < searchRadius
ORDER BY distanceInMiles LIMIT pagingStart, pagingLength;
As a note, the expression IF(gender is null, TRUE, u.gender = gender)
is rather non-sensical because it always evaluates to true. If you have a variable called gender
, it will not be used in this expression. The gender
will be interpreted according to MySQL scoping rules and be a column in a table. You should always use a prefix like var_
or p_
or something to distinguish parameters from columns in the table.
EDIT:
I should have mentioned that the index needs to include id
as the first column to use with exists
.
来源:https://stackoverflow.com/questions/24214893/trouble-optimizing-mysql-query