How to improve ORDER BY performance with joins in MySQL

Submitted by 核能气质少年 on 2019-12-06 00:33:23

Question


I am working on a social network tracking application. The joins on their own perform fine with proper indexing, but when I add an ORDER BY clause the whole query takes over 100 times longer to execute. This is the query I use to get the Twitter users without an ORDER BY clause:

SELECT DISTINCT  `tracked_twitter`.id
FROM tracked_twitter
INNER JOIN  `twitter_content` ON  `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id` 
INNER JOIN  `tracker_twitter_content` ON  `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id` 
AND  `tracker_twitter_content`.`tracker_id` =  '88'
LIMIT 20

Showing rows 0 - 19 (20 total, Query took 0.0714 sec)

But when I add an ORDER BY clause (on an indexed column):

SELECT DISTINCT  `tracked_twitter`.id
FROM tracked_twitter
INNER JOIN  `twitter_content` ON  `tracked_twitter`.`id` =  `twitter_content`.`tracked_twitter_id` 
INNER JOIN  `tracker_twitter_content` ON  `twitter_content`.`id` =  `tracker_twitter_content`.`twitter_content_id` 
AND  `tracker_twitter_content`.`tracker_id` =  '88'
ORDER BY tracked_twitter.followers_count DESC 
LIMIT 20

Showing rows 0 - 19 (20 total, Query took 13.4636 sec)

[EXPLAIN output for the query above; image not preserved]

When I apply the ORDER BY clause to the table on its own, it doesn't take much time:

SELECT * FROM `tracked_twitter` WHERE 1 order by `followers_count` desc limit 20

Showing rows 0 - 19 (20 total, Query took 0.0711 sec) [followers_count: 68236387 - 10525612]

The table creation query is as follows:

CREATE TABLE IF NOT EXISTS `tracked_twitter` (
    `id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
    `handle` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
    `name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
    `location` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
    `description` text COLLATE utf8_unicode_ci,
    `profile_image` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
    `followers_count` int(11) NOT NULL,
    `is_influencer` tinyint(1) NOT NULL DEFAULT '0',
    `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
    `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
    `gender` enum('Male','Female','Other') COLLATE utf8_unicode_ci DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `followers_count` (`followers_count`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

So the join alone doesn't slow the query down, and ORDER BY works well when I run it against the table on its own. How can I improve the performance when the two are combined?

UPDATE 1

@GordonLinoff's method solves the problem if I only need the result set from the parent table. What if I want to know the number of tweets per person (the count of twitter_content rows that match each tracked_twitter row)? How can I modify it? And if I want to apply math functions to the tweet content, how do I do that?

SELECT  `tracked_twitter` . * , COUNT( * ) AS twitterContentCount, retweet_count + favourite_count + reply_count AS engagement
FROM  `tracked_twitter` 
INNER JOIN  `twitter_content` ON  `tracked_twitter`.`id` =  `twitter_content`.`tracked_twitter_id` 
INNER JOIN  `tracker_twitter_content` ON  `twitter_content`.`id` =  `tracker_twitter_content`.`twitter_content_id` 
WHERE  `is_influencer` !=  '1'
AND  `tracker_twitter_content`.`tracker_id` =  '88'
AND  `tracked_twitter_id` !=  '0'
GROUP BY  `tracked_twitter`.`id` 
ORDER BY twitterContentCount DESC 
LIMIT 20 
OFFSET 0

Answer 1:


Try getting rid of the distinct. That is a performance killer. I'm not sure why your first query works quickly; perhaps MySQL is smart enough to optimize it away.

I would try:

SELECT tt.id
FROM tracked_twitter tt
WHERE EXISTS (SELECT 1
              FROM twitter_content tc INNER JOIN  
                   tracker_twitter_content ttc
                   ON  tc.id =  ttc.twitter_content_id
              WHERE  ttc.tracker_id =  88 AND
                     tt.id =  tc.tracked_twitter_id
             )
ORDER BY tt.followers_count DESC ;

For this version, you want indexes on: tracked_twitter(followers_count, id), twitter_content(tracked_twitter_id, id), and tracker_twitter_content(twitter_content_id, tracker_id).
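
As a minimal sketch, the three indexes above could be created as follows. The index names (idx_*) are hypothetical, and the definitions of twitter_content and tracker_twitter_content are not shown in the question, so the column names are taken from the queries as written. Since InnoDB secondary indexes already carry the primary key columns, the existing followers_count key on tracked_twitter may well serve the same purpose as the first index here.

```sql
-- Index names are illustrative assumptions, not from the original post.

-- Lets MySQL read rows in followers_count order and return id from the index.
CREATE INDEX idx_tt_followers_id
    ON tracked_twitter (followers_count, id);

-- Supports the correlated lookup tc.tracked_twitter_id = tt.id.
CREATE INDEX idx_tc_tracked_id
    ON twitter_content (tracked_twitter_id, id);

-- Supports the join on twitter_content_id plus the tracker_id filter.
CREATE INDEX idx_ttc_content_tracker
    ON tracker_twitter_content (twitter_content_id, tracker_id);
```

Running EXPLAIN on the query before and after adding the indexes is the simplest way to check whether the optimizer actually uses them.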




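Answer 2:


Put the parent table in a parenthesized subquery with its own ORDER BY and LIMIT, then join against that. Note that this picks the top 20 users by followers_count before the join, so it can return fewer than 20 rows if some of those users have no content for tracker 88.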

SELECT DISTINCT  `tracked_twitter`.id FROM
(SELECT id,followers_count  FROM tracked_twitter ORDER BY followers_count DESC 
LIMIT 20) AS tracked_twitter
INNER JOIN  `twitter_content` ON  `tracked_twitter`.`id` =  `twitter_content`.`tracked_twitter_id` 
INNER JOIN  `tracker_twitter_content` ON  `twitter_content`.`id` =  `tracker_twitter_content`.`twitter_content_id` 
AND  `tracker_twitter_content`.`tracker_id` =  '88'
ORDER BY tracked_twitter.followers_count DESC 



Answer 3:


The main problem is that, even though you have relatively few rows, you use varchar(255) COLLATE utf8_unicode_ci as a primary key (instead of an integer) and hence as the foreign key in other tables. I suspect twitter_content.id has the same problem. This causes a lot of long string comparisons and reserves a lot of extra memory for the temporary tables.

Concerning the query itself, yes, it should be a query that walks along the followers_count index and checks the condition for the related tables. This could be done as Gordon Linoff suggested, or by using index hints.
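
For the index-hint route, a minimal sketch could look like the following. It reuses the EXISTS form from Answer 1 and the followers_count key defined in the question's CREATE TABLE; the LIMIT 20 is carried over from the original query. Whether the optimizer then walks the index in descending order and stops after 20 matches depends on the MySQL version and the data, so treat this as something to verify with EXPLAIN rather than a guaranteed fix.

```sql
SELECT tt.id
FROM tracked_twitter tt FORCE INDEX (followers_count)  -- hint: read via the followers_count key
WHERE EXISTS (SELECT 1
              FROM twitter_content tc
              INNER JOIN tracker_twitter_content ttc
                      ON tc.id = ttc.twitter_content_id
              WHERE ttc.tracker_id = 88
                AND tc.tracked_twitter_id = tt.id)
ORDER BY tt.followers_count DESC
LIMIT 20;
```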



Source: https://stackoverflow.com/questions/46053284/how-to-improve-order-by-performance-with-joins-in-mysql
