Optimising MySQL queries across hierarchical data

随声附和 提交于 2019-12-03 05:46:20

Without spending time testing it, you provided an incomplete example? you should definitely try reordering of joined tables. Explain output provides some info, let's say ordering by key_len should be heuristically fastest. First table to be filtered on should be listed as last in case the optimizer is not able to figure that out, I believe.

So, let's say 'c, v, k, u' order is the best.

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
  `VertexDictionary`  AS `u`
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  JOIN `VertexDictionary`  AS `v`
  JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
           AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  `v`.`x` = X
;

'rows' would suggest 'c/u, k, v' order, but that depends on data:

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
  `VertexDictionary`  AS `u`
  JOIN `VertexDictionary`  AS `v`
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
                                 AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
 WHERE
  `v`.`x` = X
;

Hope this helps.

UPDATE (avoiding the varchar join):

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  (`u`.`x`, `u`.`key`) IN (SELECT `k`.`x`, `k`.`key` FROM `SpecialKeys` AS `k`)
AND
  `v`.`x` = X
;

Others might disagree, but I've had and regularly offer STRAIGHT_JOIN for queries... Once you KNOW the data and the relationships. Being that your WHERE clause is against the "V" table alias and it's "x" value, you are good with the index. Move THAT to the front position, then join from that.

SELECT STRAIGHT_JOIN DISTINCT
      v.`key`,
      u.`val`
   FROM
      VertexDictionary AS v 

         JOIN ConnectedVertices AS c
            ON v.x = c.head_x
            AND v.y = c.head_y

            JOIN VertexDictionary AS u 
               ON c.tail_x = u.x 
               AND c.tail_y = u.y

               JOIN SpecialKeys AS k
                  ON u.x = k.x
                  AND u.key = k.key
   WHERE
      v.x = {some value}      

Curious to know how this realignment works for you

Try rebuilding the query in stages; or at least give us some more points to identify where the bottlenecks are. Some combinations of the following queries should give you reasonable performance, if it is possible with out modifying the schema or data set.

What is the number of rows and exec times for the following queries for getting the list of suitable tail verticies (ie, that have a SpecialKey)

SELECT -- DISTINCT
    vd.x as tail_x, vd.y as tail_y, vd.val
FROM
    VertexDictionary vd
WHERE
    EXISTS (
        SELECT
            1
        FROM
            SpecialKeys sk
        WHERE
            vd.x = sk.x
        AND
            vd.key = sk.key
    )

or

SELECT -- DISTINCT
    vd.x as tail_x, vd.y as tail_y, vd.val
FROM
    VertexDictionary vd
JOIN
    SpecialKeys sk
ON
    vd.x = sk.x
AND
    vd.key = sk.key

or

SELECT -- DISTINCT
    vd.x as tail_x, vd.y as tail_y, vd.val
FROM
    VertexDictionary vd
WHERE
(vd.x, vd.key) IN (SELECT x, key FROM SpecialKeys)
-- also could try vd.key IN (SELECT sk.key FROM SpecialKeys sk WHERE sk.x = vd.x)

I'm hoping that one of these returns either small result set, or are at least quick to produce results. if low cardinality & large results apply distinct.

pick the best one from the previous two queries, and add to the next step: joining these suitable 'tails' to 'suitable heads'

SELECT -- DISTINCT
    cv.head_y as y,
    tv.val
FROM
(
    -- ADD SUB QUERY HERE also try nesting the subquery like: (select tail_x, tail_y, val from ([SUBQUERY]) as sq)

) as tv -- tail verticies
JOIN
    ConnectedVerticies cv
ON
    cv.tail_x = tv.tail_x
AND
    cv.tail_y = tv.tail_y
WHERE
    cv.head_x = X -- lets reduce the result set here.

Again, I'm hoping that one of these returns either small result set, or are at least quick to produce results. if low cardinality & large results apply distinct.

If it's falling over at this point, well there's not much hope of it getting faster to apply the last phase, and best to try a different approach.

As head x is known from the earlier query, we now just need to join on head_y and X to get v.key

SELECT DISTINCT
    inner_query.val,
    head.key
FROM
(
 -- previous nested subquery behemoth here, again, try a few things that might work.

) as inner_query
JOIN
    VertexDictionary as head
ON
    head.x = X
AND
    head.y = inner_query.y

Another approach, is to get a list of head.key, tail_x, and tail_y from

SELECT -- DISTINCT
    cv.tail_x as x,
    cv.tail_y as y,
    vd.key
FROM
    VertexDictionary vd
JOIN
    ConnectedVerticies cv
ON
    cv.head_x = vd.x
AND
    cv.head_y = vd.y
WHERE
    vd.head_x = X

How long does this take to execute, with & without distinct? how many results (w & w/o distinct)?

If it's fast and/or small, try using it as a subquery and joining to either another subquery potentiall of SpecialKeys & VertexDictionary if that's small (ie one of the first three queries if they worked well).

I suspect your problem is everything with the syntax

(k.x, k.key) = (u.x, u.key)

Can you rewrite as?

k.x = y.x and k.key = u.key

When you have a calculation on the left hand side of a clause, the dbms cannot optimize. By setting the comparison as a straight comparison, you may improve your performance.

e.g.

year(my_date) = '2012'

is slower than

'2012' = year(my_date)

I'm not sure if mysql treats the comparison as a column comparison or as a calculation.

Please try modifying your query to do column value comparisons.


Second optimization

Also - you are cross joining 4 tables. Multiplication is not additive - is it exponential. Are you sure this is what you intend? You may be better served getting starting with the smallest result set and then join only that result set to the next set.

select a.c1
from (
select t1.c1
from t1
join t2 on t1.c1 = t2.c1
) a
join t3 on t3.c1 = a.c1

etc...


third optimization

if option 2 helps, you may want to create indexed views and work from those instead of directly from the tables.


fourth optimization

don't use mysql. unless you have a team of dbas constantly monitoring for performance and tweaks, you will run into bad times with mysql. mysql is fine and fast with simple things, but starts sucking very badly if you do anything moderately complex. 4 years ago, i migrated from mysql to sql server express and my 10 minute queries took <2 sec with the same tables, indices, and queries...

if you want open source, postgres is much smarter than mysql as well


Create a view that incorporates the first 3 tables that is indexed on the v.key, u.val fields. Then run the query off of the 4th table and the view. Make sure the indices are built on the view before running.

DISTINCT is often a bad friend. Try to replace it with a GROUP BY. Like this :

SELECT sub.key, sub.val
FROM (
    SELECT 
      v.key,
      u.val
    FROM
      ConnectedVertices AS c
      JOIN VertexDictionary  AS u ON (u.x, u.y  ) = (c.tail_x, c.tail_y)
      JOIN VertexDictionary  AS v ON (v.x, v.y  ) = (c.head_x, c.head_y)
      JOIN SpecialKeys       AS k ON (k.x, k.key) = (u.x, u.key)
    WHERE (v.x = @X)
) AS sub
GROUP BY sub.key, sub.val

UPDATE:

Then try the following query which forces the indexes to use:

SELECT DISTINCT
  v.key,
  u.val
FROM
  ConnectedVertices AS c USE INDEX (fx,rx)
  JOIN VertexDictionary  AS u USE INDEX (primary) ON (u.x, u.y  ) = (c.tail_x, c.tail_y) 
  JOIN VertexDictionary  AS v USE INDEX (primary) ON (v.x, v.y  ) = (c.head_x, c.head_y)
  JOIN SpecialKeys       AS k USE INDEX (primary) ON (k.x, k.key) = (u.x, u.key)
WHERE (v.x = @X)

If it still not better, try this one :

SELECT DISTINCT
  v.key,
  u.val
FROM
       ConnectedVertices AS c
  JOIN VertexDictionary  AS u ON (u.x=c.tail_x) AND (u.y=c.tail_y)
  JOIN VertexDictionary  AS v ON (v.x=@X) AND (v.y=c.head_y)
  JOIN SpecialKeys       AS k ON (k.x=u.x) AND (k.key=u.key)
WHERE
  v.x = @X

i don't think that forcing uses of specifique indexes is a good think. the Mysql optimiser has often good estimations.

do you have an index on v.x ?

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!