GROUP BY and GROUP_CONCAT: optimizing a MySQL query without using the main PK

Submitted by 空扰寡人 on 2019-12-05 07:18:59

Remove HAVING 1=1; the Optimizer may not be smart enough to ignore it. Please provide EXPLAIN SELECT (not in html) to see what the Optimizer is doing.

It seems wrong to have a composite PK in this case: PRIMARY KEY (id, payment_type_id). Please justify it.

Please explain the meaning of status and why it needs to be a DOUBLE: status DOUBLE

It will take some effort to figure out why the query is so slow. Let's start by tossing the normalization parts, such as dates, event name and currency. That is, whittle the query down to just enough to find the desired rows, without the details on each row. If it is still slow, let's debug that. If it is 'fast', then add the other parts back, one by one, to find out which one is causing the performance issue.

Is just id the PRIMARY KEY of each table? Or are there more exceptions (like payment)?

It seems 'wrong' to specify a value for question.var, but then use LEFT to imply that it is optional. Please change all LEFT JOINs to INNER JOINs unless I am mistaken on this issue.
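
To illustrate the point about LEFT JOIN, here is a minimal sketch on two made-up miniature tables (demo_question and demo_entry are hypothetical names, not the asker's schema). It assumes the test on var sits in the WHERE clause; in that case the LEFT JOIN cannot return unmatched rows and behaves like an INNER JOIN.

```sql
-- Hypothetical miniature tables, for illustration only.
CREATE TEMPORARY TABLE demo_question (id INT PRIMARY KEY, var VARCHAR(20));
CREATE TEMPORARY TABLE demo_entry (id INT PRIMARY KEY, question_id INT, text VARCHAR(50));

INSERT INTO demo_question VALUES (1, 'name'), (2, 'phone');
INSERT INTO demo_entry VALUES (10, 1, 'Alice'), (11, 2, '555-0100'), (12, NULL, 'orphan');

-- LEFT JOIN plus a WHERE test on the right-hand table: unmatched rows carry
-- q.var = NULL, fail the test, and are dropped. Only entry 10 comes back.
SELECT e.id, e.text, q.var
FROM demo_entry e
LEFT JOIN demo_question q ON q.id = e.question_id
WHERE q.var = 'name';

-- The INNER JOIN returns the same single row and states the intent directly.
SELECT e.id, e.text, q.var
FROM demo_entry e
INNER JOIN demo_question q ON q.id = e.question_id
WHERE q.var = 'name';
```

If the test is moved into the ON clause instead, the LEFT JOIN keeps all three entries (with NULLs on the right for entries 11 and 12), which is a different result; so it is worth deciding which of the two meanings is actually wanted.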

Are any of the tables (perhaps submission_entry and event_date_product) "many-to-many" mapping tables? If so, then follow the tips here to get some performance gains.

When you come back please provide SHOW CREATE TABLE for each table.
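
The requested diagnostics could be gathered roughly as follows. The table names are the ones that appear in this thread; the EXPLAIN is shown on a stripped-down form of the query, which is an assumption on my part, so substitute the real statement.

```sql
-- Table definitions, including all indexes (repeat for every table in the query).
SHOW CREATE TABLE payment;
SHOW CREATE TABLE order_item;
SHOW CREATE TABLE payment_refunds;
SHOW CREATE TABLE submission_entry;
SHOW CREATE TABLE question;

-- Execution plan as plain text. This is a whittled-down stand-in for the
-- real query: just enough to find the desired rows.
EXPLAIN
SELECT order_item.order_id, order_item.payment_id
FROM order_item
    INNER JOIN payment
    ON payment.id = order_item.payment_id
WHERE order_item.status > 0.0 OR order_item.status = -2.0
ORDER BY order_item.order_id DESC
LIMIT 10;
```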

Guided by the strategies below,

  • pre-evaluating aggregations into temporary tables
  • placing payment at the top, since this seems to be the most deterministic
  • grouping joins, to make the table relationships explicit to the query optimizer

I present a revised version of your query:

-- -----------------------------------------------------------------------------
-- Summarization of order_item
-- -----------------------------------------------------------------------------

drop temporary table if exists _ord_itm_sub_tot;

create temporary table _ord_itm_sub_tot(
    primary key (payment_id)
)
SELECT
    payment_id,
    --
    COUNT(
        DISTINCT
            CASE
                WHEN(
                        `order_item`.status > 0 OR
                        (
                                `order_item`.status       != -1 AND
                                `order_item`.status       >= -2 AND
                                `payment`.payment_type_id != 8  AND
                                payment.make_order_free = 1
                            )
                    ) THEN `order_item`.id
                      ELSE NULL
            END
    ) AS qty,
    --
    SUM(order_item.sub_total) sub_total
FROM
    order_item
        inner join payment
        on payment.id = order_item.payment_id    
where order_item.status > 0.0 OR order_item.status = -2.0
group by payment_id;

-- -----------------------------------------------------------------------------
-- Summarization of payment_refunds
-- -----------------------------------------------------------------------------

drop temporary table if exists _pay_ref_tot;

create temporary table _pay_ref_tot(
    primary key(payment_id)
)
SELECT
    payment_refunds.payment_id  AS `payment_id`,
    sum(`payment_refund`.total) AS `refunds_total`
FROM
    `payment_refunds`
        INNER JOIN `payment` AS `payment_refund`
        ON `payment_refund`.id = `payment_refunds`.payment_id_refund
GROUP BY `payment_refunds`.payment_id;

-- -----------------------------------------------------------------------------
-- Summarization of submission_entry
-- -----------------------------------------------------------------------------

drop temporary table if exists _sub_ent;

create temporary table _sub_ent(
    primary key(form_submission_id)
)
select 
    submission_entry.form_submission_id,
    GROUP_CONCAT(
        DISTINCT (
            CASE WHEN coalesce(submission_entry.text, '') = '' THEN ' '
                                                               ELSE submission_entry.text
            END
        )
        ORDER BY question.var
        DESC SEPARATOR 0x1D
    ) AS buyer
from 
    submission_entry
        INNER JOIN question  -- inner join: keep only the 'name' and 'email' entries
        ON(
                question.id = submission_entry.question_id
            AND question.var IN ('name', 'email')
        )
group by submission_entry.form_submission_id;

-- -----------------------------------------------------------------------------
-- The result
-- -----------------------------------------------------------------------------

SELECT SQL_NO_CACHE
    `payment`.`id`          AS id,
    `order_item`.`order_id` AS order_id,
    --
    _sub_ent.buyer,
    --
    event.name AS event,
    --
    _ord_itm_sub_tot.qty,
    --
    payment.currency AS `currency`,
    --
    _ord_itm_sub_tot.sub_total,
    --
    CASE
        WHEN payment.make_order_free = 1 THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
                                         ELSE ROUND(payment.total, 2)
    END AS 'total',
    --
    `payment_type`.`name`   AS payment_type,
    `payment_status`.`name` AS status,
    `payment_status`.`id`   AS status_id,
    --
    DATE_FORMAT(
        CONVERT_TZ(order_item.`created`, '+0:00', '-8:00'),
        '%Y-%m-%d %H:%i'
    ) AS 'created',
    --
    `user`.`name` AS 'agent',
    event.id      AS event_id,
    payment.checked,
    --
    DATE_FORMAT(CONVERT_TZ(payment.checked_date,  '+0:00', '-8:00'), '%Y-%m-%d %H:%i') AS checked_date,
    DATE_FORMAT(CONVERT_TZ(payment.complete_date, '+0:00', '-8:00'), '%Y-%m-%d %H:%i') AS `complete date`,
    --
    `payment`.`delivery_status` AS `delivered`
FROM
    `payment`
        INNER JOIN(
            `order_item`
                INNER JOIN event
                ON event.id = order_item.event_id
        )
        ON `order_item`.`payment_id` = payment.id
        --
        inner join _ord_itm_sub_tot
        on _ord_itm_sub_tot.payment_id = payment.id
        --
        LEFT JOIN _pay_ref_tot
        on _pay_ref_tot.payment_id = `payment`.id
        --
        INNER JOIN payment_status ON payment_status.id = payment.status
        INNER JOIN payment_type   ON payment_type.id   = payment.payment_type_id
        LEFT  JOIN user           ON user.id           = payment.completed_by
        --
        LEFT JOIN _sub_ent
        on _sub_ent.form_submission_id = `payment`.`form_submission_id`
WHERE
    1 = 1
AND (payment.status > 0.0 OR payment.status = -3.0)
AND (order_item.status > 0.0 OR order_item.status = -2.0)
ORDER BY `order_item`.`order_id` DESC
LIMIT 10

The query in your question uses aggregate functions without explicit grouping, which is rather awkward; in my solution I have tried to devise aggregations that 'make sense'.

Please run this version and tell us your findings.

Please look carefully not just at the run-time statistics, but also at the summarization results.
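
One possible way to sanity-check the summarization, sketched under the assumption that the temporary table _ord_itm_sub_tot above has just been built in the same session: compare its totals with the same figures computed directly from the base tables.

```sql
-- Number of payments and grand total according to the pre-aggregated table ...
SELECT COUNT(*)       AS payments,
       SUM(sub_total) AS sub_total
FROM _ord_itm_sub_tot;

-- ... and the same figures computed directly, using the same join and filter
-- as the CREATE TEMPORARY TABLE statement.
SELECT COUNT(DISTINCT order_item.payment_id) AS payments,
       SUM(order_item.sub_total)             AS sub_total
FROM order_item
    INNER JOIN payment
    ON payment.id = order_item.payment_id
WHERE order_item.status > 0.0 OR order_item.status = -2.0;
```

The two result rows should agree; if sub_total is a floating-point column, expect agreement only up to rounding.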

(The tables and query are too complex for me to do the transformation for you. But here are the steps.)

  1. Reformulate the query without any mention of refunds. That is, remove the derived table and the mention of it in the complex CASE.
  2. Debug and time the resulting query. Keep the GROUP BY order_item ORDER BY order_item DESC LIMIT 10 and do any other optimizations already suggested. In particular, get rid of HAVING 1=1 since it is in the way of a likely optimization.
  3. Make the query from step #2 be a 'derived table'...

Something like:

SELECT lots of stuff
    FROM ( query from step 2 ) AS step2
    LEFT JOIN ( ... ) AS refunds  ON step2... = refunds...
    ORDER BY step2.order_item DESC

The ORDER BY is repeated, but neither the GROUP BY, nor the LIMIT need be repeated.

Why? The principle here is...

Currently, it is going into the refunds correlated subquery thousands of times, only to toss all but 10 of the results. The reformulation cuts that back to only 10 times.

(Caveat: I may have missed a subtlety preventing this reformulation from working as I presented it. If it does not work, see if you can make the 'principle' help you anyway.)

Here is the minimum you should do each time you see a query with a lot of joins and pagination: first select just those 10 ids (LIMIT 10) that you group by from the first table (order_item), with as few joins as possible; then join those ids back to the first table and make all the other joins. That way you do not drag through temporary tables all those thousands of columns and rows that you never need to display.

  1. You look at the inner joins and WHERE conditions, GROUP BYs and ORDER BYs to see whether you need any other tables to filter out rows, group or order ids from the first table. In your case, it doesn't seem you need any joins, except for payment.

  2. Now you write the query to select those ids:

    SELECT o.order_id, o.payment_id
    FROM order_item o
    JOIN payment p
        ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
    WHERE o.status > 0.0 OR o.status = -2.0
    ORDER BY order_id DESC
    LIMIT 10
    

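    If there might be several payments for a single order, you should add GROUP BY order_id and keep ORDER BY order_id DESC (the GROUP BY order_id DESC form, which older MySQL versions accepted in place of ORDER BY, was removed in MySQL 8.0). To make the query run faster you need a BTREE index on the status column of the order_item table, or even a composite index on (status, payment_id).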

  3. Now, when you are sure that the ids are those that you expected, you make all other joins:

    SELECT order_item.order_id,
      `payment`.`id`,
      GROUP_CONCAT ... -- and so on from the original query
    FROM (
      SELECT o.order_id, o.payment_id
      FROM order_item o
      JOIN payment p
        ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
      WHERE o.status > 0.0 OR o.status = -2.0
      ORDER BY order_id DESC
      LIMIT 10
    ) as ids
    JOIN order_item ON ids.order_id = order_item.order_id
    JOIN payment ON ids.payment_id = payment.id
    LEFT JOIN ( ... -- and so on
    

The idea is that you significantly reduce the size of the temporary tables you need to process. Now every row selected by the joins will be used in the result set.
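
The composite index suggested in step 2 could be created along these lines. The index name idx_order_item_status_payment is made up for illustration; InnoDB builds a BTREE index by default.

```sql
-- Composite index on (status, payment_id) for the id-selecting query.
-- The index name is hypothetical; pick one that fits your conventions.
ALTER TABLE order_item
    ADD INDEX idx_order_item_status_payment (status, payment_id);
```

After adding it, rerun EXPLAIN on the id-selecting query to check whether the optimizer actually picks the new index; with an OR on status and an ORDER BY on order_id it may still prefer another access path.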


UPD1: Another thing is that you should simplify the aggregation in LEFT JOIN:

SELECT
  sum(payment.total) AS `refunds_total`,
  refs.payment_id  AS `payment_id`
FROM payment_refunds refs
JOIN payment ON payment.id = refs.payment_id_refund
GROUP BY refs.payment_id

or even replace the LEFT JOIN with a correlated subquery, since the subquery will then be executed only for those 10 rows. (Make sure you use the whole query below, with its three columns, as the derived table; otherwise the correlated subquery will be computed for each row in the resulting join before the GROUP BY.)

SELECT
      ids.order_id,
      ids.payment_id,
      (SELECT SUM(p.total) 
       FROM payment_refunds refs 
       JOIN payment p 
         ON refs.payment_id_refund = p.id
       WHERE refs.payment_id = ids.payment_id
       ) as refunds_total
    FROM (
      SELECT o.order_id, o.payment_id
      FROM order_item o
      JOIN payment p
        ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
      WHERE o.status > 0.0 OR o.status = -2.0
      ORDER BY order_id DESC
      LIMIT 10
    ) as ids

You will also need an index (payment_id, payment_id_refund) on payment_refunds, and you can even try a covering index (payment_id, total) on payment as well.
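
As a sketch, those two indexes might be added as follows. The index names are made up, and I am assuming that "payment_id" on the payment table means its id column, since payment is joined on payment.id throughout.

```sql
-- Lets the refunds lookup find all refund rows of one payment from the index alone.
ALTER TABLE payment_refunds
    ADD INDEX idx_refunds_payment_refund (payment_id, payment_id_refund);

-- Candidate covering index for SUM(total) per payment
-- (assumption: the intended key column is payment.id).
ALTER TABLE payment
    ADD INDEX idx_payment_id_total (id, total);
```

Whether the second index helps depends on the existing primary key of payment, so compare EXPLAIN output before and after.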
