Two LEFT JOINs give me wrong (doubled?) data with MySQL

梦谈多话 2021-01-26 06:56

This is my query:

SELECT `products`.*, SUM(orders.total_count) AS revenue,
    SUM(orders.quantity) AS qty, ROUND(AVG(product_reviews.stars)) as avg_stars 
FROM          


        
  •  没有蜡笔的小新
    2021-01-26 07:33

    One approach to avoid that problem is to use a correlated subquery in the SELECT list, rather than a LEFT JOIN to product_reviews.

    SELECT p.*
         , SUM(o.total_count) AS revenue
         , SUM(o.quantity) AS qty
         , ( SELECT ROUND(AVG(r.stars))
               FROM `product_reviews` r
              WHERE r.product_id = p.id 
           ) AS avg_stars
      FROM `products` p
      LEFT
      JOIN `orders` o
        ON o.product_id = p.id
       AND o.status IN ('delivered','new')
     GROUP BY p.id
     ORDER BY p.id DESC
     LIMIT 10
     OFFSET 0
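    A minimal runnable sketch of this query, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the query shape is unchanged except that revenue is dropped, because the example data further down gives no order totals). The product name, the status values and the table contents are hypothetical sample data based on the product 101 example in this answer.

```python
import sqlite3

# SQLite stands in for MySQL; all rows below are hypothetical sample data:
# product 101 has two orders (qty 100 and 7) and three reviews (4.5, 3, 1).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (product_id INTEGER, status TEXT, quantity INTEGER);
    CREATE TABLE product_reviews (product_id INTEGER, stars REAL);
    INSERT INTO products VALUES (101, 'widget');
    INSERT INTO orders VALUES (101, 'delivered', 100), (101, 'new', 7);
    INSERT INTO product_reviews VALUES (101, 4.5), (101, 3), (101, 1);
""")

# Orders are joined (one row per order); reviews are averaged inside a
# correlated subquery, so they never multiply the order rows.
rows = conn.execute("""
    SELECT p.id,
           SUM(o.quantity) AS qty,
           (SELECT ROUND(AVG(r.stars))
              FROM product_reviews r
             WHERE r.product_id = p.id) AS avg_stars
      FROM products p
      LEFT JOIN orders o
        ON o.product_id = p.id
       AND o.status IN ('delivered', 'new')
     GROUP BY p.id
     ORDER BY p.id DESC
     LIMIT 10 OFFSET 0
""").fetchall()

print(rows)  # [(101, 107, 3.0)]
```

    qty comes out as 100 + 7 = 107, and avg_stars is ROUND(8.5 / 3) = 3.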
    

    This isn't the only approach, and it's not necessarily the best one, especially with large sets. But since the subquery will run at most 10 times (because of the LIMIT clause), performance should be reasonable with an appropriate index on product_reviews(product_id, stars).
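    To check that the per-product lookup done by the correlated subquery can be served by such an index, here is a sketch, again with SQLite standing in for MySQL; the index name idx_reviews_product_stars is hypothetical. Because the index contains both product_id and stars, the AVG can be computed from the index alone (a covering index). In MySQL, EXPLAIN is the rough counterpart of SQLite's EXPLAIN QUERY PLAN.

```python
import sqlite3

# SQLite stand-in for MySQL; the index name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_reviews (product_id INTEGER, stars REAL, text TEXT);
    -- product_id drives the lookup; stars lets AVG read the index only.
    CREATE INDEX idx_reviews_product_stars
        ON product_reviews (product_id, stars);
""")

# The lookup the correlated subquery performs once per product row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT ROUND(AVG(stars)) FROM product_reviews WHERE product_id = ?",
    (101,),
).fetchall()

# The last column of each plan row is the human-readable step description.
details = [row[-1] for row in plan]
for step in details:
    print(step)
```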

    If you were returning all product ids, or a significant percentage of them, an inline view might give better performance, because it avoids the nested-loop execution of the correlated subquery in the SELECT list:

    SELECT p.*
         , SUM(o.total_count) AS revenue
         , SUM(o.quantity) AS qty
         , s.avg_stars
      FROM `products` p
      LEFT
      JOIN `orders` o
        ON o.product_id = p.id
       AND o.status IN ('delivered','new')
      LEFT
      JOIN ( SELECT ROUND(AVG(r.stars)) AS avg_stars
                  , r.product_id
               FROM `product_reviews` r
              GROUP BY r.product_id 
           ) s
        ON s.product_id = p.id
     GROUP BY p.id
     ORDER BY p.id DESC
     LIMIT 10
     OFFSET 0
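    The inline-view version can be exercised the same way (SQLite standing in for MySQL, hypothetical sample data). A second product, 102, with one order and no reviews is added to show that the LEFT JOIN to the derived table s keeps it, with a NULL avg_stars. Since s holds at most one row per product_id, joining it cannot multiply the order rows.

```python
import sqlite3

# SQLite stand-in for MySQL; all rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (product_id INTEGER, status TEXT, quantity INTEGER);
    CREATE TABLE product_reviews (product_id INTEGER, stars REAL);
    INSERT INTO products VALUES (101), (102);
    INSERT INTO orders VALUES (101, 'delivered', 100), (101, 'new', 7),
                              (102, 'new', 5);
    INSERT INTO product_reviews VALUES (101, 4.5), (101, 3), (101, 1);
""")

# The derived table s pre-aggregates reviews to one row per product,
# so the outer join pairs each order with at most one s row.
rows = conn.execute("""
    SELECT p.id,
           SUM(o.quantity) AS qty,
           s.avg_stars
      FROM products p
      LEFT JOIN orders o
        ON o.product_id = p.id
       AND o.status IN ('delivered', 'new')
      LEFT JOIN (SELECT ROUND(AVG(r.stars)) AS avg_stars, r.product_id
                   FROM product_reviews r
                  GROUP BY r.product_id) s
        ON s.product_id = p.id
     GROUP BY p.id
     ORDER BY p.id DESC
     LIMIT 10 OFFSET 0
""").fetchall()

print(rows)  # [(102, 5, None), (101, 107, 3.0)]
```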
    

    Just to be clear: the issue with the original query is that every order for a product is getting matched to every review for the product.

    I apologize if my use of the term "semi-cartesian" was misleading or confusing.

    What I meant to convey was that you had two distinct sets (the orders for a product and the reviews for a product), and that your query was generating a "cross product" of those two sets, basically "matching" every order to every review for that product.

    For example, given three rows in reviews and two rows in orders for product_id 101:

    REVIEWS
    pid  stars text
    ---  ----- --------------
    101  4.5   woo hoo perfect
    101  3     ehh
    101  1     totally sucked
    
    
    ORDERS
    pid  date   qty 
    ---  -----  ---
    101  1/13   100
    101  1/22   7
    

    Your original query essentially forms a result set of six rows, with each row from orders matched to all three rows from reviews:

    pid  date   qty   stars text
    ---  ----   ----  ----  ------------
    101  1/13   100   4.5   woo hoo perfect
    101  1/13   100   3     ehh
    101  1/13   100   1     totally sucked
    101  1/22   7     4.5   woo hoo perfect
    101  1/22   7     3     ehh
    101  1/22   7     1     totally sucked
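    The pairing can be reproduced without a database: the six rows above are the Cartesian product of the two per-product sets. A minimal sketch using Python's itertools.product, with the example rows typed in by hand:

```python
from itertools import product

# Rows for product_id 101 from the example tables.
orders = [("1/13", 100), ("1/22", 7)]
reviews = [(4.5, "woo hoo perfect"), (3, "ehh"), (1, "totally sucked")]

# Matching on product_id alone pairs every order with every review,
# i.e. a Cartesian product of the two sets (orders outer, reviews inner).
joined = [(101, date, qty, stars, text)
          for (date, qty), (stars, text) in product(orders, reviews)]

for row in joined:
    print(row)
```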
    

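    Then, when SUM is applied to qty, each order's quantity is counted once for every review: SUM(qty) comes out as 3 × (100 + 7) = 321 instead of 107, which is why the values returned are much bigger than you expect.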
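    A small end-to-end check, assuming the original query joined both orders and product_reviews directly to products (the question's query is cut off, so this shape is inferred from the answer). SQLite stands in for MySQL, and the rows are the example's (dates and review text omitted):

```python
import sqlite3

# SQLite stand-in for MySQL, loaded with the example rows for product 101.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (product_id INTEGER, quantity INTEGER);
    CREATE TABLE product_reviews (product_id INTEGER, stars REAL);
    INSERT INTO products VALUES (101);
    INSERT INTO orders VALUES (101, 100), (101, 7);
    INSERT INTO product_reviews VALUES (101, 4.5), (101, 3), (101, 1);
""")

# Assumed shape of the original query: both child tables joined to products.
joined_rows, inflated_qty = conn.execute("""
    SELECT COUNT(*), SUM(o.quantity)
      FROM products p
      LEFT JOIN orders o ON o.product_id = p.id
      LEFT JOIN product_reviews r ON r.product_id = p.id
     WHERE p.id = 101
""").fetchone()

# The same SUM over orders alone, with no review rows multiplying them.
(true_qty,) = conn.execute(
    "SELECT SUM(quantity) FROM orders WHERE product_id = 101"
).fetchone()

print(joined_rows, inflated_qty, true_qty)  # 6 321 107
```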
