This is my query:
SELECT `products`.*, SUM(orders.total_count) AS revenue,
SUM(orders.quantity) AS qty, ROUND(AVG(product_reviews.stars)) as avg_stars
FROM
One approach to avoid that problem is to use a correlated subquery in the SELECT list, rather than a left join:
SELECT p.*
     , SUM(o.total_count) AS revenue
     , SUM(o.quantity) AS qty
     , ( SELECT ROUND(AVG(r.stars))
           FROM `product_reviews` r
          WHERE r.product_id = p.id
       ) AS avg_stars
  FROM `products` p
  LEFT
  JOIN `orders` o
    ON o.product_id = p.id
   AND o.status IN ('delivered','new')
 GROUP BY p.id
 ORDER BY p.id DESC
 LIMIT 10
OFFSET 0
This isn't the only approach, and it's not necessarily the best one, especially with large sets. But given that the subquery will run a maximum of 10 times (because of the LIMIT clause), performance should be reasonable, given an appropriate index on product_reviews(product_id,stars).
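As a rough sketch, the index mentioned above could be created like this. The index name is made up for illustration; only the table and column names come from the query.

```sql
-- Hypothetical index name; the columns are the ones the correlated subquery touches.
-- With product_id leading and stars included, the subquery can be satisfied
-- from the index alone, without visiting the table rows.
CREATE INDEX product_reviews_ix1
    ON product_reviews (product_id, stars);
```

Running EXPLAIN on the full query afterwards is a reasonable way to check whether the optimizer actually uses it.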
If you were returning all product ids, or a significant percentage of them, then using an inline view might give better performance (avoiding the nested loops execution of the correlated subquery in the select list):
SELECT p.*
     , SUM(o.total_count) AS revenue
     , SUM(o.quantity) AS qty
     , s.avg_stars
  FROM `products` p
  LEFT
  JOIN `orders` o
    ON o.product_id = p.id
   AND o.status IN ('delivered','new')
  LEFT
  JOIN ( SELECT ROUND(AVG(r.stars)) AS avg_stars
              , r.product_id
           FROM `product_reviews` r
          GROUP BY r.product_id
       ) s
    ON s.product_id = p.id
 GROUP BY p.id
 ORDER BY p.id DESC
 LIMIT 10
OFFSET 0
Just to be clear: the issue with the original query is that every order for a product is getting matched to every review for the product.
I apologize if my use of the term "semi-cartesian" was misleading or confusing.
The idea that I meant to convey by that was that you had two distinct sets (the set of orders for a product, and the set of reviews for a product), and that your query was generating a "cross product" of those two distinct sets, basically "matching" every order to every review (for a particular product).
For example, given three rows in reviews for product_id 101, and two rows in orders for product_id 101:
REVIEWS
pid  stars  text
---  -----  ---------------
101  4.5    woo hoo perfect
101  3      ehh
101  1      totally sucked

ORDERS
pid  date  qty
---  ----  ---
101  1/13  100
101  1/22    7
Your original query is essentially forming a result set with six rows in it, each row from orders being matched to all three rows from reviews:
id   date  qty  stars  text
---  ----  ---  -----  ---------------
101  1/13  100  4.5    woo hoo perfect
101  1/13  100  3      ehh
101  1/13  100  1      totally sucked
101  1/22    7  4.5    woo hoo perfect
101  1/22    7  3      ehh
101  1/22    7  1      totally sucked
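Then, when the SUM aggregate on qty gets applied, the values returned are way bigger than you expect. In this example SUM(qty) comes out as 321 (each order's quantity counted once per review, so three times) rather than 107.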
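To see the inflation for yourself, here is a minimal sketch that loads the example rows into two scratch tables and compares the sums. It assumes MySQL. The table names demo_reviews and demo_orders, the column types, and the column names review_text and order_date are all made up for this demo; they are not your real schema.

```sql
-- Hypothetical scratch tables mirroring the example rows (not the real schema).
CREATE TEMPORARY TABLE demo_reviews
( pid         INT
, stars       DECIMAL(2,1)
, review_text VARCHAR(40)
);
CREATE TEMPORARY TABLE demo_orders
( pid        INT
, order_date VARCHAR(5)   -- kept as text, since the example gives no year
, qty        INT
);

INSERT INTO demo_reviews VALUES
  (101, 4.5, 'woo hoo perfect')
, (101, 3,   'ehh')
, (101, 1,   'totally sucked');
INSERT INTO demo_orders VALUES
  (101, '1/13', 100)
, (101, '1/22', 7);

-- Joining both child tables before aggregating: 2 orders x 3 reviews = 6 rows.
SELECT o.pid
     , COUNT(*)   AS joined_rows
     , SUM(o.qty) AS inflated_qty
  FROM demo_orders o
  JOIN demo_reviews r
    ON r.pid = o.pid
 GROUP BY o.pid;
-- joined_rows = 6, inflated_qty = 321 (100*3 + 7*3)

-- Aggregating orders on their own gives the total you actually want.
SELECT o.pid
     , SUM(o.qty) AS qty
  FROM demo_orders o
 GROUP BY o.pid;
-- qty = 107
```

The first query has the same shape as the original (both sets joined to the product before the GROUP BY), which is why its sum is three times too large.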