Question
I have some queries written in Informix-style SQL. Specifically, this query selects the items in a customer's order. (I've simplified the table structure somewhat, though I kept the part that is problematic.)
SELECT ordi.line_no, ordi.item_code, ordi.desc, ordi.price,
shpi.location, shpi.status, shpi.ship_code,
box.box_no, box.tracking_no, shpc.ship_co, mfr.mfr_name,
sum(shpi.ship_qty), sum(shpi.net_cost)
FROM order_items ordi, ship_items shpi, OUTER ship_boxes box,
shipping_companies shpc, OUTER (inventory invt, brand, manufacturer mfr)
WHERE ordi.order_id = ?
AND shpi.order_id = ordi.order_id AND shpi.line_no = ordi.line_no
AND box.order_id = ordi.order_id AND box.box_no = shpi.box_no
AND shpc.shipper_code = shpi.shipper_code
AND invt.item_code = ordi.item_code
AND brand.brand_no = invt.brand_no
AND mfr.mfr_code = brand.mfr_code
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
ORDER BY ordi.line_no ASC;
(The reason inventory is joined by OUTER is that a certain class of items is stored in a different inventory table. The OUTER on ship_boxes is for items that were not packed yet.)
I rewrote it with standard, ANSI-style JOINs. Here is what I got:
SELECT ordi.line_no, ordi.item_code, ordi.desc, ordi.price, shpi.location,
shpi.status, shpi.ship_code, box.box_no, box.tracking_no, shpc.ship_co,
mfr.mfr_name, sum(shpi.ship_qty), sum(shpi.net_cost)
FROM order_items ordi
JOIN ship_items shpi ON shpi.order_id = ordi.order_id
AND shpi.line_no = ordi.line_no
LEFT JOIN ship_boxes box ON box.order_id = ordi.order_id
AND box.box_no = shpi.box_no
JOIN shipping_companies shpc ON shpc.shipper_code = shpi.shipper_code
LEFT JOIN (inventory invt
JOIN brand ON brand.brand_no = invt.brand_no
JOIN manufacturer mfr ON mfr.mfr_code = brand.mfr_code
) ON invt.item_code = ordi.item_code
WHERE ordi.order_id = ?
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
ORDER BY ordi.line_no ASC;
The result set is exactly the same, but the performance hit is nearly 2 orders of magnitude. For an order with 50 items, the first query takes about 50 milliseconds, while the second takes about 5 seconds. Running an Explain gives a cost of 25 to the first query, and a cost of 14403 to the second. I was able to pin down the difference to the complex join of inventory: the Informix-style query performed it as 3 INDEX PATH / NESTED LOOP JOINs, each having a cost of 1; the ANSI JOINs were performed as a SEQUENTIAL SCAN, with a cost of 383 at that point, adding up to over 14K points.
It seems that the ANSI JOINs work on the entire inventory / brand / manufacturer table, which is then LEFT JOINed to the order items. The Informix OUTER (...) is able to work on the small selection of that table that I asked for (the items in the order).
What am I doing wrong? Is there a way to write the query ANSI-style that won't give me that performance hit? If I must, I'll go back to the Informix-style JOINs, but I am really hoping there is another way.
Thank you.
EDIT: Here are the results from SET EXPLAIN:
- Original query: Estimated Cost: 18
- My rewrite (explicit JOINs): Estimated Cost: 15629
- @HartCO's suggestion (unbundle inventory section): Estimated Cost: 18 (but will the data be the same? Why isn't that like OUTER inventory, brand, manufacturer?)
Answer 1:
You need to unbundle your inventory join section and change those joins to LEFT JOIN:
SELECT ordi.line_no , ordi.item_code , ordi.DESC , ordi.price
, shpi.location , shpi.STATUS , shpi.ship_code , box.box_no
, box.tracking_no , shpc.ship_co , mfr.mfr_name
, sum(shpi.ship_qty)
, sum(shpi.net_cost)
FROM order_items ordi
JOIN ship_items shpi ON shpi.order_id = ordi.order_id
AND shpi.line_no = ordi.line_no
LEFT JOIN ship_boxes box ON box.order_id = ordi.order_id
AND box.box_no = shpi.box_no
LEFT JOIN shipping_companies shpc ON shpc.shipper_code = box.shipper_code
LEFT JOIN inventory invt ON invt.item_code = ordi.item_code
LEFT JOIN brand ON brand.brand_no = invt.brand_no
LEFT JOIN manufacturer mfr ON mfr.mfr_code = brand.mfr_code
WHERE ordi.order_id = ?
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
ORDER BY ordi.line_no ASC;
Note: I only have a SQL Server instance to test on, but I see a big difference in the execution plan: my query shows a Nested Loops (Left Outer Join) which gets executed once, while yours shows a Nested Loops (Inner Join) that gets executed 3 times. That certainly seems like the culprit.
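On the question of whether the unbundled form returns the same data: here is a small sketch that compares the two shapes on toy data. It uses Python's sqlite3 rather than Informix, the cut-down schema and sample rows are made up for illustration, and it assumes item_code, brand_no and mfr_code are unique keys. It says nothing about how Informix plans either query, only about which rows come back.

```python
import sqlite3

# Toy schema (hypothetical, trimmed to the columns the join chain needs).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (order_id INT, line_no INT, item_code TEXT);
    CREATE TABLE inventory (item_code TEXT, brand_no INT);
    CREATE TABLE brand (brand_no INT, mfr_code TEXT);
    CREATE TABLE manufacturer (mfr_code TEXT, mfr_name TEXT);

    -- Item A has the full chain, B has inventory but no brand row,
    -- C has no inventory row at all.
    INSERT INTO order_items VALUES (1, 1, 'A'), (1, 2, 'B'), (1, 3, 'C');
    INSERT INTO inventory VALUES ('A', 10), ('B', 99);
    INSERT INTO brand VALUES (10, 'M1');
    INSERT INTO manufacturer VALUES ('M1', 'Acme');
""")

# Shape 1: inner joins grouped in parentheses, then LEFT JOINed as a unit.
nested_rows = conn.execute("""
    SELECT ordi.line_no, invt.item_code, mfr.mfr_name
    FROM order_items ordi
    LEFT JOIN (inventory invt
        JOIN brand ON brand.brand_no = invt.brand_no
        JOIN manufacturer mfr ON mfr.mfr_code = brand.mfr_code
    ) ON invt.item_code = ordi.item_code
    WHERE ordi.order_id = 1
    ORDER BY ordi.line_no
""").fetchall()

# Shape 2: the unbundled chain of LEFT JOINs.
chained_rows = conn.execute("""
    SELECT ordi.line_no, invt.item_code, mfr.mfr_name
    FROM order_items ordi
    LEFT JOIN inventory invt ON invt.item_code = ordi.item_code
    LEFT JOIN brand ON brand.brand_no = invt.brand_no
    LEFT JOIN manufacturer mfr ON mfr.mfr_code = brand.mfr_code
    WHERE ordi.order_id = 1
    ORDER BY ordi.line_no
""").fetchall()

print(nested_rows)
print(chained_rows)
```

On this data the mfr_name column is identical in both results, and that is the only column the query above selects from the inventory group. The two shapes differ only for item B, where the chained form still exposes the partial inventory row while the grouped form nulls out the whole group. So the rewrite should be safe as long as nothing from invt or brand is selected and the keys are unique, but that is worth checking against the real schema.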
Your LEFT JOIN ship_boxes was effectively an INNER JOIN because you used JOIN shipping_companies to join to that table. If the results from the above query aren't as desired, you should change both from LEFT JOIN to JOIN.
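To illustrate the general point about an inner join undoing an earlier LEFT JOIN, here is a minimal sketch, again using Python's sqlite3 with a made-up cut-down schema. It assumes ship_boxes carries a shipper_code column, as the query above does; the sample rows and the company name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ship_items (line_no INT, box_no INT, shipper_code TEXT);
    CREATE TABLE ship_boxes (box_no INT, shipper_code TEXT);
    CREATE TABLE shipping_companies (shipper_code TEXT, ship_co TEXT);

    -- Line 1 is packed in box 100; line 2 is not packed yet.
    INSERT INTO ship_items VALUES (1, 100, 'X'), (2, NULL, 'X');
    INSERT INTO ship_boxes VALUES (100, 'X');
    INSERT INTO shipping_companies VALUES ('X', 'Example Shipping');
""")

# Inner join conditioned on a column of the outer-joined table:
# rows where box.* is NULL can never match, so they are dropped.
via_box = conn.execute("""
    SELECT shpi.line_no, box.box_no, shpc.ship_co
    FROM ship_items shpi
    LEFT JOIN ship_boxes box ON box.box_no = shpi.box_no
    JOIN shipping_companies shpc ON shpc.shipper_code = box.shipper_code
    ORDER BY shpi.line_no
""").fetchall()

# Inner join conditioned on ship_items instead: unpacked rows survive.
via_item = conn.execute("""
    SELECT shpi.line_no, box.box_no, shpc.ship_co
    FROM ship_items shpi
    LEFT JOIN ship_boxes box ON box.box_no = shpi.box_no
    JOIN shipping_companies shpc ON shpc.shipper_code = shpi.shipper_code
    ORDER BY shpi.line_no
""").fetchall()

print(via_box)
print(via_item)
```

The effect depends on which table the join condition references. The rewrite in the question joins shipping_companies on shpi.shipper_code, which corresponds to the second form here, so it is worth checking which column the real schema uses before changing the join types.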
Answer 2:
My decomposition of the original query is close but with some significant differences.
First, shipping_companies is definitely an inner join. This makes sense, as this seems to be a readout of the items that have at least been sent to the shipper. The shipper may not have yet loaded everything into boxes, so from ship_boxes on down is an outer join.
One outer join that makes no sense is inventory. Could items that were not in inventory have been sent to the shipper? Maybe I'm reading this relationship wrong, but in the meantime, I changed it to an inner join, along with the brand and manufacturer joins that follow it in the join chain. That left ship_boxes as the sole surviving outer join.
Another thing that was curious was the double relationship of ship_boxes to both ship_items and order_items. This locks down an entire box to one order. If the entire order was a pack of playing cards, there's going to be a lot of wasted space in that box. On the assumption that one box could quite easily contain more than one order, I eliminated that connection. Now, I realize a "ship_box" does not necessarily have to be an entire shipping container. It could be a cardboard box sized just to fit the order or part of an order. That makes no difference. The order_id connected to the box can be had from ship_items. Having a duplicate order_id field in ship_boxes is an unneeded redundancy that, as far as I could tell, makes no difference in the execution plan.
My final query, using SQL Server:
select ordi.line_no, ordi.item_code, ordi.item_desc, ordi.price,
shpi.location, shpi.status, shpi.ship_code,
box.box_no, box.tracking_no, shpc.ship_co, mfr.mfr_name,
sum(shpi.ship_qty), sum(shpi.net_cost)
from order_items ordi
join ship_items shpi
on shpi.order_id = ordi.order_id
and shpi.line_no = ordi.line_no
left join ship_boxes box
on box.box_no = shpi.box_no --AND box.order_id = ordi.order_id
join shipping_companies shpc
on shpc.shipper_code = shpi.shipper_code
join inventory invt
on invt.item_code = ordi.item_code
join brand
on brand.brand_no = invt.brand_no
join manufacturer mfr
on mfr.mfr_code = brand.mfr_code
where ordi.order_id = 1
group by ordi.line_no, ordi.item_code, ordi.item_desc, ordi.price,
shpi.location, shpi.status, shpi.ship_code,
box.box_no, box.tracking_no, shpc.ship_co, mfr.mfr_name
order by ordi.line_no;
I created the tables and loaded them with some test data. The result set is correct, and the execution plan looks simple and is just what I would expect.
Now, if my assumption about inventory is wrong, changing that chain back to outer joins really doesn't change the execution plan.
Source: https://stackoverflow.com/questions/28485239/how-to-convert-sql-from-old-informix-style-to-ansi-style