How to convert SQL from old Informix-style to ANSI-style?

Submitted by 喜夏-厌秋 on 2019-12-11 08:06:40

Question


I have some queries written in Informix-style SQL. Specifically, this query selects the items in a customer's order. (I've simplified the table structure somewhat, though I kept the part that is problematic.)

SELECT ordi.line_no, ordi.item_code, ordi.desc, ordi.price,
    shpi.location, shpi.status, shpi.ship_code,
    box.box_no, box.tracking_no, shpc.ship_co, mfr.mfr_name,
    sum(shpi.ship_qty), sum(shpi.net_cost)
FROM order_items ordi, ship_items shpi, OUTER ship_boxes box,
    shipping_companies shpc, OUTER (inventory invt, brand, manufacturer mfr)
WHERE ordi.order_id = ?
    AND shpi.order_id = ordi.order_id AND shpi.line_no = ordi.line_no
    AND box.order_id = ordi.order_id AND box.box_no = shpi.box_no
    AND shpc.shipper_code = shpi.shipper_code
    AND invt.item_code = ordi.item_code
        AND brand.brand_no = invt.brand_no
        AND mfr.mfr_code = brand.mfr_code
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
ORDER BY ordi.line_no ASC;

(Inventory is joined with OUTER because a certain class of items is stored in a different inventory table. The OUTER on ship_boxes is for items that have not been packed yet.)
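Here is a minimal sketch of what the OUTER on ship_boxes means in ANSI terms. It uses Python's built-in sqlite3 module as a stand-in for Informix, since SQLite does not accept the Informix OUTER syntax. The tables are cut down to the columns the join needs, and the rows are hypothetical: line 2 has been sent to the shipper but not yet packed into a box.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (order_id INTEGER, line_no INTEGER, item_code TEXT);
    CREATE TABLE ship_items  (order_id INTEGER, line_no INTEGER, box_no INTEGER);
    CREATE TABLE ship_boxes  (order_id INTEGER, box_no INTEGER, tracking_no TEXT);

    INSERT INTO order_items VALUES (1, 1, 'A'), (1, 2, 'B');
    -- Line 1 is packed in box 10; line 2 has no box yet (box_no is NULL).
    INSERT INTO ship_items  VALUES (1, 1, 10), (1, 2, NULL);
    INSERT INTO ship_boxes  VALUES (1, 10, 'TRK-10');
""")

# LEFT JOIN keeps every shipped line; unpacked lines get NULL box columns,
# which is what OUTER ship_boxes does in the Informix-style query.
rows = conn.execute("""
    SELECT ordi.line_no, box.box_no, box.tracking_no
    FROM order_items ordi
    JOIN ship_items shpi ON shpi.order_id = ordi.order_id
                        AND shpi.line_no = ordi.line_no
    LEFT JOIN ship_boxes box ON box.order_id = ordi.order_id
                            AND box.box_no = shpi.box_no
    WHERE ordi.order_id = ?
    ORDER BY ordi.line_no
""", (1,)).fetchall()

print(rows)  # [(1, 10, 'TRK-10'), (2, None, None)]
```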

I rewrote it with standard ANSI-style JOINs. Here is what I got:

SELECT ordi.line_no, ordi.item_code, ordi.desc, ordi.price, shpi.location,
    shpi.status, shpi.ship_code, box.box_no, box.tracking_no, shpc.ship_co,
    mfr.mfr_name, sum(shpi.ship_qty), sum(shpi.net_cost)
FROM order_items ordi
    JOIN ship_items shpi ON shpi.order_id = ordi.order_id
        AND shpi.line_no = ordi.line_no
    LEFT JOIN ship_boxes box ON box.order_id = ordi.order_id
        AND box.box_no = shpi.box_no
    JOIN shipping_companies shpc ON shpc.shipper_code = shpi.shipper_code
    LEFT JOIN (inventory invt
        JOIN brand ON brand.brand_no = invt.brand_no
        JOIN manufacturer mfr ON mfr.mfr_code = brand.mfr_code
        ) ON invt.item_code = ordi.item_code
WHERE ordi.order_id = ?
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
ORDER BY ordi.line_no ASC;

The result set is exactly the same, but the new query is nearly two orders of magnitude slower. For an order with 50 items, the first query takes about 50 milliseconds, while the second takes about 5 seconds. SET EXPLAIN gives the first query a cost of 25 and the second a cost of 14403. I was able to pin the difference down to the complex join on inventory: the Informix-style query performs it as 3 INDEX PATH / NESTED LOOP JOINs, each with a cost of 1, while the ANSI version performs a SEQUENTIAL SCAN with a cost of 383 at that point, adding up to over 14K.

It seems that the ANSI JOINs operate on the entire inventory/brand/manufacturer set, which is only then LEFT JOINed to the order items, whereas the Informix OUTER (...) works on just the small subset of rows I asked for (the items in the order).

What am I doing wrong? Is there a way to write the query ANSI-style without this performance hit? If I must, I'll go back to the Informix-style joins, but I am really hoping there is another way.

Thank you.

EDIT: Here are the results from SET EXPLAIN:

  1. Original query: Estimated Cost: 18
  2. My rewrite (explicit JOINs): Estimated Cost: 15629
  3. @HartCO's suggestion (unbundle the inventory section): Estimated Cost: 18 (but will the data be the same? Why isn't that equivalent to OUTER (inventory, brand, manufacturer)?)
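Whether the unbundled version returns the same data can be checked with a small sketch, again using SQLite through Python's sqlite3. Several things here are assumptions. The rows are hypothetical: item B has an inventory row whose brand_no has no matching brand row, and item C has no inventory row at all. The grouped OUTER (inventory, brand, manufacturer) is written as a derived table, which groups the inner joins the same way as the parenthesized JOIN in the rewrite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items  (item_code TEXT);
    CREATE TABLE inventory    (item_code TEXT, brand_no INTEGER);
    CREATE TABLE brand        (brand_no INTEGER, mfr_code TEXT);
    CREATE TABLE manufacturer (mfr_code TEXT, mfr_name TEXT);

    INSERT INTO order_items  VALUES ('A'), ('B'), ('C');
    -- 'B' points at a brand that does not exist; 'C' has no inventory row.
    INSERT INTO inventory    VALUES ('A', 1), ('B', 2);
    INSERT INTO brand        VALUES (1, 'M1');
    INSERT INTO manufacturer VALUES ('M1', 'Acme');
""")

# Grouped form: the inner joins are evaluated together, then outer-joined,
# so a broken link anywhere in the chain nulls out the whole group.
nested = conn.execute("""
    SELECT ordi.item_code, im.inv_item, im.mfr_name
    FROM order_items ordi
    LEFT JOIN (SELECT invt.item_code AS inv_item, mfr.mfr_name AS mfr_name
               FROM inventory invt
               JOIN brand ON brand.brand_no = invt.brand_no
               JOIN manufacturer mfr ON mfr.mfr_code = brand.mfr_code) im
           ON im.inv_item = ordi.item_code
    ORDER BY ordi.item_code
""").fetchall()

# Unbundled form: each table is outer-joined on its own, so the inventory
# row for 'B' survives even though its brand is missing.
chained = conn.execute("""
    SELECT ordi.item_code, invt.item_code, mfr.mfr_name
    FROM order_items ordi
    LEFT JOIN inventory invt ON invt.item_code = ordi.item_code
    LEFT JOIN brand ON brand.brand_no = invt.brand_no
    LEFT JOIN manufacturer mfr ON mfr.mfr_code = brand.mfr_code
    ORDER BY ordi.item_code
""").fetchall()

print(nested)   # [('A', 'A', 'Acme'), ('B', None, None), ('C', None, None)]
print(chained)  # [('A', 'A', 'Acme'), ('B', 'B', None), ('C', None, None)]

# Projected down to mfr_name, as in the real query, the two agree here.
print([(r[0], r[2]) for r in nested] == [(r[0], r[2]) for r in chained])  # True
```

The two forms differ only in the inventory and brand columns, and only when a link in the chain is missing. The real query selects just mfr.mfr_name from that chain, so its output should match as long as each item_code, brand_no and mfr_code matches at most one row. That is an assumption about the keys. If duplicate matches are possible, the chained LEFT JOINs could return extra rows that the grouped form would filter out.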

Answer 1:


You need to unbundle your inventory join section and change those joins to LEFT JOIN:

SELECT ordi.line_no     , ordi.item_code     , ordi.DESC        , ordi.price
     , shpi.location    , shpi.STATUS        , shpi.ship_code   , box.box_no
     , box.tracking_no  , shpc.ship_co       , mfr.mfr_name     
     , sum(shpi.ship_qty)
     , sum(shpi.net_cost)
FROM order_items ordi
    JOIN ship_items shpi ON  shpi.order_id = ordi.order_id   
        AND shpi.line_no = ordi.line_no
    LEFT JOIN ship_boxes box ON box.order_id = ordi.order_id
        AND box.box_no = shpi.box_no
    LEFT JOIN shipping_companies shpc ON shpc.shipper_code = box.shipper_code
    LEFT JOIN inventory invt ON invt.item_code = ordi.item_code
    LEFT JOIN brand ON brand.brand_no = invt.brand_no
    LEFT JOIN manufacturer mfr ON mfr.mfr_code = brand.mfr_code    
WHERE ordi.order_id = ?
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
ORDER BY ordi.line_no ASC;

Note: I only have a SQL Server instance to test on, but I see a big difference in the execution plans. My query shows a Nested Loops (Left Outer Join) that is executed once, while yours shows a Nested Loops (Inner Join) that is executed 3 times. That certainly seems like the culprit.

Your LEFT JOIN to ship_boxes was effectively an INNER JOIN, because you then used an inner JOIN to connect shipping_companies to that table. If the results from the query above aren't what you want, change both from LEFT JOIN to JOIN.
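The general effect described here can be reproduced with a hypothetical, cut-down schema in SQLite (via Python's sqlite3): an inner join on a column of a LEFT JOINed table throws away the NULL-extended rows. The sketch follows this answer's version of the query, where the join condition uses box.shipper_code. The table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ship_items (line_no INTEGER, box_no INTEGER);
    CREATE TABLE ship_boxes (box_no INTEGER, shipper_code TEXT);
    CREATE TABLE shipping_companies (shipper_code TEXT, ship_co TEXT);

    -- Line 2 has not been packed, so it has no box and no box.shipper_code.
    INSERT INTO ship_items VALUES (1, 10), (2, NULL);
    INSERT INTO ship_boxes VALUES (10, 'UPS');
    INSERT INTO shipping_companies VALUES ('UPS', 'United Parcel Service');
""")

def lines_with_shipper(company_join: str) -> list:
    # company_join is a fixed keyword chosen below, not user input.
    sql = f"""
        SELECT shpi.line_no, shpc.ship_co
        FROM ship_items shpi
        LEFT JOIN ship_boxes box ON box.box_no = shpi.box_no
        {company_join} shipping_companies shpc
            ON shpc.shipper_code = box.shipper_code
        ORDER BY shpi.line_no
    """
    return conn.execute(sql).fetchall()

inner = lines_with_shipper("JOIN")       # box.shipper_code is NULL for line 2, so it is dropped
outer = lines_with_shipper("LEFT JOIN")  # line 2 survives with a NULL ship_co
print(inner)  # [(1, 'United Parcel Service')]
print(outer)  # [(1, 'United Parcel Service'), (2, None)]
```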




Answer 2:


My decomposition of the original query is close to yours, but with some significant differences.

First, shipping_companies is definitely an inner join. This makes sense, as this seems to be a readout of the items that have at least been sent to the shipper. The shipper may not yet have loaded everything into boxes, so from ship_boxes on down is an outer join.

One outer join that makes no sense is inventory. Could items that were not in inventory have been sent to the shipper? Maybe I'm reading this relationship wrong, but for now I changed it to an inner join, along with brand and manufacturer, which follow it in the join chain. That left ship_boxes as the sole surviving outer join.
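Whether the inner join on inventory is safe depends on the point raised in the question, where a certain class of items is stored in a different inventory table. The following minimal SQLite sketch (via Python's sqlite3) shows what the switch does to such an item. Item 'X' is a hypothetical stand-in for one of those items, and the schema is cut down to two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (line_no INTEGER, item_code TEXT);
    CREATE TABLE inventory   (item_code TEXT, brand_no INTEGER);

    -- Item 'X' stands for one kept in a different inventory table.
    INSERT INTO order_items VALUES (1, 'A'), (2, 'X');
    INSERT INTO inventory   VALUES ('A', 7);
""")

def order_lines(inventory_join: str) -> list:
    # inventory_join is a fixed keyword chosen below, not user input.
    sql = f"""
        SELECT ordi.line_no, invt.brand_no
        FROM order_items ordi
        {inventory_join} inventory invt ON invt.item_code = ordi.item_code
        ORDER BY ordi.line_no
    """
    return conn.execute(sql).fetchall()

print(order_lines("JOIN"))       # [(1, 7)]: line 2 disappears
print(order_lines("LEFT JOIN"))  # [(1, 7), (2, None)]
```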

Another curious thing was the double relationship of ship_boxes to both ship_items and order_items. This locks an entire box down to one order. If the entire order were a pack of playing cards, there would be a lot of wasted space in that box. On the assumption that one box could quite easily contain more than one order, I eliminated that connection. Now, I realize a "ship_box" does not necessarily have to be an entire shipping container; it could be a cardboard box sized just to fit the order or part of an order. That makes no difference. The order_id connected to the box can be had from ship_items. Having a duplicate order_id field in ship_boxes is an unneeded redundancy that, as far as I could tell, makes no difference in the execution plan.

My final query, using SQL Server:

select  ordi.line_no, ordi.item_code, ordi.item_desc, ordi.price,
        shpi.location, shpi.status, shpi.ship_code,
        box.box_no, box.tracking_no, shpc.ship_co, mfr.mfr_name,
        sum(shpi.ship_qty), sum(shpi.net_cost)
from    order_items ordi
join    ship_items shpi
    on  shpi.order_id = ordi.order_id
    and shpi.line_no = ordi.line_no
left join ship_boxes box
    on  box.box_no = shpi.box_no --AND box.order_id = ordi.order_id
join    shipping_companies shpc
    on  shpc.shipper_code = shpi.shipper_code
join    inventory invt
    on  invt.item_code = ordi.item_code
join    brand
    on  brand.brand_no = invt.brand_no
join    manufacturer mfr
    on  mfr.mfr_code = brand.mfr_code
where ordi.order_id = 1
group by ordi.line_no, ordi.item_code, ordi.item_desc, ordi.price,
        shpi.location, shpi.status, shpi.ship_code,
        box.box_no, box.tracking_no, shpc.ship_co, mfr.mfr_name
order by ordi.line_no;

I created the tables and loaded them with some test data. The result set is correct, and the execution plan looks simple and exactly like what I would expect.

Now, if my assumption about inventory is wrong, changing that chain back to outer joins really doesn't change the execution plan.



Source: https://stackoverflow.com/questions/28485239/how-to-convert-sql-from-old-informix-style-to-ansi-style
