Slow two-table query in SQL Server

问题

The application I work on generates an SQL query somewhat like this:

Select 
    VISIT_VIEW.VISIT_ID, VISIT_VIEW.PATIENT_ID, VISIT_VIEW.MRN_ID, 
    VISIT_VIEW.BILL_NO, INSURANCE.INS_PAYOR'
FROM 
    'VISIT_VIEW 
LEFT JOIN 
    INSURANCE ON VISIT_VIEW.visit_id = INSURANCE._fk_visit '
WHERE 
    'VISIT_VIEW.VISIT_ID IN (1002, 1003, 1005, 1006, 1007, 1008, 1010, 1011, <...>, 1193, 1194, 1195, 1196, 1197, 1198, 1199)'

The <...> represents a long list of ids. The size of the list depends on the results of a previous query, and in turn on the parameters selected to generate that query.

The list of IDs can be anywhere from 100 items long to above 2000.

The INSURANCE table is large, over 9 million rows. The visit table is also large, but not quite as large.

As the number of IDs goes up there is a fairly sharp increase from a duration of less than a second to over 15 minutes. The increase starts somewhere around 175 ids.

If the parameters used to generate the query are changed so that the INS_PAYOR column is not selected, and thus there is no left join, the query runs in less than a second, even with over 2000 items in the list of IDs.

The execution plan shows that 97% of the query time is devoted to a clustered seek on the INSURANCE table.

How can I rework this query to get the same results with a less horrific delay?

Do remember that the SQL is being generated by code, not by hand. It is generated from a list of fields (with knowledge of which field belongs to which table) and a list of IDs in the primary table to check. I do have access to the code that does the query generation, and can change it provided that the ultimate results of the query are exactly the same.

Thank you

回答1:

The <...> represents a long list of ids. The size of the list depends on the results of a previous query

Don't do that.

Do this:

SELECT <...>
FROM VISIT_VIEW 
INNER JOIN (
    <previous query goes here>
) t on VISIT_VIEW.VISIT_ID = t.<ID?>
LEFT JOIN INSURANCE ON VISIT_VIEW.visit_id=INSURANCE._fk_visit

回答2:

See if you see any improvements using the following...

IF OBJECT_ID('tempdb..#VisitList', 'U') IS NOT NULL 
DROP TABLE #VisitList;

CREATE TABLE #VisitList (
    VISIT_ID INT NOT NULL PRIMARY KEY
    );

INSERT #VisitList (VISIT_ID) VALUES (1002),(1003),(1005),(1006),(1007),(1008),(1010),(1011),(<...>),(1193),(1194),(1195),(1196),(1197),(1198),(1199);

SELECT 
    vv.VISIT_ID, 
    vv.PATIENT_ID, 
    vv.MRN_ID, 
    vv.BILL_NO, 
    ix.INS_PAYOR
FROM 
    VISIT_VIEW vv
    JOIN #VisitList vl
        ON vv.VISIT_ID = vl.VISIT_ID
    CROSS APPLY (
                SELECT TOP 1
                     i.INS_PAYOR
                FROM 
                    INSURANCE i
                WHERE 
                    vv.visit_id=i._fk_visit
                ) ix;

来源：https://stackoverflow.com/questions/45766195/slow-two-table-query-in-sql-server

标签

sql-server

query-performance

generated-code