Optimized way to get row count from a query that contains a large amount of data

Question

I am using the query below to return the row count for paging. It works, but it takes a long time to return because every table has millions of records: it currently takes 7 seconds to return the row count. Can anyone help me make it return faster?

I have also tried the same query with a #temp table and with a @table variable; both are slow. The query is:

WITH cte_rowcount 
     AS (SELECT p.policyid 
         FROM   resident (nolock) r 
                INNER JOIN resident_policy (nolock) rp 
                        ON r.residentid = rp.residentid 
                INNER JOIN policy (nolock) p 
                        ON p.policyid = rp.policyid 
                --INNER JOIN PolicySource (NOLOCK) psourse ON p.PolicySourceID = psourse.PolicySourceId 
                INNER JOIN policy_locations (nolock) pl 
                        ON pl.policyid = p.policyid 
                INNER JOIN location (nolock) l 
                        ON pl.locationid = l.locationid 
                --INNER JOIN Policy_Status (NOLOCK) ps ON ps.PolicyStatusId = p.PolicyStatusId 
                INNER JOIN property (nolock) pr 
                        ON pr.propertyid = l.propertyid 
         --INNER JOIN dbo.States (NOLOCK) s ON s.StateId = pr.StateId 
         WHERE  r.primary_resident = 0x1 
                AND ( ( @ResidentFirstName IS NULL ) 
                       OR R.firstname LIKE @ResidentFirstName + '%' ) 
                AND ( ( @ResidentLastName IS NULL ) 
                       OR R.lastname LIKE @ResidentLastName + '%' ) -- lastname column name assumed
                AND ( @PropertyAddress IS NULL 
                       OR pr.address LIKE @PropertyAddress + '%' ) 
                AND ( @Policynumber IS NULL 
                       OR p.policynumber LIKE @Policynumber + '%' ) 
                AND ( @LocationAddress IS NULL 
                       OR l.address2 LIKE @LocationAddress + '%' ) 
                AND ( @City IS NULL 
                       OR pr.city LIKE @City + '%' ) 
                AND ( @ZipCode IS NULL 
                       OR pr.zipcode = @ZipCode ) 
                AND ( @StateId IS NULL 
                       OR pr.stateid = @StateId ) 
                AND ( @PolicyStatusId IS NULL 
                       OR p.policystatusid = @PolicyStatusId )) 
SELECT @rowcount = Count(*) 
FROM   cte_rowcount

Answer 1:

I'd say look at the indexes, but that probably won't help much, because a) you have probably done it already, and b) with this kind of query you get no index seeks, only scans.

The idea is to get rid of these ORs and allow the optimizer to produce a sound plan.

There are two options.

I don't know which version of SQL Server is in question, but if it's SQL Server 2008 SP1 CU5 (10.0.2746) or later, SQL Server 2008 R2 CU1 (10.50.1702) or later, or any newer version, add OPTION (RECOMPILE) to the query. This should produce a much better plan, using seeks on the relevant indexes.
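A minimal sketch of that first option: the question's count query without the CTE, with OPTION (RECOMPILE) appended so a plan is compiled for the actual parameter values on every execution. The parameter types are guesses, and the last-name column is assumed to be called lastname; the question confirms neither. The NOLOCK hints are left out because they are not part of the technique.

```sql
-- Sketch: parameter types and the lastname column are assumptions.
DECLARE @rowcount int,
        @ResidentFirstName varchar(50) = NULL, @ResidentLastName varchar(50) = NULL,
        @PropertyAddress varchar(50) = NULL, @Policynumber varchar(50) = NULL,
        @LocationAddress varchar(50) = NULL, @City varchar(50) = NULL,
        @ZipCode int = NULL, @StateId int = NULL, @PolicyStatusId int = NULL;

SELECT @rowcount = COUNT(*)
FROM   resident r
       INNER JOIN resident_policy rp ON r.residentid = rp.residentid
       INNER JOIN policy p ON p.policyid = rp.policyid
       INNER JOIN policy_locations pl ON pl.policyid = p.policyid
       INNER JOIN location l ON pl.locationid = l.locationid
       INNER JOIN property pr ON pr.propertyid = l.propertyid
WHERE  r.primary_resident = 0x1
       AND (@ResidentFirstName IS NULL OR r.firstname LIKE @ResidentFirstName + '%')
       AND (@ResidentLastName IS NULL OR r.lastname LIKE @ResidentLastName + '%')
       AND (@PropertyAddress IS NULL OR pr.address LIKE @PropertyAddress + '%')
       AND (@Policynumber IS NULL OR p.policynumber LIKE @Policynumber + '%')
       AND (@LocationAddress IS NULL OR l.address2 LIKE @LocationAddress + '%')
       AND (@City IS NULL OR pr.city LIKE @City + '%')
       AND (@ZipCode IS NULL OR pr.zipcode = @ZipCode)
       AND (@StateId IS NULL OR pr.stateid = @StateId)
       AND (@PolicyStatusId IS NULL OR p.policystatusid = @PolicyStatusId)
-- Compile a fresh plan using the current values; on the builds listed above
-- the values are embedded as constants, so predicates whose parameter is NULL
-- simplify away and seeks on the remaining columns become possible.
OPTION (RECOMPILE);

SELECT @rowcount AS row_count;
```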

This will, however, add some recompile overhead to every execution, so maybe the second option is better.

You can rewrite the query as a dynamic one and eliminate the NULL parameters before the optimizer even sees the query. I tried to rewrite your query; I don't have your data, so I can't test it, and there may be some errors in it, but you'll get my intention nevertheless. I also had to guess the data types. (BTW, is there a specific reason for SELECT p.policyid?)

Here it is:

declare @qry nvarchar(4000), @prms nvarchar(4000);
set @qry = N'
SELECT count(*)
         FROM   resident (nolock) r 
                INNER JOIN resident_policy (nolock) rp 
                        ON r.residentid = rp.residentid 
                INNER JOIN policy (nolock) p 
                        ON p.policyid = rp.policyid 
                INNER JOIN policy_locations (nolock) pl 
                        ON pl.policyid = p.policyid 
                INNER JOIN location (nolock) l 
                        ON pl.locationid = l.locationid 
                INNER JOIN property (nolock) pr 
                        ON pr.propertyid = l.propertyid 
         WHERE  r.primary_resident = 0x1 '
if @ResidentFirstName IS NOT NULL
    set @qry = @qry + ' AND R.firstname LIKE @ResidentFirstName + ''%'''  
if @ResidentLastName IS NOT NULL 
    set @qry = @qry + ' AND R.lastname LIKE @ResidentLastName + ''%''' -- lastname column name assumed
if @PropertyAddress IS NOT NULL 
    set @qry = @qry + ' AND pr.address LIKE @PropertyAddress + ''%''' 
if @Policynumber IS NOT NULL 
    set @qry = @qry + ' AND p.policynumber LIKE @Policynumber + ''%''' 
if @LocationAddress IS NOT NULL 
    set @qry = @qry + ' AND l.address2 LIKE @LocationAddress + ''%''' 
if @City IS NOT NULL 
    set @qry = @qry + ' AND pr.city LIKE @City + ''%''' 
if @ZipCode IS NOT NULL 
    set @qry = @qry + ' AND pr.zipcode = @ZipCode'
if @StateId IS NOT NULL 
    set @qry = @qry + ' AND pr.stateid = @StateId'
if @PolicyStatusId IS NOT NULL 
    set @qry = @qry + ' AND p.policystatusid = @PolicyStatusId'


set @prms = N'@PolicyStatusId int, @StateId int, @ZipCode int,
@City varchar(50), @LocationAddress varchar(50), @Policynumber varchar(50), 
@PropertyAddress varchar(50), @ResidentLastName varchar(50), @ResidentFirstName varchar(50)'

exec sp_executesql 
@qry, 
@prms,
@PolicyStatusId = @PolicyStatusId, @StateId = @StateId, @ZipCode = @ZipCode,
@City = @City, @LocationAddress = @LocationAddress, 
@Policynumber = @Policynumber, @PropertyAddress = @PropertyAddress, 
@ResidentLastName = @ResidentLastName, @ResidentFirstName = @ResidentFirstName
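The dynamic query above returns the count as a one-row result set, whereas the question assigns it to @rowcount. sp_executesql can hand a value back through an OUTPUT parameter. Here is a small self-contained illustration against the sys.objects catalog view rather than the question's tables:

```sql
DECLARE @qry  nvarchar(4000),
        @prms nvarchar(4000),
        @type char(2) = 'U',   -- example filter value: user tables
        @cnt  int;

-- The dynamic statement assigns to a parameter instead of returning a result set.
SET @qry  = N'SELECT @cnt = COUNT(*) FROM sys.objects WHERE type = @type';
-- OUTPUT must be declared in the parameter definition list...
SET @prms = N'@type char(2), @cnt int OUTPUT';

-- ...and also specified on the argument when calling sp_executesql.
EXEC sp_executesql @qry, @prms, @type = @type, @cnt = @cnt OUTPUT;

SELECT @cnt AS user_table_count;
```

Applied to the answer's code, that means starting @qry with SELECT @rowcount = count(*), adding @rowcount int OUTPUT to @prms, and passing @rowcount = @rowcount OUTPUT to sp_executesql.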

If you check the execution plan, you'll see index seeks, provided you have nonclustered indexes on the WHERE and JOIN columns.

Moreover, the plans will be cached: one for each combination of supplied (non-NULL) parameters, since each combination produces a different query text.
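To see those cached plans, you can query the plan-cache DMVs. This is a sketch that assumes you have VIEW SERVER STATE permission; the LIKE filter on resident_policy is just one way to find this particular query's text. Parameterized batches run through sp_executesql are listed with objtype 'Prepared'.

```sql
-- List cached plans whose text mentions resident_policy, with reuse counts.
SELECT cp.usecounts,   -- how many times the cached plan has been reused
       cp.objtype,     -- 'Prepared' for parameterized sp_executesql batches
       st.text
FROM   sys.dm_exec_cached_plans AS cp
       CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE  st.text LIKE N'%resident_policy%'
       AND st.text NOT LIKE N'%dm_exec_cached_plans%'  -- skip this query itself
ORDER  BY cp.usecounts DESC;
```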

Answer 2:

This is hard to answer, because with such a large volume of data many things could be going on.

In terms of joins, this should perform well. If this query is only there to perform a count, I'd suggest doing it directly with SELECT count('x'), without the CTE and without (nolock):

SELECT @rowcount = count('x')
FROM   
    resident r 
    INNER JOIN resident_policy rp 
        ON r.residentid = rp.residentid 
    INNER JOIN policy p 
        ON p.policyid = rp.policyid 
    INNER JOIN policy_locations pl 
        ON pl.policyid = p.policyid 
    INNER JOIN location l 
        ON pl.locationid = l.locationid 
    INNER JOIN property pr 
        ON pr.propertyid = l.propertyid 
WHERE  
    r.primary_resident = 0x1 
    AND ( ( @ResidentFirstName IS NULL ) 
        OR R.firstname LIKE @ResidentFirstName + '%' ) 
    AND ( ( @ResidentLastName IS NULL ) 
        OR R.lastname LIKE @ResidentLastName + '%' ) -- lastname column name assumed
    AND ( @PropertyAddress IS NULL 
        OR pr.address LIKE @PropertyAddress + '%' ) 
    AND ( @Policynumber IS NULL 
        OR p.policynumber LIKE @Policynumber + '%' ) 
    AND ( @LocationAddress IS NULL 
        OR l.address2 LIKE @LocationAddress + '%' ) 
    AND ( @City IS NULL 
        OR pr.city LIKE @City + '%' ) 
    AND ( @ZipCode IS NULL 
        OR pr.zipcode = @ZipCode ) 
    AND ( @StateId IS NULL 
        OR pr.stateid = @StateId ) 
    AND ( @PolicyStatusId IS NULL 
        OR p.policystatusid = @PolicyStatusId )

If this CTE is used both for the row count and to retrieve the data, be sure you retrieve only the rows for the page in question (for example, only 20 rows, numbered with ROW_NUMBER() AS RN and filtered with RN > 0 AND RN <= 20).
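A sketch of that paging pattern, assuming hypothetical @PageNumber and @PageSize parameters and ordering by policyid (the question does not say how pages are ordered). COUNT(*) OVER () repeats the total on every row, so one query can return both the page and the row count; on very large results that window count still has to read every qualifying row, so measure it against a separate COUNT(*).

```sql
DECLARE @PageNumber int = 1,   -- hypothetical paging parameters
        @PageSize   int = 20;

WITH numbered AS (
    SELECT p.policyid,
           ROW_NUMBER() OVER (ORDER BY p.policyid) AS rn,  -- position in the full result
           COUNT(*) OVER ()                        AS total_rows
    FROM   resident r
           INNER JOIN resident_policy rp ON r.residentid = rp.residentid
           INNER JOIN policy p ON p.policyid = rp.policyid
           INNER JOIN policy_locations pl ON pl.policyid = p.policyid
           INNER JOIN location l ON pl.locationid = l.locationid
           INNER JOIN property pr ON pr.propertyid = l.propertyid
    WHERE  r.primary_resident = 0x1
           -- the optional search filters from the question go here
)
SELECT policyid, rn, total_rows
FROM   numbered
WHERE  rn >  (@PageNumber - 1) * @PageSize   -- page 1, size 20: rn > 0
  AND  rn <= @PageNumber * @PageSize         -- ...and rn <= 20
ORDER  BY rn;
```

On SQL Server 2012 and later, ORDER BY ... OFFSET ... ROWS FETCH NEXT ... ROWS ONLY expresses the same page selection without the ROW_NUMBER() column.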

On the database side, check that you have indexes for all of your join clauses. It looks like the joins are only on primary keys, so those already have indexes, but make sure every joined column is indexed.

If you still have trouble, use the actual execution plan feature to see what the hell is going on.
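Besides the graphical plan in Management Studio, a script can capture the same information. In this minimal sketch, SET STATISTICS IO and TIME print reads and CPU/elapsed times to the Messages output, and SET STATISTICS XML returns the actual execution plan as an extra result. The simple count is only a placeholder for the query being tuned.

```sql
SET STATISTICS IO ON;    -- logical/physical reads per table
SET STATISTICS TIME ON;  -- compile and execution CPU and elapsed time
SET STATISTICS XML ON;   -- actual execution plan returned as XML

-- Placeholder: replace with the count query being tuned.
SELECT COUNT(*) FROM resident WHERE primary_resident = 0x1;

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```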

LIKE conditions can be performance killers, depending on the text size and the database content. You could consider a COLLATION better suited to storing your texts, to gain some speed on text comparisons.

Answer 3:

There are some general instructions:

Create nonclustered indexes on all of your foreign-key columns
Create a nonclustered index on the primary_resident column
Include the actual execution plan when you run your query and see which part is wasting time
Put the conditions that are most likely to be false first
When you run your query, SQL Server may suggest missing indexes in the plan; try them too
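A sketch of the first two suggestions, using the join and filter columns that appear in the question's query. The index names are made up, the INCLUDE columns are an assumption, and some of these may already exist (for example, as the leading key of a primary key), so compare against the existing indexes before creating anything.

```sql
-- Foreign-key side of each join in the query (index names are hypothetical).
CREATE NONCLUSTERED INDEX IX_resident_policy_residentid
    ON resident_policy (residentid) INCLUDE (policyid);
CREATE NONCLUSTERED INDEX IX_resident_policy_policyid
    ON resident_policy (policyid);
CREATE NONCLUSTERED INDEX IX_policy_locations_policyid
    ON policy_locations (policyid) INCLUDE (locationid);
CREATE NONCLUSTERED INDEX IX_policy_locations_locationid
    ON policy_locations (locationid);
CREATE NONCLUSTERED INDEX IX_location_propertyid
    ON location (propertyid);

-- Filter column on resident; firstname included to cover the name search.
CREATE NONCLUSTERED INDEX IX_resident_primary_resident
    ON resident (primary_resident) INCLUDE (firstname);
```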

Source: https://stackoverflow.com/questions/23307532/optimized-way-to-get-row-count-from-a-query-contains-large-amount-of-data

Tags

sql

sql-server

tsql

rowcount