Please help me with this query (sql server 2008)

问题

ALTER PROCEDURE ReadNews

 @CategoryID INT,
 @Culture TINYINT = NULL,
 @StartDate DATETIME = NULL,
 @EndDate DATETIME = NULL,
 @Start BIGINT, -- for paging
 @Count BIGINT -- for paging

AS
BEGIN
  SET NOCOUNT ON;  

  --ItemType for news is 0
  ;WITH Paging AS
  (
   SELECT news.ID,
     news.Title,
     news.Description,
     news.Date,
     news.Url,
     news.Vote,
     news.ResourceTitle,
     news.UserID,

     ROW_NUMBER() OVER(ORDER BY news.rank DESC) AS RowNumber, TotalCount = COUNT(*) OVER()

   FROM dbo.News news
   JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
   WHERE itemCat.ItemType = 0 -- news item 
     AND itemCat.CategoryID = @CategoryID
     AND (
       (@StartDate IS NULL OR news.Date >= @StartDate) AND 
       (@EndDate IS NULL OR news.Date <= @EndDate)
      )
     AND news.Culture = @Culture
     and news.[status] = 1

  )  
  SELECT * FROM Paging WHERE RowNumber >= @Start AND RowNumber <= (@Start + @Count - 1)
  OPTION (OPTIMIZE FOR (@CategoryID  UNKNOWN, @Culture UNKNOWN))
END

Here is the structure of News and ItemCategory tables:

CREATE TABLE [dbo].[News](
 [ID] [bigint] NOT NULL,
 [Url] [varchar](300) NULL,
 [Title] [nvarchar](300) NULL,
 [Description] [nvarchar](3000) NULL,
 [Date] [datetime] NULL,
 [Rank] [smallint] NULL,
 [Vote] [smallint] NULL,
 [Culture] [tinyint] NULL,
 [ResourceTitle] [nvarchar](200) NULL,
 [Status] [tinyint] NULL

 CONSTRAINT [PK_News] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [ItemCategory](
 [ID] [bigint] IDENTITY(1,1) NOT NULL,
 [ItemID] [bigint] NOT NULL,
 [ItemType] [tinyint] NOT NULL,
 [CategoryID] [int] NOT NULL,
 CONSTRAINT [PK_ItemCategory] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

This query reads news of a specific category (sport, politics, ...). @Culture parameter specifies the language of news, like 0 (english), 1 (french), etc. ItemCategory table relates a news record to one or more categories. ItemType column in ItemCategory table specifies which type of itemID is there. for now, we have only ItemType 0 indicating that ItemID refers to a record in News table.

Currently, I have the following index on ItemCategory table:

CREATE NONCLUSTERED INDEX [IX_ItemCategory_ItemType_CategoryID__ItemID] ON [ItemCategory] 
(
 [ItemType] ASC,
 [CategoryID] ASC
)
INCLUDE ( [ItemID])

and the following index for News table (suggested by query analyzer):

CREATE NONCLUSTERED INDEX [_dta_index_News_8_1734000549__K1_K7_K13_K15] ON [dbo].[News] 
(
 [ID] ASC,
 [Date] ASC,
 [Culture] ASC,
 [Status] ASC
)

With these indexes, when I execute the query, the query executes in less than a second for some parameters, and for another parameters (e.g. different @Culture or @CategoryID) may take up to 2 minutes! I have used OPTIMIZE FOR (@CategoryID UNKNOWN, @Culture UNKNOWN) to prevent parameter sniffing for @CategoryID and @Culture parameters but seems not working for some parameters.

There are currently around 2,870,000 records in News table and 4,740,000 in ItemCategory table.

Now I greatly appreciate any advice on how to optimize this query or its indexes.

update: execution plan:

(in this image, ItemNetwork is what I referred to as ItemCategory. they are the same)

回答1:

Have you had a look at some of the inbuilt SQL tools to help you with this:

I.e. from the management studio menu:

'Query'->'Display Estimated Execution Plan'
'Query'->'Include Actual Execution Plan'
'Tools'->'Database Engine Tuning Advisor'

回答2:

Shouldn't the OPTION OPTIMIZE clause be part of the inner SQL, rather than of the SELECT on the CTE?

回答3:

You should look at indexing the culture field in the news table, and the itemid and categoryid fields in the item category table. You may not need all these indexes - I would try them one at a time, then in combination until you find something that works. Your existing indexes do not seem to help your query very much.

回答4:

Really need to see the query plan - one thing of note is you put the clustered index for News on News.ID, but it is not an identity field but the FK for the ItemCategory table, this will result in some fragmentation on the news table over time, so it less than ideal.

I suspect the underlying problem is your paging is causing the table to scan.

Updated:

Those Sort's are costing you 68% of the query execution time from the plan, and that makes sense, one of those sorts at least must be to support the ranking function you are using that is based on news.rank desc, but you have no index that can support that ranking natively.

Getting an index in to support that will be interesting, you can try a simple NC index on news.rank first off, SQL may chose to join indexes and avoid the sort, but it will take some experimentation.

回答5:

Try using for ItemCategory table nonclustered index on itemId,categoryId and on News table also nonclustered index on Rank,Culture.

回答6:

I have finally come up with the following indexes which are working great and the stored procedure executes in less than a second. I have just removed TotalCount = COUNT(*) OVER() from the query and I couldn't find any good index for that. Maybe I write a separate stored procedure to calculate the total number of records. I may even decide to use a "more" button like in Twitter and Facebook without pagination buttons.

for news table:

CREATE NONCLUSTERED INDEX [IX_News_Rank_Culture_Status_Date] ON [dbo].[News] 
(
    [Rank] DESC,
    [Culture] ASC,
    [Status] ASC,
    [Date] ASC
)

for ItemNetwork table:

CREATE NONCLUSTERED INDEX [IX_ItemNetwork_ItemID_NetworkID] ON ItemNetwork
(
    [ItemID] ASC,
    [NetworkID] ASC
)

I just don't know whether ItemNetwork needs a clustered index on ID column. I am never retrieving a record from this table using the ID column. Do you think it's better to have a clustered index on (ItemID, NetworkID) columns?

回答7:

Please try to change

FROM dbo.News news
JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID

FROM dbo.News news
HASH JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID

FROM dbo.News news
LOOP JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID

I don't really know what is in your data, but the joining of this tables may be a bottleneck.

来源：https://stackoverflow.com/questions/2086368/please-help-me-with-this-query-sql-server-2008

标签

sql-server

performance

indexing

query-optimization