Keyset Pagination - Filter By Search Term across Multiple Columns

Posted by 三世轮回 on 2020-08-27 22:06:37

Question


I'm trying to move away from OFFSET/FETCH pagination to keyset pagination (also known as the seek method). Since I have just started, I have many questions, but this one is about getting the pagination right together with a filter.

So I have 2 tables

  1. aspnet_users

having columns

PK

UserId uniqueidentifier

Fields

UserName NVARCHAR(256) NOT NULL, 
AffiliateTag varchar(50) NULL
.....other fields

  2. aspnet_membership

having columns

PK+FK

UserId uniqueidentifier

Fields

Email NVARCHAR(256) NOT NULL
.....other fields

Indexes

  1. Non Clustered Index on Table aspnet_users (UserName)
  2. Non Clustered Index on Table aspnet_users (AffiliateTag)
  3. Non Clustered Index on Table aspnet_membership(Email)

I have a page that lists the users (based on a search term) with the page size set to 20. I want to search across multiple columns, so instead of using OR, I found that having a separate query for each column and then combining them with UNION makes the indexes get used correctly.

So I have a stored procedure that takes the search term and, optionally, the UserName and UserId of the last record in order to fetch the next page.

Create proc [dbo].[sp_searchuser]
@take int,
@searchTerm nvarchar(max) NULL,
@lastUserName nvarchar(256)=NULL,
@lastUserId nvarchar(256)=NULL
AS

IF(@lastUserName IS NOT NULL AND @lastUserId IS NOT NULL)
Begin
    select top (@take) *
    from
    (
        select  u.UserId, u.UserName, u.AffiliateTag, m.Email
        from aspnet_Users as u
        inner join aspnet_Membership as m
        on u.UserId=m.UserId
        where u.UserName like @searchTerm

        UNION

        select  u.UserId, u.UserName, u.AffiliateTag, m.Email
        from aspnet_Users as u
        inner join aspnet_Membership as m
        on u.UserId=m.UserId
        where u.AffiliateTag like convert(varchar(50), @searchTerm)
    ) as u1
    where u1.UserName > @lastUserName
        OR (u1.UserName=@lastUserName And u1.UserId > convert(uniqueidentifier, @lastUserId))
    order by u1.UserName
End

Else
Begin

    select top (@take) *
    from
    (
        select  u.UserId, u.UserName, u.AffiliateTag, m.Email
        from aspnet_Users as u
        inner join aspnet_Membership as m
        on u.UserId=m.UserId
        where u.UserName like @searchTerm

        UNION

        select  u.UserId, u.UserName, u.AffiliateTag, m.Email
        from aspnet_Users as u
        inner join aspnet_Membership as m
        on u.UserId=m.UserId
        where u.AffiliateTag like convert(varchar(50), @searchTerm)
    ) as u1
    
    order by u1.UserName
End

Now, to get the first page of results for the search term mua:

exec [sp_searchuser] 20, 'mua%'

It uses both indexes, the one on the UserName column and the one on the AffiliateTag column, which is good.

But the problem is that the inner union queries return all the matching rows.

For example, in this case the execution plan shows:

UserName Like SubQuery

Number of Rows Read= 5
Actual Number of Rows= 4

AffiliateTag Like SubQuery

Number of Rows Read= 465
Actual Number of Rows= 465

So in total the inner queries return 469 matching rows,

and then the outer query takes 20 of them for the final result set. So it really reads more data than needed.

And when going to the next page:

exec [sp_searchuser] 20, 'mua%', 'lastUserName', 'lastUserId'

the execution plan shows:

UserName Like SubQuery

Number of Rows Read= 5
Actual Number of Rows= 4

AffiliateTag Like SubQuery

Number of Rows Read= 465
Actual Number of Rows= 445

In total the inner queries return 449 matching rows.

So, with or without pagination, it reads more data than needed.

My expectation is to somehow limit the inner queries so that they do not return all matching rows.


Answer 1:


You might be interested in the Logical Processing Order, which determines when the objects defined in one step are made available to the clauses in subsequent steps. The Logical Processing Order steps are:

  1. FROM
  2. ON
  3. JOIN
  4. WHERE
  5. GROUP BY
  6. WITH CUBE or WITH ROLLUP
  7. HAVING
  8. SELECT
  9. DISTINCT
  10. ORDER BY
  11. TOP

Of course, as noted in the docs:

The actual physical execution of the statement is determined by the query processor and the order may vary from this list.

meaning that some steps can sometimes start before the previous ones complete.

In your case, your query looks like:
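To make the list concrete, here is the shape of one inner query from the question with each clause tagged by its logical step number. This is only an annotated sketch: TOP (20) and N'mua%' are the example values from the question, and the comments describe the logical order, not the physical plan.

```sql
SELECT TOP (20)                       -- 11. TOP: applied last, after the sort
       u.UserId, u.UserName,          --  8. SELECT: the column list is evaluated
       u.AffiliateTag, m.Email
FROM aspnet_Users AS u                --  1. FROM
INNER JOIN aspnet_Membership AS m     --  3. JOIN
    ON u.UserId = m.UserId            --  2. ON
WHERE u.UserName LIKE N'mua%'         --  4. WHERE: rows are filtered before sorting
ORDER BY u.UserName, u.UserId;        -- 10. ORDER BY: sorts everything that survived WHERE
```

Because TOP is logically the last step, it can only cut rows after WHERE and ORDER BY have defined which rows come first.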

  1. some data extraction
  2. sort by user_name
  3. get TOP records

With the query written this way, there is no way to reduce the rows in the data extraction part: to get a deterministic result (for which we may actually need to order by user_name, user_id), we need to get all matching rows, sort them, and only then take the desired rows.

For example, imagine the first query returns 20 names starting with 'Z', and the second query returns only one name starting with 'A'. If you somehow stop the execution and skip the second query, you will get wrong results: 20 names starting with 'Z' instead of one starting with 'A' and 19 with 'Z'.
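The example shows that neither branch may be skipped. A variation that keeps both branches but caps each one is sketched below. It is an assumption layered on the question's procedure, not something this answer proposes. Each branch applies the same keyset predicate and its own TOP (@take) in the final sort order, so the union still contains every row the outer TOP can return. The variables stand in for the procedure parameters, and the last-row values are hypothetical.

```sql
DECLARE @take int = 20,
        @searchTerm nvarchar(256) = N'mua%',
        @lastUserName nvarchar(256) = N'muaz',   -- hypothetical last row of the previous page
        @lastUserId uniqueidentifier = '00000000-0000-0000-0000-000000000000'; -- hypothetical

SELECT TOP (@take) u1.UserId, u1.UserName, u1.AffiliateTag, u1.Email
FROM
(
    -- Branch 1: at most @take candidates matched on UserName
    SELECT byName.UserId, byName.UserName, byName.AffiliateTag, byName.Email
    FROM
    (
        SELECT TOP (@take) u.UserId, u.UserName, u.AffiliateTag, m.Email
        FROM aspnet_Users AS u
        INNER JOIN aspnet_Membership AS m ON u.UserId = m.UserId
        WHERE u.UserName LIKE @searchTerm
          AND (u.UserName > @lastUserName
               OR (u.UserName = @lastUserName AND u.UserId > @lastUserId))
        ORDER BY u.UserName, u.UserId
    ) AS byName

    UNION

    -- Branch 2: at most @take candidates matched on AffiliateTag
    SELECT byTag.UserId, byTag.UserName, byTag.AffiliateTag, byTag.Email
    FROM
    (
        SELECT TOP (@take) u.UserId, u.UserName, u.AffiliateTag, m.Email
        FROM aspnet_Users AS u
        INNER JOIN aspnet_Membership AS m ON u.UserId = m.UserId
        WHERE u.AffiliateTag LIKE CONVERT(varchar(50), @searchTerm)
          AND (u.UserName > @lastUserName
               OR (u.UserName = @lastUserName AND u.UserId > @lastUserId))
        ORDER BY u.UserName, u.UserId
    ) AS byTag
) AS u1
ORDER BY u1.UserName, u1.UserId;
```

Whether this lowers the number of rows read depends on the indexes. The AffiliateTag index does not return rows in UserName order, so that branch most likely still has to read and sort all of its matches before its TOP applies. The main saving is in the rows passed on to the union and the outer sort. For the first page, the keyset predicate is simply omitted, as in the ELSE branch of the question.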

In such cases, I prefer to use dynamic T-SQL statements in order to get better execution times and reduce the code length. You are saying:

I want to search across multiple columns, so instead of using OR, I found that having a separate query for each column and then combining them with UNION makes the indexes get used correctly.

When you use UNION, you read your tables twice. In your case, you are reading the aspnet_Membership table twice and the aspnet_Users table twice (yes, you are using two different indexes here, but I believe they are not covering, so you end up performing lookups to extract the user name and email).

I guess you have started with a covering index like the one in the example below:

DROP TABLE IF EXISTS [dbo].[StackOverflow];

CREATE TABLE [dbo].[StackOverflow]
(
    [UserID] INT PRIMARY KEY
   ,[UserName] NVARCHAR(128)
   ,[AffiliateTag] NVARCHAR(128)
   ,[UserEmail] NVARCHAR(128)
   ,[a] INT
   ,[b] INT
   ,[c] INT
   ,[z] INT
);

CREATE INDEX IX_StackOverflow_UserID_UserName_AffiliateTag_I_UserEmail ON [dbo].[StackOverflow]
(
    [UserID]
   ,[UserName]
   ,[AffiliateTag]
)
INCLUDE ([UserEmail]);

GO

INSERT INTO [dbo].[StackOverflow] ([UserID], [UserName], [AffiliateTag], [UserEmail])
SELECT TOP (1000000) ROW_NUMBER() OVER(ORDER BY t1.number)
                    ,CONCAT('UserName',ROW_NUMBER() OVER(ORDER BY t1.number))
                    ,CONCAT('AffiliateTag', ROW_NUMBER() OVER(ORDER BY t1.number))
                    ,CONCAT('UserEmail', ROW_NUMBER() OVER(ORDER BY t1.number))
FROM master..spt_values t1 
CROSS JOIN master..spt_values t2;


GO

So, for the following query:

SELECT TOP 20 [UserID]
             ,[UserName]
             ,[AffiliateTag]
             ,[UserEmail]
FROM [dbo].[StackOverflow]
WHERE [UserName] LIKE 'UserName200%'
    OR [AffiliateTag] LIKE 'UserName200%'
ORDER BY [UserName];


GO

The issue here is that we are reading all the rows even though we are using the index.

What's good is that the index is covering and we are not performing lookups. Depending on the search criteria, it may perform better than your approach.

If the performance is bad, we can use a trigger to UNPIVOT the original data and record it in a separate table. It may look like this (it would be better to use an attribute_id rather than text, as I do here):

DROP TABLE IF EXISTS [dbo].[StackOverflowAttributes];

CREATE TABLE [dbo].[StackOverflowAttributes]
(
    [UserID] INT
   ,[AttributeName] NVARCHAR(128)
   ,[AttributeValue] NVARCHAR(128)
   ,PRIMARY KEY([UserID], [AttributeName], [AttributeValue])
);

GO

CREATE INDEX IX_StackOverflowAttributes_AttributeValue ON [dbo].[StackOverflowAttributes]
(
    [AttributeValue]
);

INSERT INTO [dbo].[StackOverflowAttributes] ([UserID], [AttributeName], [AttributeValue])
SELECT [UserID]
      ,'Name'
      ,[UserName]
FROM [dbo].[StackOverflow]
UNION 
SELECT [UserID]
      ,'AffiliateTag'
      ,[AffiliateTag]
FROM [dbo].[StackOverflow];
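The trigger itself is not shown in this answer. A minimal sketch of what it could look like follows, assuming the two tables defined above. The trigger name is hypothetical, and the delete-then-reinsert strategy is chosen for brevity rather than efficiency. CREATE TRIGGER must be the first statement in its batch, so run it on its own.

```sql
CREATE OR ALTER TRIGGER [dbo].[TR_StackOverflow_SyncAttributes]  -- hypothetical name
ON [dbo].[StackOverflow]
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Drop the attribute rows of every user touched by the statement
    DELETE A
    FROM [dbo].[StackOverflowAttributes] AS A
    WHERE A.[UserID] IN (SELECT [UserID] FROM deleted
                         UNION
                         SELECT [UserID] FROM inserted);

    -- Re-create them from the new row versions (inserted is empty on DELETE)
    INSERT INTO [dbo].[StackOverflowAttributes] ([UserID], [AttributeName], [AttributeValue])
    SELECT i.[UserID], v.[AttributeName], v.[AttributeValue]
    FROM inserted AS i
    CROSS APPLY (VALUES (N'Name', i.[UserName]),
                        (N'AffiliateTag', i.[AffiliateTag])) AS v ([AttributeName], [AttributeValue])
    WHERE v.[AttributeValue] IS NOT NULL;  -- the value is part of the primary key, so NULLs are skipped
END;
```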

and the previous query will look like:

SELECT TOP 20 U.[UserID]
             ,U.[UserName]
             ,U.[AffiliateTag]
             ,U.[UserEmail]
FROM [dbo].[StackOverflowAttributes] A
INNER JOIN [dbo].[StackOverflow] U
    ON A.[UserID] = U.[UserID]
WHERE A.[AttributeValue] LIKE 'UserName200%'
ORDER BY U.[UserName];

Now we are reading only a part of the index rows and then performing a lookup.

To compare performance, it is better to use:
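To tie this back to keyset pagination, a next-page version of the same query could look like the sketch below. It is an illustration, not part of the original answer: the variables and their values are hypothetical. EXISTS is used instead of a join so that a user whose name and affiliate tag both match is returned only once.

```sql
DECLARE @take int = 20,
        @searchTerm nvarchar(128) = N'UserName200%',
        @lastUserName nvarchar(128) = N'UserName200017',  -- hypothetical last row of the previous page
        @lastUserID int = 200017;                         -- hypothetical

SELECT TOP (@take) U.[UserID]
                  ,U.[UserName]
                  ,U.[AffiliateTag]
                  ,U.[UserEmail]
FROM [dbo].[StackOverflow] AS U
WHERE EXISTS (SELECT 1
              FROM [dbo].[StackOverflowAttributes] AS A
              WHERE A.[UserID] = U.[UserID]
                AND A.[AttributeValue] LIKE @searchTerm)
  -- keyset predicate: continue strictly after the last row already shown
  AND (U.[UserName] > @lastUserName
       OR (U.[UserName] = @lastUserName AND U.[UserID] > @lastUserID))
ORDER BY U.[UserName], U.[UserID];  -- UserID breaks ties, making the order deterministic
```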

SET STATISTICS IO, TIME ON; 

as it will show you how many pages are read from the indexes. The result can be visualized here.



Source: https://stackoverflow.com/questions/63316273/keyset-pagination-filter-by-search-term-across-multiple-columns
