How to make row numbering with ordering, partitioning and grouping

Submitted by 六眼飞鱼酱① on 2019-12-10 16:32:11

Question


I need to number rows with ordering, partitioning and grouping: order by IdDocument, DateChange; partition by IdDocument; group by IdRole. The grouping is the hard part. As the example shows (column NumberingExpected), DENSE_RANK() looks like the best function for this purpose, but it repeats a number only when the values used for ordering are equal. In my case the values used for ordering (IdDocument, DateChange) are always different, and the number must repeat for consecutive rows with the same IdRole.

Of course this could be solved very easily with a cursor. But is there any way to do it with the numbering/ranking functions?

Test data:

declare @LogTest as table (
    Id INT
    ,IdRole INT
    ,DateChange DATETIME
    ,IdDocument INT
    ,NumberingExpected INT
)
insert into @LogTest
select 1 as Id, 7 as IdRole, GETDATE() as DateChange, 13 as IdDocument, 1 as NumberingExpected
union 
select 2, 3, DATEADD(HH, 1, GETDATE()), 13, 2
union 
select 3, 3, DATEADD(HH, 2, GETDATE()), 13, 2
union 
select 4, 3, DATEADD(HH, 3, GETDATE()), 13, 2
union 
select 5, 5, DATEADD(HH, 4, GETDATE()), 13, 3
union 
select 7, 3, DATEADD(HH, 6, GETDATE()), 13, 4
union 
select 6, 3, DATEADD(HH, 5, GETDATE()), 27, 1
union 
select 8, 3, DATEADD(HH, 7, GETDATE()), 27, 1
union 
select 9, 5, DATEADD(HH, 8, GETDATE()), 27, 2
union 
select 10, 3, DATEADD(HH, 9, GETDATE()), 27, 3


select * from @LogTest order by IdDocument, DateChange;

Explanation in procedural terms:

  1. Order the data by IdDocument, DateChange.
  2. Set the first row's number to i = 1 and go to the next row.
  3. If IdDocument has changed { i = 1; } else { if IdRole has changed { i++; } }
  4. Set the row number to i.
  5. Go to the next row.
  6. If EOF { exit; } else { go to step 3; }
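
For reference, the steps above can be sketched as a T-SQL cursor loop. This only illustrates the intended numbering (it is the cursor approach the question hopes to avoid). The table variable @Log, its rn column, and the fixed timestamps are assumptions made up for this sketch, and IdRole is assumed to be non-NULL.

```sql
-- Hypothetical sample rows with fixed timestamps instead of GETDATE().
DECLARE @Log TABLE (Id INT, IdRole INT, DateChange DATETIME, IdDocument INT, rn INT NULL);
INSERT INTO @Log (Id, IdRole, DateChange, IdDocument)
VALUES (1, 7, '2015-01-26T20:00:00', 13),
       (2, 3, '2015-01-26T21:00:00', 13),
       (3, 3, '2015-01-26T22:00:00', 13),
       (5, 5, '2015-01-27T00:00:00', 13),
       (7, 3, '2015-01-27T02:00:00', 13),
       (6, 3, '2015-01-27T01:00:00', 27),
       (8, 3, '2015-01-27T03:00:00', 27),
       (9, 5, '2015-01-27T04:00:00', 27);

DECLARE @Id INT, @IdRole INT, @IdDocument INT;
DECLARE @PrevRole INT, @PrevDocument INT, @i INT;

-- Step 1: walk the rows ordered by IdDocument, DateChange.
-- STATIC takes a snapshot, so the UPDATE below cannot disturb the cursor.
DECLARE cur CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY FOR
    SELECT Id, IdRole, IdDocument
    FROM @Log
    ORDER BY IdDocument, DateChange;

OPEN cur;
FETCH NEXT FROM cur INTO @Id, @IdRole, @IdDocument;

WHILE @@FETCH_STATUS = 0              -- step 6: stop when no rows are left
BEGIN
    -- Steps 2 and 3: restart at 1 on the first row or on a new document,
    -- otherwise increment only when IdRole differs from the previous row.
    IF @PrevDocument IS NULL OR @IdDocument <> @PrevDocument
        SET @i = 1;
    ELSE IF @IdRole <> @PrevRole
        SET @i = @i + 1;

    -- Step 4: store the number for the current row.
    UPDATE @Log SET rn = @i WHERE Id = @Id;

    -- Step 5: remember the current row and move on.
    SELECT @PrevRole = @IdRole, @PrevDocument = @IdDocument;
    FETCH NEXT FROM cur INTO @Id, @IdRole, @IdDocument;
END

CLOSE cur;
DEALLOCATE cur;

-- rn for document 13 is 1, 2, 2, 3, 4 and for document 27 is 1, 1, 2.
SELECT * FROM @Log ORDER BY IdDocument, DateChange;
```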

Answer 1:


SQL Server 2012 and later provide LAG/LEAD, but they are not available in SQL Server 2008, so we'll emulate them. Performance could be poor, so check it against your actual data.

This is the final query:

WITH
CTE_rn
AS
(
    SELECT
        Main.IdRole
        ,Main.IdDocument
        ,Main.DateChange
        ,ROW_NUMBER() OVER(PARTITION BY Main.IdDocument ORDER BY Main.DateChange) AS rn
    FROM
        @LogTest AS Main
        OUTER APPLY
        (
            SELECT TOP (1) T.IdRole
            FROM @LogTest AS T
            WHERE
                T.IdDocument = Main.IdDocument
                AND T.DateChange < Main.DateChange
            ORDER BY T.DateChange DESC
        ) AS Prev
    WHERE Main.IdRole <> Prev.IdRole OR Prev.IdRole IS NULL
)
SELECT *
FROM
    @LogTest AS LT
    CROSS APPLY
    (
        SELECT TOP(1) CTE_rn.rn
        FROM CTE_rn
        WHERE
            CTE_rn.IdDocument = LT.IdDocument
            AND CTE_rn.IdRole = LT.IdRole
            AND CTE_rn.DateChange <= LT.DateChange
        ORDER BY CTE_rn.DateChange DESC
    ) CA_rn
ORDER BY IdDocument, DateChange;

Final result set:

Id    IdRole    DateChange                 IdDocument    NumberingExpected    rn
1     7         2015-01-26 20:00:41.210    13            1                    1
2     3         2015-01-26 21:00:41.210    13            2                    2
3     3         2015-01-26 22:00:41.210    13            2                    2
4     3         2015-01-26 23:00:41.210    13            2                    2
5     5         2015-01-27 00:00:41.210    13            3                    3
7     3         2015-01-27 02:00:41.210    13            4                    4
6     3         2015-01-27 01:00:41.210    27            1                    1
8     3         2015-01-27 03:00:41.210    27            1                    1
9     5         2015-01-27 04:00:41.210    27            2                    2
10    3         2015-01-27 05:00:41.210    27            3                    3

How it works

1) We need the value of IdRole from the previous row when the table is ordered by IdDocument and DateChange. To get it we use OUTER APPLY (because LAG is not available):

SELECT *
FROM
    @LogTest AS Main
    OUTER APPLY
    (
        SELECT TOP (1) T.IdRole
        FROM @LogTest AS T
        WHERE
            T.IdDocument = Main.IdDocument
            AND T.DateChange < Main.DateChange
        ORDER BY T.DateChange DESC
    ) AS Prev
ORDER BY Main.IdDocument, Main.DateChange;

This is the result set of the first step (the last column, also named IdRole, is Prev.IdRole):

Id    IdRole    DateChange                 IdDocument    NumberingExpected    IdRole
1     7         2015-01-26 20:50:32.560    13            1                    NULL
2     3         2015-01-26 21:50:32.560    13            2                    7
3     3         2015-01-26 22:50:32.560    13            2                    3
4     3         2015-01-26 23:50:32.560    13            2                    3
5     5         2015-01-27 00:50:32.560    13            3                    3
7     3         2015-01-27 02:50:32.560    13            4                    5
6     3         2015-01-27 01:50:32.560    27            1                    NULL
8     3         2015-01-27 03:50:32.560    27            1                    3
9     5         2015-01-27 04:50:32.560    27            2                    3
10    3         2015-01-27 05:50:32.560    27            3                    5
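
On SQL Server 2012 or later, the same previous-row IdRole column could be produced with LAG instead of the OUTER APPLY. A minimal sketch, assuming a stand-in table variable @Log with fixed timestamps and a column alias PrevIdRole (all hypothetical, not from the original post):

```sql
-- Hypothetical sample rows with fixed timestamps instead of GETDATE().
DECLARE @Log TABLE (Id INT, IdRole INT, DateChange DATETIME, IdDocument INT);
INSERT INTO @Log (Id, IdRole, DateChange, IdDocument)
VALUES (1, 7, '2015-01-26T20:00:00', 13),
       (2, 3, '2015-01-26T21:00:00', 13),
       (3, 3, '2015-01-26T22:00:00', 13),
       (5, 5, '2015-01-27T00:00:00', 13),
       (6, 3, '2015-01-27T01:00:00', 27),
       (8, 3, '2015-01-27T03:00:00', 27);

-- LAG returns IdRole of the preceding row inside the same IdDocument,
-- or NULL for the first row of each document.
SELECT
    L.Id
    ,L.IdRole
    ,L.DateChange
    ,L.IdDocument
    ,LAG(L.IdRole) OVER (PARTITION BY L.IdDocument ORDER BY L.DateChange) AS PrevIdRole
FROM @Log AS L
ORDER BY L.IdDocument, L.DateChange;
-- PrevIdRole for document 13 is NULL, 7, 3, 3 and for document 27 is NULL, 3.
```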

2) We want to keep only the first row of each run of repeating IdRole, so we add a WHERE clause and number the remaining rows. You can see that the row numbers follow the expected result:

SELECT
    Main.IdRole
    ,Main.IdDocument
    ,Main.DateChange
    ,ROW_NUMBER() OVER(PARTITION BY Main.IdDocument ORDER BY Main.DateChange) AS rn
FROM
    @LogTest AS Main
    OUTER APPLY
    (
        SELECT TOP (1) T.IdRole
        FROM @LogTest AS T
        WHERE
            T.IdDocument = Main.IdDocument
            AND T.DateChange < Main.DateChange
        ORDER BY T.DateChange DESC
    ) AS Prev
WHERE Main.IdRole <> Prev.IdRole OR Prev.IdRole IS NULL
;

This is the result set of this step (it becomes the CTE):

IdRole    IdDocument    DateChange                 rn
7         13            2015-01-26 20:13:26.247    1
3         13            2015-01-26 21:13:26.247    2
5         13            2015-01-27 00:13:26.247    3
3         13            2015-01-27 02:13:26.247    4
3         27            2015-01-27 01:13:26.247    1
5         27            2015-01-27 04:13:26.247    2
3         27            2015-01-27 05:13:26.247    3

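3) Finally, we need to get the correct row number from the CTE for each row of the original table. I use CROSS APPLY to pick, for each original row, the latest CTE row with the same IdDocument and IdRole whose DateChange is not later than that row's DateChange. This is the outer SELECT of the final query shown at the top.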




Answer 2:


This might not be pretty but it does create the required output.

; with cte as (
    select l.Id, l.IdRole, l.IdDocument, l.NumberingExpected, l.DateChange,
        (select min(x.DateChange)
         from @LogTest x
         where x.IdDocument = l.IdDocument and x.IdRole = l.IdRole and x.Id <= l.Id
           and x.Id > (select max(y.Id)
                       from @LogTest y
                       where y.IdDocument = l.IdDocument and y.IdRole <> l.IdRole
                         and y.Id <= l.Id)) as DateChange2
    from @LogTest l
)
select c.Id, c.IdRole, c.DateChange, c.IdDocument, c.NumberingExpected,
    dense_rank() over (partition by c.IdDocument order by c.DateChange2) as rn
from cte c
order by c.IdDocument, c.DateChange;

If I had some more time I think the x.id predicate in the CTE could be improved.




Answer 3:


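-- Requires SQL Server 2012 or later (LAG and SUM ... OVER (ORDER BY ...)).
WITH RankByIdDocumentAndDateChange AS
(
    SELECT *,
        -- DIFF = 0 when IdRole equals the previous row's IdRole within the document, else 1.
        -- On the first row of a document LAG returns NULL, so the CASE falls through to 1.
        CASE
             IdRole - LAG(IdRole) OVER (PARTITION BY IdDocument ORDER BY DateChange)
             WHEN 0 THEN 0
             ELSE 1
        END AS DIFF
    FROM @LogTest
)
-- The running total of DIFF within each document is the required numbering.
SELECT *, SUM(DIFF) OVER (PARTITION BY IdDocument ORDER BY DateChange) AS rn
FROM RankByIdDocumentAndDateChange
ORDER BY IdDocument, DateChange;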
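
The query above computes the change flag by subtraction, which assumes a numeric IdRole. A hedged variant compares the values directly and spells out a ROWS frame for the running sum. The table variable @Log, the names Flagged and IsNewGroup, and the fixed timestamps are assumptions for illustration only, and the sketch needs SQL Server 2012 or later.

```sql
-- Hypothetical sample rows with fixed timestamps instead of GETDATE().
DECLARE @Log TABLE (Id INT, IdRole INT, DateChange DATETIME, IdDocument INT);
INSERT INTO @Log (Id, IdRole, DateChange, IdDocument)
VALUES (1, 7, '2015-01-26T20:00:00', 13),
       (2, 3, '2015-01-26T21:00:00', 13),
       (3, 3, '2015-01-26T22:00:00', 13),
       (5, 5, '2015-01-27T00:00:00', 13),
       (7, 3, '2015-01-27T02:00:00', 13),
       (6, 3, '2015-01-27T01:00:00', 27),
       (8, 3, '2015-01-27T03:00:00', 27),
       (9, 5, '2015-01-27T04:00:00', 27);

WITH Flagged AS
(
    SELECT
        L.Id
        ,L.IdRole
        ,L.DateChange
        ,L.IdDocument
        -- 0 when IdRole matches the previous row of the same document, else 1.
        -- For the first row LAG is NULL, the comparison is unknown, and ELSE yields 1.
        ,CASE
            WHEN L.IdRole = LAG(L.IdRole) OVER (PARTITION BY L.IdDocument ORDER BY L.DateChange)
            THEN 0
            ELSE 1
         END AS IsNewGroup
    FROM @Log AS L
)
SELECT
    F.Id
    ,F.IdRole
    ,F.DateChange
    ,F.IdDocument
    -- Running count of group starts within the document.
    ,SUM(F.IsNewGroup) OVER (PARTITION BY F.IdDocument
                             ORDER BY F.DateChange
                             ROWS UNBOUNDED PRECEDING) AS rn
FROM Flagged AS F
ORDER BY F.IdDocument, F.DateChange;
-- rn for document 13 is 1, 2, 2, 3, 4 and for document 27 is 1, 1, 2.
```

Without an explicit frame, SUM ... OVER (ORDER BY ...) defaults to RANGE UNBOUNDED PRECEDING, which treats rows with equal DateChange as peers. With ROWS, each row gets its own running total. Since the question states that DateChange values always differ, both forms give the same numbers here.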


Source: https://stackoverflow.com/questions/28146158/how-to-make-row-numbering-with-ordering-partitioning-and-grouping
