SQL Server: Only last entry in GROUP BY

后端 未结 5 1076
日久生厌
日久生厌 2020-12-10 16:15

I have the following table in MSSQL2005

id | business_key | result
1 | 1 | 0
2 | 1 | 1
3 | 2 | 1
4 | 3 | 1
5 | 4 | 1
6 | 4 | 0

And now i wa

相关标签:
5条回答
  • 2020-12-10 16:36

    Try this

    select  business_key, 
            result
    from    myTable
    where   id in 
            (select max(id)
            from    myTable
            group by business_key)
    

    EDIT: I created the table to test my code. I include it below in case anybody else wants to test it.

    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[myTable](
        [id] [int] NOT NULL,
        [business_key] [int] NOT NULL,
        [result] [int] NOT NULL
    ) ON [PRIMARY]
    go
    
    insert into myTable values(1,1,0);
    insert into myTable values(2,1,1);
    insert into myTable values(3,2,1);
    insert into myTable values(4,3,1);
    insert into myTable values(5,4,1);
    insert into myTable values(6,4,0);
    
    select  * from mytable
    
    0 讨论(0)
  • 2020-12-10 16:40

    An alternative solution, which may give you better performance (test both ways and check the execution plans):

    SELECT
         T1.id,
         T1.business_key,
         T1.result
    FROM
         dbo.My_Table T1
    LEFT OUTER JOIN dbo.My_Table T2 ON
         T2.business_key = T1.business_key AND
         T2.id > T1.id
    WHERE
         T2.id IS NULL
    

    This query assumes that the ID is a unique value (at least for any given business_key) and that it is set to NOT NULL.

    0 讨论(0)
  • 2020-12-10 16:48

    This is an older post but was relevant to something I was doing currently (2013). If you get a larger dataset (typical in most DBs), the performance of the various queries (looking at execution plans) says a lot. First we create a "TALLY table" to randomly generate numbers, then use an arbitrary formula to create data for the "MyTable":

    CREATE TABLE #myTable(
        [id] [int] NOT NULL,
        [business_key] [int] NOT NULL,
        [result] [int] NOT NULL,
        PRIMARY KEY (Id)
    ) ON [PRIMARY];
    
    ; WITH
        -- Tally table Gen            Tally Rows:     X2                X3
    t1 AS (SELECT 1 N UNION ALL SELECT 1 N),    -- 4            ,    8
    t2 AS (SELECT 1 N FROM t1 x, t1 y),            -- 16            ,    64
    t3 AS (SELECT 1 N FROM t2 x, t2 y),            -- 256            ,    4096
    t4 AS (SELECT 1 N FROM t3 x, t3 y),            -- 65536        ,    16,777,216
    t5 AS (SELECT 1 N FROM t4 x, t4 y),            -- 4,294,967,296,    A lot
    Tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N
              FROM t5 x, t5 y)
    
    INSERT INTO #MyTable 
    SELECT N, CAST(N/RAND(N/8) AS bigINT)/5 , N%2
    FROM Tally
    WHERE N < 500000
    

    Next we run three different types of queries to review the performance (turn on "Actual Execution Plan" if you are using SQL Server Management Studio):

    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    ----- Try #1 
    select  'T1' AS Qry, id, business_key, 
            result
    from    #myTable
    where   id in 
            (select max(id)
            from    #myTable
            group by business_key)
    
    ---- Try #2 
    select 'T2' AS Qry, id, business_key, 
           result
        from 
        (select id, 
            business_key, 
            result, 
            max(id) over (partition by business_key) as max_id
        from #mytable) x
    where id = max_id
    
    ---- Try #3 
    ;with cteRowNumber as (
        select id, 
            business_key, 
            result,
               row_number() over(partition by business_key order by id desc) as RowNum
            from #mytable
    )
    
    SELECT 'T3' AS Qry, id, business_key, 
           result
    FROM cteRowNumber
    WHERE RowNum = 1
    

    Cleanup:

    IF OBJECT_ID(N'TempDB..#myTable',N'U') IS NOT NULL 
        DROP TABLE #myTable;
        SET STATISTICS IO OFF
    SET STATISTICS TIME OFF
    

    You will find, looking at the execution plans, "Try 1" has the best "Query Cost" and lowest CPU time but "Try 3" has the least reads and CPU time is not too bad. I would recommend using a CTE method for the least reads

    0 讨论(0)
  • 2020-12-10 16:55
    select
      drv.business_key,
      mytable.result
    from mytable
      inner join
      (
        select 
          business_key, 
          max(id) as max_id
        from mytable
        group by
          business_key
      ) as drv on
        mytable.id = drv.max_id
    
    0 讨论(0)
  • 2020-12-10 17:03
    select business_key, 
           result
        from 
        (select id, 
            business_key, 
            result, 
            max(id) over (partition by business_key) as max_id
        from mytable) x
    where id = max_id
    
    0 讨论(0)
提交回复
热议问题