Optimal performing query for latest record for each N

慢半拍i 2020-12-31 05:23

Here is the scenario I find myself in.

I have a reasonably big table that I need to query the latest records from. Here is the create for the essential columns for t

3 Answers
  • 2020-12-31 05:27

    Depends on your data (how many rows are there per group?) and your indexes.

    See Optimizing TOP N Per Group Queries for some performance comparisons of 3 approaches.

    In your case, with millions of rows for only a small number of Vehicles, I would add an index on (VehicleID, TimeStamp) and use CROSS APPLY:

    SELECT CA.*
    FROM   Vehicles V
           CROSS APPLY (SELECT TOP 1 *
                        FROM   ChannelValue CV
                        WHERE  CV.VehicleID = V.VehicleID
                        ORDER  BY TimeStamp DESC) CA  
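
    A minimal sketch of the index suggested above, assuming SQL Server and that ChannelValue has the VehicleID and TimeStamp columns used in the query. The index name IX_ChannelValue_VehicleID_TimeStamp is made up for illustration:

    ```sql
    -- Hypothetical index name; the key columns are the ones named in the answer.
    -- TimeStamp is declared DESC so the newest row for each vehicle sorts first,
    -- which lets each TOP 1 lookup be satisfied by a short seek.
    CREATE NONCLUSTERED INDEX IX_ChannelValue_VehicleID_TimeStamp
        ON ChannelValue (VehicleID, TimeStamp DESC);
    ```

    With an index like this, each TOP 1 can typically be resolved with one seek per vehicle instead of a scan, although the plan actually chosen depends on the optimizer and your data.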
    
  • 2020-12-31 05:32

    If your records are inserted sequentially, replacing TimeStamp in your query with ID may make a difference.
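
    The question's original query is not shown in full here, so the following is only one possible shape of an ID-based version. It assumes ChannelValue has a unique ID column that increases with insertion order (for example an IDENTITY column), which is what makes the highest ID per vehicle the most recent row:

    ```sql
    -- Assumption: ID is unique and ever-increasing, so MAX(ID) per VehicleID
    -- identifies the same row that the latest TimeStamp would.
    SELECT CV.*
    FROM   ChannelValue CV
           INNER JOIN (SELECT VehicleID,
                              MAX(ID) AS LatestID
                       FROM   ChannelValue
                       GROUP  BY VehicleID) Latest
                   ON Latest.LatestID = CV.ID;
    ```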

    As a side note, how many records is this returning? Your delay could be network overhead if you are getting hundreds of thousands of rows back.

  • 2020-12-31 05:32

    Try this:

    SELECT SequencedChannelValue.* -- Specify only the columns you need, and exclude SeqValue
    FROM
        (
            SELECT 
                ChannelValue.*,   -- Specify only the columns you need
                SeqValue = ROW_NUMBER() OVER(PARTITION BY VehicleID ORDER BY TimeStamp DESC)
            FROM ChannelValue
        ) AS SequencedChannelValue
    WHERE SequencedChannelValue.SeqValue = 1
    

    A table or index scan is expected, because the query does not filter the data in any way. You're asking for the latest TimeStamp for every VehicleID, so with this query the engine has to read every row (or every entry of a suitable index) to find it.

    You can help it out by narrowing the number of columns being returned (don't use SELECT *), and by providing an index that consists of VehicleID + TimeStamp.
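
    As a rough illustration of narrowing the column list, here is the same ROW_NUMBER pattern restricted to the two columns known from the query. The rest of ChannelValue's columns are not shown in the question, so any others you need would have to be added by name:

    ```sql
    -- Only VehicleID and TimeStamp are selected because they are the only
    -- columns known from the question; add further columns explicitly as needed.
    SELECT Latest.VehicleID,
           Latest.TimeStamp
    FROM   (SELECT VehicleID,
                   TimeStamp,
                   ROW_NUMBER() OVER (PARTITION BY VehicleID
                                      ORDER BY TimeStamp DESC) AS SeqValue
            FROM   ChannelValue) AS Latest
    WHERE  Latest.SeqValue = 1;
    ```

    When every selected column is part of the VehicleID + TimeStamp index, the query can be answered from the index alone without touching the base table.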
