How to get the last records for a combination of 2 columns?

只愿长相守 提交于 2020-03-02 08:03:19

问题


I have a situation which I think can be compared to services like CamelCamelCamel, Keepa and so on. Lets say I track the price of an article on each day for a couple of countries. So my table, lets call it Trend, would look something like this

Id     Created      ArticleId    Country    Price
-------------------------------------------------
01     19/11/05     452          US         45.90
02     19/11/05     452          CA         52.99
03     19/11/05     452          MX         99.99
04     19/11/06     452          US         20.00
05     19/11/06     452          CA         25.00
06     19/11/06     452          MX         50.00
...                
97     19/11/05     738          US         12.99
98     19/11/05     738          CA         17.50
99     19/11/05     738          MX         45.50

So it's the next day and I want to update the Trend table. If the price in a country is still the same, I skip the article/country combination. If there is a new price I'll add a new record.

Now I want to query the table to get each ArticleId / Country combination. But only the last record of it (orderd by timestamp). So taken the example above I'd expect to get the records 04, 05 and 06 for ArticleId 452. Not 01, 02 and 03

So I start out with this basic query. But how do I get to change it to get my expected results?

SELECT
    *
FROM
    Trend
ORDER BY 
    Created DESC

回答1:


One method uses a correlated subquery for filtering:

select t.*
from trend t
where t.created = (
    select max(t1.created) 
    from trend t1
    where t1.articleId = t.articleId and t1.country = t.country
)

For performance, you want an index on (articleId, country, created).

You might also want to consider the anti-left join approach:

select t.*
from trend t
left join trend t1 
    on  t1.articleId = t.articleId 
    and t1.country = t.country
    and t1.created > t.created
where t1.articleId is null

Finally, another typical solution is to join the table with an aggregate query:

select t.*
from trend t
inner join (
    select articleId, country, max(created) created
    from trend
    group by articleId, country
) t1 
    on  t1.articleId = t.articleId 
    and t1.country = t.country
    and t1.created = t.created

Which solution performs better depends on the size and distribution of your data.




回答2:


You can do this with a combination of DISTINCT and CROSS APPLY.

SELECT DISTINCT ca.Id, ca.Created, t.ArticleId, t.Country, ca.Price
FROM Trend t
CROSS APPLY (SELECT TOP 1 Id, Created, Price
             FROM Trend
             WHERE ArticleId = t.ArticleId AND Country = t.Country
             ORDER BY Created DESC) ca

Often times when writing queries using APPLY your joining columns (ArticleId and Country) make up a unique key on another table. If this applies to your database you can drop the DISTINCT and speed up the query.

SELECT ca.Id, ca.Created, a.ArticleId, a.Country, ca.Price
FROM Article a
CROSS APPLY (SELECT TOP 1 Id, Created, Price
             FROM Trend
             WHERE ArticleId = a.ArticleId AND Country = a.Country
             ORDER BY Created DESC) ca

Lastly, if you're having issues with performance, you may want to create an index.

CREATE NONCLUSTERED INDEX [NC_Trend_ArticleId] ON [Trend]
(
    [ArticleId] ASC,
    [Country] ASC,
    [Created] ASC
)
INCLUDE ([Price])

Presumably Id is a PRIMARY KEY and is already covered by a CLUSTERED INDEX, if so, the above should be appropriate for most solutions.



来源:https://stackoverflow.com/questions/60046409/how-to-get-the-last-records-for-a-combination-of-2-columns

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!