Getting duplicates with additional information

Anonymous (unverified), submitted 2019-12-03 02:41:02

Question:

I've inherited a database and I'm having trouble constructing a working SQL query.

Suppose this is the data:

[Products]

| Id | DisplayId | Version | Company | Description |
|----|-----------|---------|---------|-------------|
| 1  | 12345     | 0       | 16      | Random      |
| 2  | 12345     | 0       | 2       | Random 2    |
| 3  | AB123     | 0       | 1       | Random 3    |
| 4  | 12345     | 1       | 16      | Random 4    |
| 5  | 12345     | 1       | 2       | Random 5    |
| 6  | AB123     | 0       | 5       | Random 6    |
| 7  | 12345     | 2       | 16      | Random 7    |
| 8  | XX45      | 0       | 5       | Random 8    |
| 9  | XX45      | 0       | 7       | Random 9    |
| 10 | XX45      | 1       | 5       | Random 10   |
| 11 | XX45      | 1       | 7       | Random 11   |

[Companies]

| Id | Code  |
|----|-------|
| 1  | 'ABC' |
| 2  | '456' |
| 5  | 'XYZ' |
| 7  | 'XYZ' |
| 16 | '456' |

The Version column is a version number. Higher numbers indicate more recent versions. The Company column is a foreign key referencing the Companies table on the Id column. There's another table called ProductData with a ProductId column referencing Products.Id.

Now I need to find duplicates based on the DisplayId and the corresponding Companies.Code. The ProductData table should be joined to show a title (ProductData.Title), and only the most recent ones should be included in the results. So the expected results are:

| Id | DisplayId | Version | Company | Description | ProductData.Title |
|----|-----------|---------|---------|-------------|-------------------|
| 5  | 12345     | 1       | 2       | Random 5    | Title 5           |
| 7  | 12345     | 2       | 16      | Random 7    | Title 7           |
| 10 | XX45      | 1       | 5       | Random 10   | Title 10          |
| 11 | XX45      | 1       | 7       | Random 11   | Title 11          |

  • because XX45 has 2 "entries": one with Company 5 and one with Company 7, but both companies share the same code.
  • because 12345 has 2 "entries": one with Company 2 and one with Company 16, but both companies share the same code. Note that the most recent version of both differs (version 2 for company 16's entry and version 1 for company 2's entry)
  • AB123 should not be included as its 2 entries have different company codes.
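
The selection rule described above can be walked through in plain Python before any SQL is written. This is only a sketch to pin down the expected result: the rows are copied from the sample tables, and the function name `latest_duplicates` is a hypothetical helper, not something from the original post.

```python
from collections import defaultdict

# Sample rows copied from the question: (Id, DisplayId, Version, Company)
products = [
    (1, "12345", 0, 16), (2, "12345", 0, 2), (3, "AB123", 0, 1),
    (4, "12345", 1, 16), (5, "12345", 1, 2), (6, "AB123", 0, 5),
    (7, "12345", 2, 16), (8, "XX45", 0, 5), (9, "XX45", 0, 7),
    (10, "XX45", 1, 5), (11, "XX45", 1, 7),
]
# Companies.Id -> Companies.Code
company_code = {1: "ABC", 2: "456", 5: "XYZ", 7: "XYZ", 16: "456"}


def latest_duplicates(rows, codes):
    """Return the Ids of the latest row per (DisplayId, Company) whose
    (DisplayId, Code) pair occurs for more than one company."""
    # Step 1: keep only the highest Version per (DisplayId, Company).
    latest = {}
    for row in rows:
        _id, display_id, version, company = row
        key = (display_id, company)
        if key not in latest or version > latest[key][2]:
            latest[key] = row
    # Step 2: group the surviving rows by (DisplayId, company code).
    groups = defaultdict(list)
    for row in latest.values():
        groups[(row[1], codes[row[3]])].append(row[0])
    # Step 3: any group holding more than one row is a duplicate.
    return sorted(i for ids in groups.values() if len(ids) > 1 for i in ids)


print(latest_duplicates(products, company_code))  # prints [5, 7, 10, 11]
```

The three steps map onto what a SQL solution has to do: pick the latest version per product and company, attach the company code, then keep the groups that occur more than once.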

I'm eager to learn your insights...

Answer 1:

Try this:

SELECT b.Id, b.DisplayId, b.Version, b.Company, pd.Title
FROM (
    -- Count the latest rows that share a DisplayId and a company code
    SELECT a.Id, a.DisplayId, a.Version, a.Company, a.rn, a.Code,
           COUNT(a.DisplayId) OVER (PARTITION BY a.DisplayId, a.Code) AS cnt
    FROM (
        -- rn = 1 marks the highest Version per DisplayId and Company
        SELECT p.Id, p.DisplayId, p.Version, p.Company, c.Code,
               ROW_NUMBER() OVER (PARTITION BY p.DisplayId, p.Company ORDER BY p.Version DESC) AS rn
        FROM Products AS p
        INNER JOIN Companies AS c ON p.Company = c.Id
    ) AS a
    WHERE a.rn = 1
) AS b
INNER JOIN ProductData AS pd ON pd.ProductId = b.Id
WHERE b.cnt > 1
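
To check this window-function approach against the question's sample data, here is a minimal sketch that runs it through Python's built-in sqlite3 module as a stand-in for SQL Server. Several things are assumptions: the table is called Products as in the question, ProductData is joined on ProductId, every product has one ProductData row titled 'Title <Id>', and the filter is `cnt > 1` so that groups of three or more would also match. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Companies (Id INTEGER PRIMARY KEY, Code TEXT);
CREATE TABLE Products (Id INTEGER PRIMARY KEY, DisplayId TEXT,
                       Version INTEGER, Company INTEGER, Description TEXT);
CREATE TABLE ProductData (ProductId INTEGER, Title TEXT);
INSERT INTO Companies VALUES (1,'ABC'),(2,'456'),(5,'XYZ'),(7,'XYZ'),(16,'456');
INSERT INTO Products VALUES
  (1,'12345',0,16,'Random'),   (2,'12345',0,2,'Random 2'),
  (3,'AB123',0,1,'Random 3'),  (4,'12345',1,16,'Random 4'),
  (5,'12345',1,2,'Random 5'),  (6,'AB123',0,5,'Random 6'),
  (7,'12345',2,16,'Random 7'), (8,'XX45',0,5,'Random 8'),
  (9,'XX45',0,7,'Random 9'),   (10,'XX45',1,5,'Random 10'),
  (11,'XX45',1,7,'Random 11');
-- Assumption: one ProductData row per product, titled 'Title <Id>'.
INSERT INTO ProductData SELECT Id, 'Title ' || Id FROM Products;
""")

QUERY = """
SELECT b.Id, b.DisplayId, b.Version, b.Company, d.Title
FROM (
    -- cnt = number of latest rows sharing a DisplayId and company code
    SELECT a.*, COUNT(*) OVER (PARTITION BY a.DisplayId, a.Code) AS cnt
    FROM (
        -- rn = 1 marks the highest Version per DisplayId and Company
        SELECT p.Id, p.DisplayId, p.Version, p.Company, c.Code,
               ROW_NUMBER() OVER (PARTITION BY p.DisplayId, p.Company
                                  ORDER BY p.Version DESC) AS rn
        FROM Products AS p
        INNER JOIN Companies AS c ON p.Company = c.Id
    ) AS a
    WHERE a.rn = 1
) AS b
INNER JOIN ProductData AS d ON d.ProductId = b.Id
WHERE b.cnt > 1
ORDER BY b.Id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On the sample data this returns the rows with Id 5, 7, 10 and 11, which matches the expected result in the question.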


Answer 2:

Based on your sample data, you just need to JOIN the tables:

SELECT
    p.Id, p.DisplayId, p.Version, p.Company, d.Title
FROM Products AS p
INNER JOIN Companies AS c ON p.Company = c.Id
INNER JOIN ProductData AS d ON d.ProductId = p.Id;

But if you want only the latest row per DisplayId and Company, you can use ROW_NUMBER():

WITH CTE AS (
    SELECT
        p.Id, p.DisplayId, p.Version, p.Company, d.Title,
        -- Order by Version, since the question defines "latest" by version number
        ROW_NUMBER() OVER (PARTITION BY p.DisplayId, p.Company ORDER BY p.Version DESC) AS RN
    FROM Products AS p
    INNER JOIN Companies AS c ON p.Company = c.Id
    INNER JOIN ProductData AS d ON d.ProductId = p.Id
)
SELECT *
FROM CTE
WHERE RN = 1;

sample fiddle

| Id | DisplayId | Version | Company | Title    |
|----|-----------|---------|---------|----------|
| 5  | 12345     | 1       | 2       | Title 5  |
| 7  | 12345     | 2       | 16      | Title 7  |
| 10 | XX45      | 1       | 5       | Title 10 |
| 11 | XX45      | 1       | 7       | Title 11 |


Answer 3:

If I understood you correctly, you can use a CTE to number the rows within each DisplayId and Company group, then SELECT from the CTE and add further filtering as needed.

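WITH CTE AS (
    SELECT p.Id, p.DisplayId, p.Version, p.Company, p.Description, d.Title,
           RN = ROW_NUMBER() OVER (PARTITION BY p.DisplayId, p.Company ORDER BY p.Id DESC)
    FROM dbo.YourTable1 AS p  -- placeholder name: substitute your Products table
    INNER JOIN ProductData AS d ON d.ProductId = p.Id
)
SELECT * FROM CTE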


Answer 4:

You first have to get the current version of each product per company, and then count how many times each DisplayID + Code combination shows up. Based on that count, you can select only the rows whose combination occurs more than once. You can then INNER JOIN ProductData on the final query to get the Title.

WITH MaxVersion AS --Get the current versions
(
    SELECT
        MAX(Version) AS Version,
        DisplayID,
        Company
    FROM
        #TmpProducts
    GROUP BY
        DisplayID,
        Company
)
,CTE AS
(
    SELECT
        p.DisplayID,
        c.Code,
        COUNT(*) AS RowCounter
    FROM
        #TmpProducts p
    INNER JOIN
        #TmpCompanies c
        ON
            c.ID = p.Company
    INNER JOIN
        MaxVersion mv
        ON
            mv.DisplayID = p.DisplayID
        AND mv.Version = p.Version
        AND mv.Company = p.Company
    GROUP BY
        p.DisplayID,
        c.Code
)

SELECT
    p.*
FROM
    #TmpProducts p
INNER JOIN
    CTE c
    ON
        c.DisplayID = p.DisplayID
INNER JOIN
    MaxVersion mv
    ON
        mv.DisplayID = p.DisplayID
    AND mv.Company = p.Company
    AND mv.Version = p.Version
WHERE
    c.RowCounter > 1
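
The ProductData join mentioned above is not shown in that query, so here is a hedged sketch of one way it could look, run through Python's built-in sqlite3 module rather than SQL Server. It is a variation on the same idea and not the answerer's exact code: the CTE names `Latest` and `Dupes` are made up for illustration, the duplicate check is matched on both DisplayId and Code, the temp tables are replaced by plain Products and Companies tables, and the 'Title <Id>' rows in ProductData are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Companies (Id INTEGER PRIMARY KEY, Code TEXT);
CREATE TABLE Products (Id INTEGER PRIMARY KEY, DisplayId TEXT,
                       Version INTEGER, Company INTEGER, Description TEXT);
CREATE TABLE ProductData (ProductId INTEGER, Title TEXT);
INSERT INTO Companies VALUES (1,'ABC'),(2,'456'),(5,'XYZ'),(7,'XYZ'),(16,'456');
INSERT INTO Products VALUES
  (1,'12345',0,16,'Random'),   (2,'12345',0,2,'Random 2'),
  (3,'AB123',0,1,'Random 3'),  (4,'12345',1,16,'Random 4'),
  (5,'12345',1,2,'Random 5'),  (6,'AB123',0,5,'Random 6'),
  (7,'12345',2,16,'Random 7'), (8,'XX45',0,5,'Random 8'),
  (9,'XX45',0,7,'Random 9'),   (10,'XX45',1,5,'Random 10'),
  (11,'XX45',1,7,'Random 11');
-- Assumption: one ProductData row per product, titled 'Title <Id>'.
INSERT INTO ProductData SELECT Id, 'Title ' || Id FROM Products;
""")

QUERY = """
WITH MaxVersion AS (      -- latest version per (DisplayId, Company)
    SELECT DisplayId, Company, MAX(Version) AS Version
    FROM Products
    GROUP BY DisplayId, Company
),
Latest AS (               -- the full latest rows plus the company code
    SELECT p.*, c.Code
    FROM Products AS p
    INNER JOIN MaxVersion AS mv
        ON mv.DisplayId = p.DisplayId
       AND mv.Company = p.Company
       AND mv.Version = p.Version
    INNER JOIN Companies AS c ON c.Id = p.Company
),
Dupes AS (                -- (DisplayId, Code) pairs that occur more than once
    SELECT DisplayId, Code
    FROM Latest
    GROUP BY DisplayId, Code
    HAVING COUNT(*) > 1
)
SELECT l.Id, l.DisplayId, l.Version, l.Company, l.Description, d.Title
FROM Latest AS l
INNER JOIN Dupes AS x ON x.DisplayId = l.DisplayId AND x.Code = l.Code
INNER JOIN ProductData AS d ON d.ProductId = l.Id
ORDER BY l.Id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Matching on Code as well as DisplayId keeps out a latest row whose DisplayId is duplicated under a different company code. This version uses only GROUP BY and HAVING, so it does not depend on window-function support.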

