SQL Left Join first match only

▼魔方 西西 提交于 2019-11-30 01:19:30

distinct is not a function. It always operates on all columns of the select list.

Your problem is a typical "greatest N per group" problem which can easily be solved using a window function:

select ...
from (
  select IDNo,
         FirstName,
         LastName,
         ....,
         row_number() over (partition by lower(idno) order by firstname) as rn 
  from people 
) t
where rn = 1;

Using the order by clause you can select which of the duplicates you want to pick.

The above can be used in a left join:

select ...
from x
  left join (
    select IDNo,
           FirstName,
           LastName,
           ....,
           row_number() over (partition by lower(idno) order by firstname) as rn 
    from people 
  ) p on p.idno = x=idno and p.rn = 1
where ...
T8RB

Add an identity column (PeopleID) and then use a correlated subquery to return the first value for each value.

SELECT *
FROM People p
WHERE PeopleID = (
    SELECT MIN(PeopleID) 
    FROM People 
    WHERE IDNo = p.IDNo
)

Turns out I was doing it wrong, I needed to perform a nested select first of just the important columns, and do a distinct select off that to prevent trash columns of 'unique' data from corrupting my good data. The following appears to have resolved the issue... but I will try on the full dataset later.

SELECT DISTINCT P2.*
FROM (
  SELECT
      IDNo
    , FirstName
    , LastName
  FROM people P
) P2

Here is some play data as requested: http://sqlfiddle.com/#!3/050e0d/3

CREATE TABLE people
(
       [entry] int
     , [IDNo] varchar(3)
     , [FirstName] varchar(5)
     , [LastName] varchar(7)
);

INSERT INTO people
    (entry,[IDNo], [FirstName], [LastName])
VALUES
    (1,'uqx', 'bob', 'smith'),
    (2,'abc', 'john', 'willis'),
    (3,'ABC', 'john', 'willis'),
    (4,'aBc', 'john', 'willis'),
    (5,'WTF', 'jeff', 'bridges'),
    (6,'Sss', 'bill', 'doe'),
    (7,'sSs', 'bill', 'doe'),
    (8,'ssS', 'bill', 'doe'),
    (9,'ere', 'sally', 'abby'),
    (10,'wtf', 'jeff', 'bridges')
;

Depending on the nature of the duplicate rows, it looks like all you want is to have case-sensitivity on those columns. Setting the collation on these columns should be what you're after:

SELECT DISTINCT p.IDNO COLLATE SQL_Latin1_General_CP1_CI_AS, p.FirstName COLLATE SQL_Latin1_General_CP1_CI_AS, p.LastName COLLATE SQL_Latin1_General_CP1_CI_AS
FROM people P

http://msdn.microsoft.com/en-us/library/ms184391.aspx

Try this

 SELECT *
 FROM people P 
 where P.IDNo in (SELECT DISTINCT IDNo
              FROM people)

After careful consideration this dillema has a few different solutions:

Aggregate Everything Use an aggregate on each column to get the biggest or smallest field value. This is what I am doing since it takes 2 partially filled out records and "merges" the data.

http://sqlfiddle.com/#!3/59cde/1

SELECT
  UPPER(IDNo) AS user_id
, MAX(FirstName) AS name_first
, MAX(LastName) AS name_last
, MAX(entry) AS row_num
FROM people P
GROUP BY 
  IDNo

Get First (or Last record)

http://sqlfiddle.com/#!3/59cde/23

-- ------------------------------------------------------
-- Notes
-- entry: Auto-Number primary key some sort of unique PK is required for this method
-- IDNo:  Should be primary key in feed, but is not, we are making an upper case version
-- This gets the first entry to get last entry, change MIN() to MAX()
-- ------------------------------------------------------

SELECT 
   PC.user_id
  ,PData.FirstName
  ,PData.LastName
  ,PData.entry
FROM (
  SELECT 
      P2.user_id
     ,MIN(P2.entry) AS rownum
  FROM (
    SELECT
        UPPER(P.IDNo) AS user_id 
      , P.entry 
    FROM people P
  ) AS P2
  GROUP BY 
    P2.user_id
) AS PC
LEFT JOIN people PData
ON PData.entry = PC.rownum
ORDER BY 
   PData.entry
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!