Removing duplicates from a SQL query (not just “use distinct”)

╄→尐↘猪︶ㄣ 提交于 2019-11-29 06:01:00

Arbitrarily choosing to keep the minimum PIC_ID. Also, avoid using the implicit join syntax.

SELECT U.NAME, MIN(P.PIC_ID)
    FROM USERS U
        INNER JOIN POSTINGS P1
            ON U.EMAIL_ID = P1.EMAIL_ID
        INNER JOIN PICTURES P
            ON P1.PIC_ID = P.PIC_ID
    WHERE P.CAPTION LIKE '%car%'
    GROUP BY U.NAME;

Your question is kind of confusing; do you want to show only one row per user, or do you want to show a row per picture but suppress repeating values in the U.NAME field? I think you want the second; if not there are plenty of answers for the first.

Whether to display repeating values is display logic, which SQL wasn't really designed for. You can use a cursor in a loop to process the results row-by-row, but you will lose a lot of performance. If you have a "smart" frontend language like a .NET language or Java, whatever construction you put this data into can be cheaply manipulated to suppress repeating values before finally displaying it in the UI.

If you're using Microsoft SQL Server, and the transformation HAS to be done at the data layer, you may consider using a CTE (Computed Table Expression) to hold the initial query, then select values from each row of the CTE based on whether the columns in the previous row hold the same data. It'll be more performant than the cursor, but it'll be kinda messy either way. Observe:

USING CTE (Row, Name, PicID)
AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY U.NAME, P.PIC_ID),
       U.NAME, P.PIC_ID
    FROM USERS U
        INNER JOIN POSTINGS P1
            ON U.EMAIL_ID = P1.EMAIL_ID
        INNER JOIN PICTURES P
            ON P1.PIC_ID = P.PIC_ID
    WHERE P.CAPTION LIKE '%car%'
    ORDER BY U.NAME, P.PIC_ID 
)
SELECT
    CASE WHEN current.Name == previous.Name THEN '' ELSE current.Name END,
    current.PicID
FROM CTE current
LEFT OUTER JOIN CTE previous
   ON current.Row = previous.Row + 1
ORDER BY current.Row

The above sample is TSQL-specific; it is not guaranteed to work in any other DBPL like PL/SQL, but I think most of the enterprise-level SQL engines have something similar.

You need to tell the query what value to pick for the other columns, MIN or MAX seem like suitable choices.

 SELECT
   U.NAME, MIN(P.PIC_ID)
 FROM
   USERS U,
   PICTURES P,
   POSTINGS P1
 WHERE
   U.EMAIL_ID = P1.EMAIL_ID AND
   P1.PIC_ID = P.PIC_ID AND
   P.CAPTION LIKE '%car%'
 GROUP BY
   U.NAME;

If I understand you correctly, you want to list to exclude duplicates on one column only, inner join to a sub-select

select u.* [whatever joined values]
from users u
inner join
(select name from users group by name having count(*)=1) uniquenames
on uniquenames.name = u.name

If I understand you correctly, you want a list of all pictures with the same name (and their different ids) such that their name occurs more than once in the table. I think this will do the trick:

SELECT U.NAME, P.PIC_ID
FROM USERS U, PICTURES P, POSTINGS P1
WHERE U.EMAIL_ID = P1.EMAIL_ID AND P1.PIC_ID = P.PIC_ID AND U.Name IN (
SELECT U.Name 
FROM USERS U, PICTURES P, POSTINGS P1
WHERE U.EMAIL_ID = P1.EMAIL_ID AND P1.PIC_ID = P.PIC_ID AND P.CAPTION LIKE '%car%';
GROUP BY U.Name HAVING COUNT(U.Name) > 1)

I haven't executed it, so there may be a syntax error or two there.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!