Question
So I have a simple table that holds comments from a user that pertain to a specific blog post.
id | user           | post_id | comment
----------------------------------------------------------
0  | john@test.com  | 1001    | great article
1  | bob@test.com   | 1001    | nice post
2  | john@test.com  | 1002    | I agree
3  | john@test.com  | 1001    | thats cool
4  | bob@test.com   | 1002    | thanks for sharing
5  | bob@test.com   | 1002    | really helpful
6  | steve@test.com | 1001    | spam post about pills
I want to get all rows where a user commented on the same post more than once (meaning same user and same post_id). In this case I would expect:
id | user           | post_id | comment
----------------------------------------------------------
0  | john@test.com  | 1001    | great article
3  | john@test.com  | 1001    | thats cool
4  | bob@test.com   | 1002    | thanks for sharing
5  | bob@test.com   | 1002    | really helpful
I thought DISTINCT was what I needed but that just gives me unique rows.
Answer 1:
You can use GROUP BY and HAVING to find pairs of user and post_id that have multiple entries:
SELECT a.*
FROM table_name a
JOIN (SELECT user, post_id
      FROM table_name
      GROUP BY user, post_id
      HAVING COUNT(id) > 1
     ) b
  ON a.user = b.user
 AND a.post_id = b.post_id
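To check the GROUP BY / HAVING approach against the sample data from the question, here is a minimal sketch that runs the same query in an in-memory SQLite database from Python. SQLite is only a stand-in here (the thread does not say which engine the asker uses), and the `user` column is double-quoted because it is a reserved word in some databases.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the asker's database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE table_name (id INTEGER, "user" TEXT, post_id INTEGER, comment TEXT)'
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [
        (0, "john@test.com", 1001, "great article"),
        (1, "bob@test.com", 1001, "nice post"),
        (2, "john@test.com", 1002, "I agree"),
        (3, "john@test.com", 1001, "thats cool"),
        (4, "bob@test.com", 1002, "thanks for sharing"),
        (5, "bob@test.com", 1002, "really helpful"),
        (6, "steve@test.com", 1001, "spam post about pills"),
    ],
)

# The subquery finds (user, post_id) pairs with more than one comment;
# the join pulls back every full row belonging to those pairs.
query = """
SELECT a.id, a."user", a.post_id, a.comment
FROM table_name a
JOIN (SELECT "user", post_id
      FROM table_name
      GROUP BY "user", post_id
      HAVING COUNT(id) > 1) b
  ON a."user" = b."user"
 AND a.post_id = b.post_id
ORDER BY a.id
"""
duplicate_ids = [row[0] for row in conn.execute(query)]
print(duplicate_ids)  # [0, 3, 4, 5]
```

The ids match the four rows the question expects: john's two comments on post 1001 and bob's two comments on post 1002.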
Answer 2:
DISTINCT removes all duplicate rows, which is why you're getting unique rows.
You can try using a CROSS JOIN (available as of Hive 0.10 according to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins):
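-- mt.id <> mt2.id stops each row from matching itself;
-- DISTINCT keeps a row from repeating when it has several partners.
SELECT DISTINCT mt.id, mt.user, mt.post_id, mt.comment
FROM MYTABLE mt
CROSS JOIN MYTABLE mt2
WHERE mt.user = mt2.user
  AND mt.post_id = mt2.post_id
  AND mt.id <> mt2.id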
The performance might not be the best, though. If you want the results sorted, use SORT BY or ORDER BY.
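One thing to watch with a self-join on user and post_id is that every row also matches itself, so a query with no id comparison returns all rows, not just the repeated ones. The sketch below contrasts the two variants on the question's sample data. It uses an in-memory SQLite database from Python as a stand-in for Hive (an assumption), so Hive-specific behaviour such as SORT BY is not covered.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE MYTABLE (id INTEGER, "user" TEXT, post_id INTEGER, comment TEXT)'
)
conn.executemany(
    "INSERT INTO MYTABLE VALUES (?, ?, ?, ?)",
    [
        (0, "john@test.com", 1001, "great article"),
        (1, "bob@test.com", 1001, "nice post"),
        (2, "john@test.com", 1002, "I agree"),
        (3, "john@test.com", 1001, "thats cool"),
        (4, "bob@test.com", 1002, "thanks for sharing"),
        (5, "bob@test.com", 1002, "really helpful"),
        (6, "steve@test.com", 1001, "spam post about pills"),
    ],
)

# Self-join without an id comparison: each row pairs with itself.
naive = """
SELECT mt.id
FROM MYTABLE mt
CROSS JOIN MYTABLE mt2
WHERE mt."user" = mt2."user"
  AND mt.post_id = mt2.post_id
ORDER BY mt.id
"""

# Same join, but a row only counts if a *different* row shares its user and post_id.
fixed = """
SELECT DISTINCT mt.id
FROM MYTABLE mt
CROSS JOIN MYTABLE mt2
WHERE mt."user" = mt2."user"
  AND mt.post_id = mt2.post_id
  AND mt.id <> mt2.id
ORDER BY mt.id
"""

naive_ids = [row[0] for row in conn.execute(naive)]
fixed_ids = [row[0] for row in conn.execute(fixed)]
print(naive_ids)  # [0, 0, 1, 2, 3, 3, 4, 4, 5, 5, 6]
print(fixed_ids)  # [0, 3, 4, 5]
```

Without the id comparison, even steve's single comment (id 6) comes back, and the repeated rows appear twice each.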
Answer 3:
-- Sample data from the question, using the same ids (0-6); the column is named usr here instead of user
DECLARE @MyTable TABLE (id int, usr varchar(50), post_id int, comment varchar(50))
INSERT INTO @MyTable (id, usr, post_id, comment) VALUES (0,'john@test.com',1001,'great article')
INSERT INTO @MyTable (id, usr, post_id, comment) VALUES (1,'bob@test.com',1001,'nice post')
INSERT INTO @MyTable (id, usr, post_id, comment) VALUES (2,'john@test.com',1002,'I agree')
INSERT INTO @MyTable (id, usr, post_id, comment) VALUES (3,'john@test.com',1001,'thats cool')
INSERT INTO @MyTable (id, usr, post_id, comment) VALUES (4,'bob@test.com',1002,'thanks for sharing')
INSERT INTO @MyTable (id, usr, post_id, comment) VALUES (5,'bob@test.com',1002,'really helpful')
INSERT INTO @MyTable (id, usr, post_id, comment) VALUES (6,'steve@test.com',1001,'spam post about pills')
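-- Every row joins to itself, so Count(T2.id) = 1 means no other comment
-- by that user on that post; a count above 1 means the row has a duplicate.
SELECT
    T1.id,
    T1.usr,
    T1.post_id,
    T1.comment
FROM
    @MyTable T1
    INNER JOIN @MyTable T2
        ON T1.usr = T2.usr AND T1.post_id = T2.post_id
GROUP BY
    T1.id,
    T1.usr,
    T1.post_id,
    T1.comment
HAVING
    Count(T2.id) > 1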
Source: https://stackoverflow.com/questions/27726186/sql-find-all-instances-where-two-columns-are-the-same