Question
I have a table with a column that may contain the same value in a burst of consecutive rows, like this:
+----+---------+
| id | Col1    |
+----+---------+
|  1 | 6050000 |
|  2 | 6050000 |
|  3 | 6050000 |
|  4 | 6060000 |
|  5 | 6060000 |
|  6 | 6060000 |
|  7 | 6060000 |
|  8 | 6060000 |
|  9 | 6050000 |
| 10 | 6000000 |
| 11 | 6000000 |
+----+---------+
Now I want to prune rows where the value of Col1 is repeated and select only the first occurrence of each burst.
For the above table the result should be:
+----+---------+
| id | Col1    |
+----+---------+
|  1 | 6050000 |
|  4 | 6060000 |
|  9 | 6050000 |
| 10 | 6000000 |
+----+---------+
How can I do this in SQL?
Note that only rows within a burst should be removed; a value may be repeated in non-adjacent rows. For example, id=1 and id=9 have the same Col1 value and both appear in the sample result.
EDIT:
I achieved it using this:
select id,col1 from data as d1
where not exists (
Select id from data as d2
where d2.id=d1.id-1 and d1.col1=d2.col1 order by id limit 1)
But this only works when ids are sequential. With gaps between ids (deleted ones) the query breaks. How can I fix this?
Answer 1:
You can use an EXISTS semi-join to identify candidates.
Select wanted rows:
SELECT * FROM tbl
WHERE NOT EXISTS (
    SELECT *
    FROM tbl t
    WHERE t.col1 = tbl.col1
      AND t.id = tbl.id - 1
)
ORDER BY id
Get rid of unwanted rows:
DELETE FROM tbl
-- SELECT * FROM tbl
WHERE EXISTS (
    SELECT *
    FROM tbl t
    WHERE t.col1 = tbl.col1
      AND t.id = tbl.id - 1
)
This effectively deletes every row whose preceding row (the one with id - 1) has the same value in col1, thereby arriving at your set goal: only the first row of every burst survives.
I left the commented SELECT statement in because you should always check what is going to be deleted before you do the deed.
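If your database supports transactions and a RETURNING clause (assumed here: PostgreSQL, which the question does not specify), one way to do that check is to run the DELETE inside a transaction and roll it back. A minimal sketch, using the question's sample data in a temp table named tbl:

```sql
-- Assumption: PostgreSQL. Sample data copied from the question.
DROP TABLE IF EXISTS tbl;
CREATE TEMP TABLE tbl (id int, col1 int);
INSERT INTO tbl VALUES
    (1, 6050000), (2, 6050000), (3, 6050000),
    (4, 6060000), (5, 6060000), (6, 6060000), (7, 6060000), (8, 6060000),
    (9, 6050000), (10, 6000000), (11, 6000000);

BEGIN;

DELETE FROM tbl
WHERE EXISTS (
    SELECT *
    FROM tbl t
    WHERE t.col1 = tbl.col1
      AND t.id = tbl.id - 1
)
RETURNING *;                     -- shows the removed rows: ids 2, 3, 5, 6, 7, 8, 11 (in some order)

SELECT * FROM tbl ORDER BY id;   -- what would remain: ids 1, 4, 9, 10

ROLLBACK;                        -- undo everything; switch to COMMIT once the result looks right
```

After the ROLLBACK the table holds all eleven rows again, so the preview can be repeated as often as needed.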
Solution for non-sequential IDs:
If your RDBMS supports CTEs and window functions (PostgreSQL, Oracle, SQL Server, MySQL 8.0+, SQLite 3.25+, ... but not MS Access or older MySQL and SQLite versions), there is an elegant way:
WITH x AS (
    SELECT *, row_number() OVER (ORDER BY id) AS rn
    FROM tbl
)
SELECT id, col1
FROM x
WHERE NOT EXISTS (
    SELECT *
    FROM x x1
    WHERE x1.col1 = x.col1
      AND x1.rn = x.rn - 1
)
ORDER BY id;
There is also the not-so-elegant way that does the job without those niceties.
Should work for you:
SELECT id, col1
FROM tbl
WHERE (
    SELECT t.col1 = tbl.col1
    FROM tbl AS t
    WHERE t.id < tbl.id
    ORDER BY t.id DESC
    LIMIT 1
) IS NOT TRUE
ORDER BY id
Test case for non-sequential IDs (tested in PostgreSQL):
CREATE TEMP TABLE tbl (id int, col1 int);
INSERT INTO tbl VALUES
(1,6050000),(2,6050000),(6,6050000)
,(14,6060000),(15,6060000),(16,6060000)
,(17,6060000),(18,6060000),(19,6050000)
,(20,6000000),(111,6000000);
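The DELETE shown earlier relies on id - 1, so it has the same gap problem as the first SELECT. A possible variant for non-sequential IDs, sketched under the assumptions that window functions are available and that id is unique; the test table is recreated here so the snippet runs on its own, and the alias prev_col1 is made up for illustration:

```sql
-- Assumptions: PostgreSQL-style syntax, window functions available, id unique.
DROP TABLE IF EXISTS tbl;
CREATE TEMP TABLE tbl (id int, col1 int);
INSERT INTO tbl VALUES
    (1, 6050000), (2, 6050000), (6, 6050000),
    (14, 6060000), (15, 6060000), (16, 6060000),
    (17, 6060000), (18, 6060000), (19, 6050000),
    (20, 6000000), (111, 6000000);

-- Delete every row whose directly preceding row (ordered by id) has the same col1.
DELETE FROM tbl
WHERE id IN (
    SELECT id
    FROM (
        SELECT id, col1,
               lag(col1) OVER (ORDER BY id) AS prev_col1   -- NULL for the very first row
        FROM tbl
    ) AS x
    WHERE col1 = prev_col1
);

SELECT id, col1 FROM tbl ORDER BY id;
-- Remaining ids: 1, 14, 19, 20
```

As with the sequential version, it is worth running the inner SELECT on its own first to see which rows would go.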
Answer 2:
select min(id), Col1 from tableName group by Col1
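Note that GROUP BY collapses all rows with the same Col1, not only adjacent ones, so a value that comes back in a later burst is reported once. A quick check, as a sketch that assumes the question's sample data loaded into a PostgreSQL-style temp table named tableName:

```sql
-- Assumption: PostgreSQL-style temp table holding the question's sample data.
DROP TABLE IF EXISTS tableName;
CREATE TEMP TABLE tableName (id int, Col1 int);
INSERT INTO tableName VALUES
    (1, 6050000), (2, 6050000), (3, 6050000),
    (4, 6060000), (5, 6060000), (6, 6060000), (7, 6060000), (8, 6060000),
    (9, 6050000), (10, 6000000), (11, 6000000);

SELECT min(id) AS id, Col1
FROM tableName
GROUP BY Col1
ORDER BY min(id);
-- Returns three rows: (1, 6050000), (4, 6060000), (10, 6000000).
-- The second 6050000 burst, which starts at id = 9, is not in the result.
```

So this query fits only if each value occurs in a single burst.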
Answer 3:
If your RDBMS supports Window Aggregate functions and/or LEAD() and LAG() functions you can leverage them to accomplish what you are trying to report. The following SQL will help get you started down the right path:
SELECT id
     , Col AS CurCol
     , MAX(Col)
         OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS PrevCol
     , MIN(Col)
         OVER (ORDER BY id ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS NextCol
FROM MyTable
From there you can put that SQL in a derived table with some CASE logic: if PrevCol is the same as CurCol, then set CurCol = NULL. (Do not test NextCol for this purpose: the first row of a burst also has a matching NextCol, and that is the row you want to keep.) Then you can eliminate all the rows where CurCol IS NULL.
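As a rough sketch of that derived-table step: the version below uses LAG() in place of the MAX() ... ROWS BETWEEN frame and filters directly in WHERE instead of going through CASE, so it is a variation on the idea, not the exact query described. It assumes Col is never NULL and that LAG() is available; MyTable and Col are the placeholder names from above, filled here with the question's sample data:

```sql
-- Assumptions: PostgreSQL-style syntax, LAG() available, Col declared NOT NULL.
DROP TABLE IF EXISTS MyTable;
CREATE TEMP TABLE MyTable (id int, Col int NOT NULL);
INSERT INTO MyTable VALUES
    (1, 6050000), (2, 6050000), (3, 6050000),
    (4, 6060000), (5, 6060000), (6, 6060000), (7, 6060000), (8, 6060000),
    (9, 6050000), (10, 6000000), (11, 6000000);

SELECT id, CurCol
FROM (
    SELECT id
         , Col AS CurCol
         , LAG(Col) OVER (ORDER BY id) AS PrevCol   -- NULL on the first row
    FROM MyTable
) AS t
WHERE PrevCol IS NULL        -- very first row: nothing precedes it
   OR PrevCol <> CurCol      -- value changed, so a new burst starts here
ORDER BY id;
-- Returns ids 1, 4, 9, 10
```

Because the ordering comes from the window's ORDER BY id, gaps in the id sequence do not matter here.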
If you don't have the ability to use window aggregates or LEAD/LAG functions your task is a little more complex.
Hope this helps.
Answer 4:
Since id is always sequential, with no gaps or repetitions, as per your comment, you could use the following method:
SELECT t1.*
FROM atable t1
    LEFT JOIN atable t2 ON t1.id = t2.id + 1 AND t1.Col1 = t2.Col1
WHERE t2.id IS NULL
The table is (outer-)joined to itself on the condition that the left side's id is one greater than the right side's and their Col1 values are identical. In other words, the condition is ‘the previous row contains the same Col1 value as the current row’. If there's no match on the right, then the current record should be selected.
UPDATE
To account for non-sequential ids (which, however, are assumed to be unique and to define the order of changes of Col1), you could also try the following query:
SELECT t1.*
FROM atable t1
    LEFT JOIN atable t2 ON t1.id > t2.id
    LEFT JOIN atable t3 ON t1.id > t3.id AND t3.id > t2.id
WHERE t3.id IS NULL
  AND (t2.id IS NULL OR t2.Col1 <> t1.Col1)
The third instance of the table (t3) is there to ensure that the second one (t2) yields the row directly preceding that of t1. That is, if there's no match for t3, then either t2 contains the preceding row or it's got no match either, the latter meaning that t1's current row is the top one.
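The two extra self-joins pair each row with all earlier rows, which may get expensive on larger tables. An alternative sketch (not the query above, but the same ‘directly preceding row’ idea) looks up the largest smaller id with a scalar subquery. It assumes id is unique; atable is filled here with illustrative values that have gaps between the ids:

```sql
-- Assumptions: PostgreSQL-style temp table, id unique; the data is illustrative only.
DROP TABLE IF EXISTS atable;
CREATE TEMP TABLE atable (id int, Col1 int);
INSERT INTO atable VALUES
    (1, 6050000), (2, 6050000), (6, 6050000),
    (14, 6060000), (15, 6060000), (16, 6060000),
    (17, 6060000), (18, 6060000), (19, 6050000),
    (20, 6000000), (111, 6000000);

SELECT t1.*
FROM atable t1
WHERE NOT EXISTS (
    SELECT 1
    FROM atable t2
    WHERE t2.Col1 = t1.Col1
      -- t2 must be the row directly preceding t1: the largest id below t1.id.
      -- For the top row max() yields NULL, so nothing matches and the row is kept.
      AND t2.id = (SELECT max(t3.id) FROM atable t3 WHERE t3.id < t1.id)
)
ORDER BY t1.id;
-- Returns ids 1, 14, 19, 20
```

With an index on id, the max() lookup is typically a cheap single probe per row, though the actual plan depends on the database.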
Source: https://stackoverflow.com/questions/8683547/only-select-first-row-of-repeating-value-in-a-column-in-sql