MySQL consolidate duplicate data records via UPDATE / DELETE

Submitted by 你说的曾经没有我的故事 on 2020-01-03 16:15:10

Question


I have a table which looks like this:

mysql> SELECT * FROM Colors;
╔════╦══════════╦════════╦════════╦════════╦════════╦════════╦════════╗
║ ID ║ USERNAME ║  RED   ║ GREEN  ║ YELLOW ║  BLUE  ║ ORANGE ║ PURPLE ║
╠════╬══════════╬════════╬════════╬════════╬════════╬════════╬════════╣
║  1 ║ joe      ║ 1      ║ (null) ║ 1      ║ (null) ║ (null) ║ (null) ║
║  2 ║ joe      ║ 1      ║ (null) ║ (null) ║ (null) ║ 1      ║ (null) ║
║  3 ║ bill     ║ 1      ║ 1      ║ 1      ║ (null) ║ (null) ║ 1      ║
║  4 ║ bill     ║ (null) ║ 1      ║ (null) ║ 1      ║ (null) ║ (null) ║
║  5 ║ bill     ║ (null) ║ 1      ║ (null) ║ (null) ║ (null) ║ (null) ║
║  6 ║ bob      ║ (null) ║ (null) ║ (null) ║ 1      ║ (null) ║ (null) ║
║  7 ║ bob      ║ (null) ║ (null) ║ (null) ║ (null) ║ (null) ║ 1      ║
║  8 ║ bob      ║ 1      ║ (null) ║ (null) ║ (null) ║ (null) ║ (null) ║
╚════╩══════════╩════════╩════════╩════════╩════════╩════════╩════════╝

I would like to run an UPDATE and a DELETE that find the duplicate rows, consolidate each user's records into one, and remove the rest, so that the table ends up like this:

mysql> SELECT * FROM Colors;
╔════╦══════════╦═════╦════════╦════════╦════════╦════════╦════════╗
║ ID ║ USERNAME ║ RED ║ GREEN  ║ YELLOW ║  BLUE  ║ ORANGE ║ PURPLE ║
╠════╬══════════╬═════╬════════╬════════╬════════╬════════╬════════╣
║  1 ║ joe      ║   1 ║ (null) ║ 1      ║ (null) ║ 1      ║ (null) ║
║  3 ║ bill     ║   1 ║ 1      ║ 1      ║ 1      ║ (null) ║ 1      ║
║  6 ║ bob      ║   1 ║ (null) ║ (null) ║ 1      ║ (null) ║ 1      ║
╚════╩══════════╩═════╩════════╩════════╩════════╩════════╩════════╝

I know I could easily do this with a script, but to understand MySQL better I would like to learn how to do it in pure SQL.


Answer 1:


This is only a projection: it doesn't update the table or delete any data.

SELECT  MIN(ID) ID,
        Username,
        MAX(Red) max_Red,
        MAX(Green) max_Green,
        MAX(Yellow) max_Yellow,
        MAX(Blue) max_Blue,
        MAX(Orange) max_Orange,
        MAX(Purple) max_Purple
FROM    Colors
GROUP   BY Username
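A minimal sketch of the same projection, using Python's built-in sqlite3 module as a stand-in for MySQL (the helpers make_db and consolidated are hypothetical names for illustration). It loads the sample rows from the question, runs the GROUP BY query and leaves the table itself untouched:

```python
import sqlite3

COLORS = ["Red", "Green", "Yellow", "Blue", "Orange", "Purple"]

# Sample rows from the question; None stands for (null).
ROWS = [
    (1, "joe", 1, None, 1, None, None, None),
    (2, "joe", 1, None, None, None, 1, None),
    (3, "bill", 1, 1, 1, None, None, 1),
    (4, "bill", None, 1, None, 1, None, None),
    (5, "bill", None, 1, None, None, None, None),
    (6, "bob", None, None, None, 1, None, None),
    (7, "bob", None, None, None, None, None, 1),
    (8, "bob", 1, None, None, None, None, None),
]


def make_db():
    """Create an in-memory Colors table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    # Column names come from the fixed COLORS list, never from user input.
    color_cols = ", ".join(f"{c} INTEGER" for c in COLORS)
    conn.execute(f"CREATE TABLE Colors (ID INTEGER PRIMARY KEY, Username TEXT, {color_cols})")
    conn.executemany("INSERT INTO Colors VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def consolidated(conn):
    """Return one row per user: the lowest ID plus MAX() of every colour."""
    # MAX() ignores NULLs, so a colour is 1 if any of the user's rows has a 1.
    maxes = ", ".join(f"MAX({c}) AS {c}" for c in COLORS)
    sql = f"SELECT MIN(ID) AS ID, Username, {maxes} FROM Colors GROUP BY Username ORDER BY MIN(ID)"
    return conn.execute(sql).fetchall()


conn = make_db()
for row in consolidated(conn):
    print(row)
```

For joe this prints (1, 'joe', 1, None, 1, None, 1, None), matching the desired result; the Colors table still holds all eight rows afterwards.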

UPDATE

If you really want to delete those records, you first need to run an UPDATE statement, so that the row you keep for each user (the one with the lowest ID) holds the merged values before the other rows are deleted:

UPDATE  Colors a
        INNER JOIN
        (
            SELECT  MIN(ID) min_ID,
                    Username,
                    MAX(Red) max_Red,
                    MAX(Green) max_Green,
                    MAX(Yellow) max_Yellow,
                    MAX(Blue) max_Blue,
                    MAX(Orange) max_Orange,
                    MAX(Purple) max_Purple
            FROM    Colors
            GROUP   BY Username
        ) b ON a.ID = b.min_ID
SET     a.Red = b.max_Red,
        a.Green = b.max_Green,
        a.Yellow = b.max_Yellow,
        a.Blue = b.max_Blue,
        a.Orange = b.max_Orange,
        a.Purple = b.max_Purple;
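SQLite does not accept MySQL's multi-table UPDATE ... JOIN syntax, so this sketch (Python's sqlite3 as a stand-in for MySQL; make_db and merge_into_min_rows are hypothetical helpers) expresses the same merge with correlated subqueries: each user's lowest-ID row receives the per-user MAX() of every colour, and the other rows are left alone for the DELETE step.

```python
import sqlite3

COLORS = ["Red", "Green", "Yellow", "Blue", "Orange", "Purple"]

# Sample rows from the question; None stands for (null).
ROWS = [
    (1, "joe", 1, None, 1, None, None, None),
    (2, "joe", 1, None, None, None, 1, None),
    (3, "bill", 1, 1, 1, None, None, 1),
    (4, "bill", None, 1, None, 1, None, None),
    (5, "bill", None, 1, None, None, None, None),
    (6, "bob", None, None, None, 1, None, None),
    (7, "bob", None, None, None, None, None, 1),
    (8, "bob", 1, None, None, None, None, None),
]


def make_db():
    """Create an in-memory Colors table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    color_cols = ", ".join(f"{c} INTEGER" for c in COLORS)
    conn.execute(f"CREATE TABLE Colors (ID INTEGER PRIMARY KEY, Username TEXT, {color_cols})")
    conn.executemany("INSERT INTO Colors VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def merge_into_min_rows(conn):
    """Copy each user's per-colour MAX() into that user's lowest-ID row."""
    # One correlated subquery per colour: "Colors" refers to the row being
    # updated, while c2 ranges over every row of the same user.
    sets = ", ".join(
        f"{c} = (SELECT MAX(c2.{c}) FROM Colors AS c2 WHERE c2.Username = Colors.Username)"
        for c in COLORS
    )
    # Only touch the row with the smallest ID per user (the join on min_ID).
    where = "ID = (SELECT MIN(c3.ID) FROM Colors AS c3 WHERE c3.Username = Colors.Username)"
    with conn:  # commit on success, roll back on error
        conn.execute(f"UPDATE Colors SET {sets} WHERE {where}")


conn = make_db()
merge_into_min_rows(conn)
for row in conn.execute("SELECT * FROM Colors ORDER BY ID"):
    print(row)
```

After this step the table still has eight rows; only rows 1, 3 and 6 have changed.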

Then you can delete the remaining duplicate records:

DELETE  a
FROM    Colors a
        LEFT JOIN
        (
            SELECT  MIN(ID) min_ID,
                    Username
            FROM    Colors
            GROUP   BY Username
        ) b ON a.ID = b.min_ID
WHERE   b.min_ID IS NULL;
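Putting both steps together, here is a sketch of the whole consolidation in one transaction, again with Python's sqlite3 standing in for MySQL (make_db and consolidate are hypothetical helpers). SQLite accepts the shorter DELETE ... WHERE ID NOT IN (subquery) form; MySQL rejects a subquery that selects from the table being deleted from (error 1093), which is why the MySQL statement above joins against a derived table instead.

```python
import sqlite3

COLORS = ["Red", "Green", "Yellow", "Blue", "Orange", "Purple"]

# Sample rows from the question; None stands for (null).
ROWS = [
    (1, "joe", 1, None, 1, None, None, None),
    (2, "joe", 1, None, None, None, 1, None),
    (3, "bill", 1, 1, 1, None, None, 1),
    (4, "bill", None, 1, None, 1, None, None),
    (5, "bill", None, 1, None, None, None, None),
    (6, "bob", None, None, None, 1, None, None),
    (7, "bob", None, None, None, None, None, 1),
    (8, "bob", 1, None, None, None, None, None),
]


def make_db():
    """Create an in-memory Colors table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    color_cols = ", ".join(f"{c} INTEGER" for c in COLORS)
    conn.execute(f"CREATE TABLE Colors (ID INTEGER PRIMARY KEY, Username TEXT, {color_cols})")
    conn.executemany("INSERT INTO Colors VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def consolidate(conn):
    """Merge each user's rows into the lowest-ID row, then delete the rest."""
    sets = ", ".join(
        f"{c} = (SELECT MAX(c2.{c}) FROM Colors AS c2 WHERE c2.Username = Colors.Username)"
        for c in COLORS
    )
    min_id = "(SELECT MIN(c3.ID) FROM Colors AS c3 WHERE c3.Username = Colors.Username)"
    with conn:  # both statements are committed together or not at all
        # Step 1 (UPDATE): fill the surviving row with the merged colours.
        conn.execute(f"UPDATE Colors SET {sets} WHERE ID = {min_id}")
        # Step 2 (DELETE): drop every row that is not its user's lowest ID.
        conn.execute(
            "DELETE FROM Colors WHERE ID NOT IN "
            "(SELECT MIN(ID) FROM Colors GROUP BY Username)"
        )


conn = make_db()
consolidate(conn)
for row in conn.execute("SELECT * FROM Colors ORDER BY ID"):
    print(row)
```

Running the UPDATE before the DELETE matters: deleting first would throw away the colours stored in rows 2, 4, 5, 7 and 8.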



Answer 2:


Do you really need to update the underlying table? If not, and you just want the result set shown in your example, you can simply group the table:

SELECT   MIN(ID)     AS ID,
         Username    AS Username,
         MAX(Red)    AS Red,
         MAX(Green)  AS Green,
         MAX(Yellow) AS Yellow,
         MAX(Blue)   AS Blue,
         MAX(Orange) AS Orange,
         MAX(Purple) AS Purple
FROM     Colors
GROUP BY Username
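The grouping works because aggregate functions such as MAX() skip NULLs and return NULL only when every value in the group is NULL; that is why ORANGE stays (null) for bill and bob. A tiny check with Python's sqlite3 module standing in for MySQL (the table t and its columns are made up for illustration):

```python
import sqlite3

# Hypothetical table: group "a" has a 1 among NULLs, group "b" is all NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp TEXT, v INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", None), ("a", 1), ("a", None), ("b", None), ("b", None)],
)
# MAX() ignores the NULLs in "a" and yields NULL (None) for the all-NULL "b".
result = conn.execute("SELECT grp, MAX(v) FROM t GROUP BY grp ORDER BY grp").fetchall()
print(result)  # [('a', 1), ('b', None)]
```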





Answer 3:


-- MySQL does not allow a DELETE to select from its own table in a subquery
-- (error 1093), so the table is joined to itself instead.
DELETE  c1
FROM    Colors c1
        INNER JOIN Colors c2
            ON  c2.Username = c1.Username
            AND c2.ID < c1.ID
            AND ((c1.Red    IS NULL AND c2.Red    IS NULL) OR c1.Red    = c2.Red   )
            AND ((c1.Green  IS NULL AND c2.Green  IS NULL) OR c1.Green  = c2.Green )
            AND ((c1.Yellow IS NULL AND c2.Yellow IS NULL) OR c1.Yellow = c2.Yellow)
            AND ((c1.Blue   IS NULL AND c2.Blue   IS NULL) OR c1.Blue   = c2.Blue  )
            AND ((c1.Orange IS NULL AND c2.Orange IS NULL) OR c1.Orange = c2.Orange)
            AND ((c1.Purple IS NULL AND c2.Purple IS NULL) OR c1.Purple = c2.Purple);

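NULLs make this a bit more complex, because NULL = NULL is not true but unknown in SQL. If the columns held 0 and 1 instead of NULL and 1, the IS NULL part before each OR could be omitted. Note that this removes only rows that exactly duplicate (same username, same colour values) a row with a lower ID; it does not merge rows whose colours differ, so on the sample data above it deletes nothing.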
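A sketch of the NULL point, using Python's sqlite3 module as a stand-in for MySQL. The helpers, the null_aware flag and the extra row 9 (an exact copy of joe's row 1) are hypothetical, added only for illustration. Plain = never matches rows that contain NULLs, while the answer's IS NULL ... OR = pattern does. SQLite's IS operator and MySQL's <=> operator are built-in NULL-safe shorthands for the same comparison. The DELETE is wrapped in ID IN (...) so both copies of the table can carry an alias.

```python
import sqlite3

COLORS = ["Red", "Green", "Yellow", "Blue", "Orange", "Purple"]

# Sample rows from the question; None stands for (null).
ROWS = [
    (1, "joe", 1, None, 1, None, None, None),
    (2, "joe", 1, None, None, None, 1, None),
    (3, "bill", 1, 1, 1, None, None, 1),
    (4, "bill", None, 1, None, 1, None, None),
    (5, "bill", None, 1, None, None, None, None),
    (6, "bob", None, None, None, 1, None, None),
    (7, "bob", None, None, None, None, None, 1),
    (8, "bob", 1, None, None, None, None, None),
]
EXTRA = (9, "joe", 1, None, 1, None, None, None)  # hypothetical exact copy of row 1


def make_db(rows):
    """Create an in-memory Colors table filled with the given rows."""
    conn = sqlite3.connect(":memory:")
    color_cols = ", ".join(f"{c} INTEGER" for c in COLORS)
    conn.execute(f"CREATE TABLE Colors (ID INTEGER PRIMARY KEY, Username TEXT, {color_cols})")
    conn.executemany("INSERT INTO Colors VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def delete_exact_duplicates(conn, null_aware=True):
    """Delete rows that repeat a lower-ID row of the same user column for column."""
    if null_aware:
        # Treat NULL vs NULL as a match, as the answer does.
        cond = "((c1.{c} IS NULL AND c2.{c} IS NULL) OR c1.{c} = c2.{c})"
    else:
        # Plain '=': any NULL column makes the comparison unknown, so no match.
        cond = "c1.{c} = c2.{c}"
    same = " AND ".join(cond.format(c=c) for c in COLORS)
    with conn:
        conn.execute(
            "DELETE FROM Colors WHERE ID IN ("
            " SELECT c1.ID FROM Colors AS c1 WHERE EXISTS ("
            "  SELECT 1 FROM Colors AS c2"
            f"  WHERE c2.Username = c1.Username AND c2.ID < c1.ID AND {same}))"
        )


conn = make_db(ROWS + [EXTRA])
print(conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone())  # (None, 1)
delete_exact_duplicates(conn, null_aware=False)
print(conn.execute("SELECT COUNT(*) FROM Colors").fetchone()[0])  # 9: nothing matched
delete_exact_duplicates(conn)
print(conn.execute("SELECT COUNT(*) FROM Colors").fetchone()[0])  # 8: row 9 removed
```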



Source: https://stackoverflow.com/questions/14404259/mysql-consolidate-duplicate-data-records-via-update-delete
