Remove duplicates using only a MySQL query?

死守一世寂寞 2020-11-27 07:55

I have a table with the following columns:

URL_ID    
URL_ADDR    
URL_Time

I want to remove the rows that have duplicate values in the URL_ADDR column.

7 answers
  •  [愿得一人]
    2020-11-27 08:12

    Consider the following test case:

    CREATE TABLE mytb (url_id int, url_addr varchar(100));
    
    INSERT INTO mytb VALUES (1, 'www.google.com');
    INSERT INTO mytb VALUES (2, 'www.microsoft.com');
    INSERT INTO mytb VALUES (3, 'www.apple.com');
    INSERT INTO mytb VALUES (4, 'www.google.com');
    INSERT INTO mytb VALUES (5, 'www.cnn.com');
    INSERT INTO mytb VALUES (6, 'www.apple.com');
    

    Where our test table now contains:

    SELECT * FROM mytb;
    +--------+-------------------+
    | url_id | url_addr          |
    +--------+-------------------+
    |      1 | www.google.com    |
    |      2 | www.microsoft.com |
    |      3 | www.apple.com     |
    |      4 | www.google.com    |
    |      5 | www.cnn.com       |
    |      6 | www.apple.com     |
    +--------+-------------------+
    6 rows in set (0.00 sec)
    

    Then we can use the multiple-table DELETE syntax as follows:

    DELETE t2
    FROM   mytb t1
    JOIN   mytb t2 ON (t2.url_addr = t1.url_addr AND t2.url_id > t1.url_id);
    

    ... which will delete the duplicate entries, keeping only the row with the lowest url_id for each url_addr:

    SELECT * FROM mytb;
    +--------+-------------------+
    | url_id | url_addr          |
    +--------+-------------------+
    |      1 | www.google.com    |
    |      2 | www.microsoft.com |
    |      3 | www.apple.com     |
    |      5 | www.cnn.com       |
    +--------+-------------------+
    4 rows in set (0.00 sec)
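
    Before running the DELETE, it may be worth previewing which rows it would remove. A minimal sketch, assuming the same mytb test table as above: the join condition is unchanged and only the DELETE is swapped for a SELECT. DISTINCT is added because a URL that appears three or more times would otherwise list the same t2 row once per earlier copy.

    ```sql
    -- Preview only: lists the rows the multiple-table DELETE would remove.
    -- Same self-join as the DELETE; nothing is modified.
    SELECT DISTINCT t2.url_id, t2.url_addr
    FROM   mytb t1
    JOIN   mytb t2 ON (t2.url_addr = t1.url_addr AND t2.url_id > t1.url_id)
    ORDER  BY t2.url_id;
    ```

    On the original six test rows this lists url_id 4 (www.google.com) and url_id 6 (www.apple.com). Reversing the comparison to t2.url_id < t1.url_id would instead target the earlier copies, so the row with the highest url_id is the one kept.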
    

    UPDATE - Further to new comments above:

    If duplicate URLs are not all written in the same format, you may want to apply the REPLACE() function to strip the www. or http:// parts before comparing. For example, to ignore www.:

    DELETE t2
    FROM   mytb t1
    JOIN   mytb t2 ON (REPLACE(t2.url_addr, 'www.', '') = 
                       REPLACE(t1.url_addr, 'www.', '') AND 
                       t2.url_id > t1.url_id);
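
    The example above strips only www.; the http:// case can be handled by nesting further REPLACE() calls. The following is a sketch under assumptions rather than a complete URL normalizer: the https:// prefix is my addition and is not mentioned in the question, and REPLACE() removes every occurrence of the substring (case-sensitively), not just a leading one, so an address that contains 'www.' in the middle would also be altered for the comparison.

    ```sql
    -- Sketch: compare addresses with scheme and 'www.' stripped.
    -- Assumption: 'https://' is included for illustration only.
    -- Caveat: REPLACE() is case-sensitive and strips matches anywhere in the string.
    DELETE t2
    FROM   mytb t1
    JOIN   mytb t2
      ON   REPLACE(REPLACE(REPLACE(t2.url_addr, 'https://', ''), 'http://', ''), 'www.', '') =
           REPLACE(REPLACE(REPLACE(t1.url_addr, 'https://', ''), 'http://', ''), 'www.', '')
      AND  t2.url_id > t1.url_id;
    ```

    As with the plain version, the row with the lowest url_id in each group of equivalent addresses is the one that remains, whichever format it happens to be stored in.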
    
