Cleaning up a db of redundant data

时光说笑 2020-12-07 04:54
locid    country  city
39409    US       Aaronsburg
128426   US       Aaronsburg
340356   US       Aaronsburg
429373   US       Aaronsburg
422717   US       Abbeville
431344   US       Abbeville
433062


        
5 Answers
  • 2020-12-07 05:13

    Add a unique index on the location table so that duplicate records can no longer be inserted:

    ALTER IGNORE TABLE location ADD UNIQUE KEY ix1(country, city);
    

    This automatically removes the duplicate records from the table. For future inserts, use INSERT IGNORE so that a duplicate row is skipped instead of raising a duplicate-key error.
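
    As an illustration of the INSERT IGNORE behaviour, here is a minimal sketch. It assumes the unique key on (country, city) from the statement above already exists and that locid is filled in automatically; the values are just sample data from the question.

    ```sql
    -- The pair ('US', 'Aaronsburg') is already present, so with the unique key
    -- in place this row is skipped with a warning instead of an error.
    INSERT IGNORE INTO location (country, city)
    VALUES ('US', 'Aaronsburg');

    -- Optional: inspect the warning raised for the skipped row.
    SHOW WARNINGS;
    ```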

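    As @AD7six pointed out in the comments, ALTER IGNORE might not work on MySQL versions 5.1.41, 5.5.1-m2, and 6.0: see bug here. The IGNORE clause of ALTER TABLE was also removed in MySQL 5.7.4, so on later versions the statement above produces an error.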

    Alternatively, a safe way to remove the duplicates is a DELETE query:

    DELETE a
    FROM location a
         LEFT JOIN (
                    -- keep the row with the lowest locid in each (country, city) group
                    SELECT MIN(locid) AS locid
                    FROM location
                    GROUP BY country, city
                   ) b
                   ON a.locid = b.locid
    WHERE b.locid IS NULL;
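
    To see whether the DELETE is needed, or whether it did its job, a check along these lines may help. It is only a sketch and assumes the table and column names used in this answer.

    ```sql
    -- List every (country, city) pair that occurs more than once.
    -- An empty result means no duplicates remain.
    SELECT country, city, COUNT(*) AS copies
    FROM location
    GROUP BY country, city
    HAVING COUNT(*) > 1;
    ```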
    

    To renumber the auto_increment column locid, you can drop the locid column and recreate it:

    ALTER TABLE location DROP COLUMN locid;
    ALTER TABLE location 
          ADD COLUMN locid INT unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
    

    Alternatively, renumber locid with an UPDATE query:

    SET @var_locid := 0;
    
    UPDATE location
    SET locid = (@var_locid := @var_locid + 1)
    ORDER BY locid ASC;
    
  • 2020-12-07 05:20

    Create a new table with a new auto_increment field, then select the rows into it with GROUP BY.

    Not tested, but it should look like this:

    INSERT INTO new_table(country, city) 
    SELECT country, city FROM old_table 
    GROUP BY country, city;
    

    EDIT: You could drop the old_table and rename the new_table afterwards.
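
    The answer does not show how new_table is created. One possible definition is sketched below, followed by the drop-and-rename step from the EDIT. The column types are assumptions, not something stated in the question.

    ```sql
    -- Hypothetical definition of new_table; adjust the types to match old_table.
    CREATE TABLE new_table (
      locid   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      country VARCHAR(255),
      city    VARCHAR(255)
    );

    -- Run the INSERT ... SELECT ... GROUP BY from above here, then swap the tables.
    DROP TABLE old_table;
    RENAME TABLE new_table TO old_table;
    ```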

  • 2020-12-07 05:24

    You can do this in several simple steps.

    Back up your original table

    If you haven't already, back up your original table data.

    Create a temporary table

    Create a new table, which you are going to use to replace your original table. Here's an example:

    CREATE TABLE temporary (
      locid INTEGER(10) UNSIGNED NOT NULL AUTO_INCREMENT,
      country VARCHAR(255) DEFAULT '',
      city VARCHAR(255) DEFAULT '',
      PRIMARY KEY  (locid),
      UNIQUE KEY  (country, city)
    );
    

    The schema should be almost the same as your existing table. The noteworthy differences are:

    • Auto increment primary key
    • A unique country+city index

    Import your old data

    INSERT IGNORE INTO temporary (country, city) SELECT country, city FROM original_table_name;
    

    This populates your temporary table with unique country+city combinations. Each row is assigned an auto-increment value, starting with 1.

    Check results

    Have a look at your data and make sure it looks like you want it:

    SELECT * FROM temporary;
    

    If anything is amiss, drop the table temporary, adjust the SQL you are running, and start again.
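
    Beyond eyeballing the rows, comparing two counts can help confirm the import. This is a sketch using the table names from this answer; the two numbers should match, assuming neither country nor city contains NULL values.

    ```sql
    -- Rows in the deduplicated table.
    SELECT COUNT(*) AS deduplicated_rows
    FROM temporary;

    -- Distinct country+city groups in the source table.
    SELECT COUNT(*) AS expected_rows
    FROM (
          SELECT country, city
          FROM original_table_name
          GROUP BY country, city
         ) AS g;
    ```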

    Replace your original table with your new one

    Once you are happy with what you see in your temporary table:

    DROP TABLE original_table_name; -- Or rename it to something else
    RENAME TABLE temporary TO original_table_name;
    

    You now have a table with unique data and sequential ids starting with 1.
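
    If you would rather keep the old data around than drop it, both renames can go into a single RENAME TABLE statement, which MySQL carries out as one atomic operation. In this sketch, original_table_name_old is a hypothetical name for the kept copy.

    ```sql
    -- Move the original aside and put the deduplicated table in its place.
    RENAME TABLE original_table_name TO original_table_name_old,
                 temporary TO original_table_name;

    -- Once you are sure nothing is missing:
    -- DROP TABLE original_table_name_old;
    ```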

    Other options

    You can also just apply a unique index to country+city, drop the primary key field, and then re-add it as an auto-increment column. Be aware that MySQL may ignore the IGNORE flag when creating indexes, though there's a workaround for that.

    I'd do that personally, but if you're not confident with SQL, doing things one step at a time, without destroying your source data in the process, can make updating your schema a less worrying task.

  • 2020-12-07 05:33

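    Delete these records (every row that does not have the highest locid for its country and city):

    SELECT T2.*
    FROM (
          SELECT country, city, MAX(locid) AS locid
          FROM <table>
          GROUP BY country, city
         ) T1
         JOIN <table> T2
           ON T2.country = T1.country
          AND T2.city = T1.city
    WHERE T2.locid <> T1.locid;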
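
    The query in this answer only lists the rows to remove. A possible DELETE form of the same idea is sketched below. It assumes the table is called location, as in the first answer, and it keeps the row with the highest locid in each group.

    ```sql
    -- Delete every row whose locid is not the highest one
    -- for its (country, city) pair.
    DELETE T2
    FROM location T2
         JOIN (
               SELECT country, city, MAX(locid) AS locid
               FROM location
               GROUP BY country, city
              ) T1
           ON T2.country = T1.country
          AND T2.city = T1.city
    WHERE T2.locid <> T1.locid;
    ```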
    
  • 2020-12-07 05:33
    1. Select the unique records and insert them into a temporary table with the same schema.
    2. Delete everything from the original table.
    3. Select the rows from the temporary table and insert them back into the original table.
    4. Remove the temporary table.
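
    The four steps above could be sketched as follows. The table name location comes from the first answer, and location_tmp is a hypothetical name for the scratch table; the lowest locid is kept for each country+city pair.

    ```sql
    -- 1. Copy one row per (country, city) into a scratch table with the same schema.
    CREATE TABLE location_tmp LIKE location;

    INSERT INTO location_tmp (locid, country, city)
    SELECT MIN(locid), country, city
    FROM location
    GROUP BY country, city;

    -- 2. Empty the original table.
    DELETE FROM location;

    -- 3. Copy the unique rows back.
    INSERT INTO location (locid, country, city)
    SELECT locid, country, city
    FROM location_tmp;

    -- 4. Remove the scratch table.
    DROP TABLE location_tmp;
    ```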