How to check if a value already exists to avoid duplicates?

小鲜肉 2020-12-02 23:13

I've got a table of URLs and I don't want any duplicate URLs. How do I check to see if a given URL is already in the table using PHP/MySQL?

17 answers
  •  情歌与酒
    2020-12-02 23:43

    First things first. If you haven't already created the table, or you created a table but do not have any data in it, then you need to add a unique constraint or a unique index. In MySQL the two accomplish the same thing: a unique constraint is enforced by a unique index, so either way the column can only contain unique values. More on the trade-offs follows at the end of the post.

    To create a table with a unique index on this column, you can use:

    CREATE TABLE MyURLTable(
    ID INTEGER NOT NULL AUTO_INCREMENT
    ,URL VARCHAR(512)
    ,PRIMARY KEY(ID)
    ,UNIQUE INDEX IDX_URL(URL)
    );
    

    If you prefer to declare it as a named unique constraint, you can use the following. MySQL still creates a unique index behind the scenes to enforce it.

    CREATE TABLE MyURLTable(
    ID INTEGER NOT NULL AUTO_INCREMENT
    ,URL VARCHAR(512)
    ,PRIMARY KEY(ID)
    ,CONSTRAINT UNIQUE_URL UNIQUE (URL)
    );
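
    To see the uniqueness rule in action, here is a minimal sketch. It assumes the table was created with either statement above and is still empty; the statements are meant to be run one at a time, since the second insert is expected to fail.

```sql
-- Assumes MyURLTable exists as defined above and contains no rows yet.
INSERT INTO MyURLTable(URL)
VALUES('http://www.example.com');   -- succeeds

-- Same URL again: rejected with MySQL error 1062 (duplicate entry)
-- because of the unique index on URL.
INSERT INTO MyURLTable(URL)
VALUES('http://www.example.com');

-- Only one row for the URL is stored.
SELECT COUNT(*)
FROM MyURLTable
WHERE URL = 'http://www.example.com';
```

    Note that uniqueness is judged by the column's collation, so with a case-insensitive collation two URLs that differ only in letter case count as duplicates.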
    

    Now, if you already have a table, and there is no data in it, then you can add the index or constraint to the table with one of the following pieces of code.

    ALTER TABLE MyURLTable
    ADD UNIQUE INDEX IDX_URL(URL);
    
    ALTER TABLE MyURLTable
    ADD CONSTRAINT UNIQUE_URL UNIQUE (URL);
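
    Either statement should leave the table with a unique index on URL. One way to confirm that, sketched here with the table name used above, is to inspect the table's indexes:

```sql
-- Lists the indexes on the table; a unique index is shown with Non_unique = 0.
SHOW INDEX FROM MyURLTable;

-- Alternatively, print the full table definition, including its keys.
SHOW CREATE TABLE MyURLTable;
```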
    

    Now, you may already have a table with some data in it, and that data may already contain duplicates. You can try creating the constraint or index shown above; it will fail if duplicate data exists. If you don't have duplicate data, great. If you do, you'll have to remove the duplicates first. You can see a list of URLs with duplicates using the following query.

    SELECT URL,COUNT(*),MIN(ID) 
    FROM MyURLTable
    GROUP BY URL
    HAVING COUNT(*) > 1;
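
    The query above returns one line per duplicated URL. If you would rather see the individual rows that are surplus, something like the following sketch may help. The aliases Dup, Firsts and KeepID are made up for this example.

```sql
-- Every row whose ID is not the lowest ID for its URL, i.e. the extra copies.
SELECT Dup.ID, Dup.URL
FROM MyURLTable AS Dup
JOIN
(
    -- Lowest ID per URL, restricted to URLs that occur more than once.
    SELECT URL, MIN(ID) AS KeepID
    FROM MyURLTable
    GROUP BY URL
    HAVING COUNT(*) > 1
) AS Firsts
ON Dup.URL = Firsts.URL
WHERE Dup.ID > Firsts.KeepID
ORDER BY Dup.URL, Dup.ID;
```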
    

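    To delete the duplicate rows and keep the one with the lowest ID for each URL, do the following:

    -- KeepRecords holds the lowest ID for every URL, duplicated or not.
    -- Selecting MIN(ID) for all groups also avoids picking a non-aggregated
    -- ID column, which fails when ONLY_FULL_GROUP_BY is enabled.
    DELETE RemoveRecords
    FROM MyURLTable AS RemoveRecords
    LEFT JOIN 
    (
    SELECT MIN(ID) AS ID
    FROM MyURLTable
    GROUP BY URL
    ) AS KeepRecords
    ON RemoveRecords.ID = KeepRecords.ID
    -- Rows with no match in KeepRecords are the surplus copies.
    WHERE KeepRecords.ID IS NULL;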
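
    Since deleting is hard to undo, it may be worth wrapping the clean-up in a transaction and checking the result before committing. The sketch below assumes a transactional storage engine such as InnoDB. It uses a self-join delete, a different formulation from the one above that should have the same effect (the lowest ID per URL survives). The aliases Newer and Older are made up for this example.

```sql
START TRANSACTION;

-- Delete every row for which an older row (lower ID) with the same URL exists.
DELETE Newer
FROM MyURLTable AS Newer
JOIN MyURLTable AS Older
  ON Newer.URL = Older.URL
 AND Newer.ID > Older.ID;

-- Should return no rows if the de-duplication worked.
SELECT URL, COUNT(*)
FROM MyURLTable
GROUP BY URL
HAVING COUNT(*) > 1;

-- Keep the changes; use ROLLBACK instead if the check above looks wrong.
COMMIT;
```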
    

    Now that you have deleted all the duplicate records, you can go ahead and create your index or constraint. Now, if you want to insert a value into your database, you should use something like:

    INSERT IGNORE INTO MyURLTable(URL)
    VALUES('http://www.example.com');
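
    If you need to know whether the row was actually inserted or silently skipped, one option is to read the affected-row count right after the statement. A minimal sketch, assuming the unique index on URL is in place:

```sql
INSERT IGNORE INTO MyURLTable(URL)
VALUES('http://www.example.com');

-- Must be the very next statement on the same connection:
-- 1 if the row was inserted, 0 if it was skipped as a duplicate.
SELECT ROW_COUNT();
```

    From PHP the same number is normally available without the extra query, for example through mysqli's affected_rows property or PDOStatement::rowCount().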
    

    That will attempt to do the insert, and if it finds a duplicate, nothing will happen. Now, let's say you have other columns, such as a Visits counter; you can do something like this.

    INSERT INTO MyURLTable(URL,Visits) 
    VALUES('http://www.example.com',1)
    ON DUPLICATE KEY UPDATE Visits=Visits+1;
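
    The table definitions earlier in this answer do not include a Visits column, so the statement above needs one to be added first. A sketch under that assumption (the column type and default are my choice, not part of the original table):

```sql
-- Assumption: Visits is not part of the table defined earlier, so add it.
ALTER TABLE MyURLTable
ADD COLUMN Visits INT NOT NULL DEFAULT 0;

INSERT INTO MyURLTable(URL,Visits)
VALUES('http://www.example.com',1)
ON DUPLICATE KEY UPDATE Visits=Visits+1;

-- Run immediately after the insert on the same connection:
-- 1 means a new row was inserted, 2 means an existing row was updated.
SELECT ROW_COUNT();
```

    In general MySQL also reports 0 here when the existing row is matched but the update leaves its values unchanged, which cannot happen with a counter increment like this one.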
    

    That will try to insert the value, and if it finds the URL, it will update the record by incrementing the visits counter. Of course, you can always do a plain old insert and handle the resulting error in your PHP code.

    As for whether you should use a constraint or an index: in MySQL the choice is mostly one of syntax, because a unique constraint is enforced by a unique index either way. An index makes lookups faster, so performance holds up better as the table gets bigger, but storing the index takes up extra space. Indexes also usually make inserts and updates take longer, because the index has to be updated too. However, since the value has to be looked up anyway to enforce uniqueness, having the index is what makes that check cheap. As with anything performance related, profile the results to see what works best for your situation.
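
    If all you want is to ask whether a URL is already stored, without inserting anything, a plain lookup is enough. This is only a sketch: another connection can insert the same URL between your check and a later insert, so the unique index remains the real safeguard. The alias UrlExists is made up for this example.

```sql
-- Returns 1 if the URL is already stored, 0 otherwise.
SELECT EXISTS(
    SELECT 1
    FROM MyURLTable
    WHERE URL = 'http://www.example.com'
) AS UrlExists;
```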
