mysql duplicates with LOAD DATA INFILE

自闭症患者 2020-12-11 12:41

When using LOAD DATA INFILE, is there a way to either flag a duplicate row, or dump any/all duplicates into a separate table?

2 Answers
  • 2020-12-11 13:21

    From the LOAD DATA INFILE documentation:

    The REPLACE and IGNORE keywords control handling of input rows that duplicate existing rows on unique key values:

    • If you specify REPLACE, input rows replace existing rows. In other words, rows that have the same value for a primary key or unique index as an existing row. See Section 12.2.7, “REPLACE Syntax”.
    • If you specify IGNORE, input rows that duplicate an existing row on a unique key value are skipped.

    If you do not specify either option, the behavior depends on whether the LOCAL keyword is specified. Without LOCAL, an error occurs when a duplicate key value is found, and the rest of the text file is ignored. With LOCAL, the default behavior is the same as if IGNORE is specified; this is because the server has no way to stop transmission of the file in the middle of the operation.
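
    For example, a minimal sketch of the IGNORE form (the file path is a placeholder, and the target table is assumed to have a primary key or unique index on id):

    LOAD DATA INFILE '/path/to/data.csv'
    IGNORE INTO TABLE test   -- input rows that collide on the key are silently skipped
    FIELDS
        TERMINATED BY ','
    LINES
        TERMINATED BY '\n' (id,text);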

    Effectively, there's no way to redirect the duplicate records to a different table. You'd have to load them all in, and then create another table to hold the non-duplicated records.
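
    One way to sketch that two-step approach, with hypothetical names (target is the real table keyed on id, staging is an identical table with the key dropped so it accepts duplicates, and the id/text columns are placeholders):

    -- Stage everything without any uniqueness constraint.
    CREATE TABLE staging LIKE target;
    ALTER TABLE staging DROP PRIMARY KEY;

    LOAD DATA INFILE '/path/to/data.csv'
    INTO TABLE staging
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n' (id,text);

    -- Copy rows whose key is still free; IGNORE skips the ones that collide.
    INSERT IGNORE INTO target SELECT * FROM staging;

    -- Staged rows that never made it into target are the duplicates.
    CREATE TABLE duplicates AS
    SELECT s.*
    FROM staging s
    LEFT JOIN target t ON t.id = s.id AND t.text <=> s.text
    WHERE t.id IS NULL;

    The same idea works with a composite unique key; the join condition just has to list every key column.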

  • 2020-12-11 13:38

    It looks as if there actually is something you can do when it comes to duplicate rows for LOAD DATA calls. However, the approach that I've found isn't perfect: it acts more as a log for all deletes on a table, instead of just for LOAD DATA calls. Here's my approach:

    Table test:

    CREATE TABLE test (
        id INTEGER PRIMARY KEY,
        text VARCHAR(255) DEFAULT NULL
    );
    

    Table test_log:

    CREATE TABLE test_log (
        id INTEGER, -- not primary key, we want to accept duplicate rows
        text VARCHAR(255) DEFAULT NULL,
        time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
    

    Trigger del_chk:

    drop trigger if exists del_chk;
    delimiter //
    CREATE TRIGGER del_chk AFTER DELETE ON test
    FOR EACH ROW
    BEGIN
        -- copy every deleted row into the log table
        INSERT INTO test_log(id,text) VALUES (OLD.id,OLD.text);
    END;//
    delimiter ;
    

    Test import (/home/ken/test.csv):

    1,asdf
    2,jkl
    3,qwer
    1,tyui
    1,zxcv
    2,bnm
    

    Query:

    LOAD DATA INFILE '/home/ken/test.csv'
    REPLACE INTO TABLE test 
    FIELDS 
        TERMINATED BY ','
    LINES
        TERMINATED BY '\n' (id,text);
    

    Running the above query will result in 1,asdf, 1,tyui, and 2,jkl being added to the log table: REPLACE deletes the existing row before inserting the incoming one, so the delete trigger fires once for each replaced row. Based on the timestamp column, it could be possible to associate the logged rows with a particular LOAD DATA statement.
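
    If that's the route you take, here is a sketch of the timestamp idea (the @load_start variable name is made up): record the time just before the load, then pull everything logged since.

    SET @load_start = NOW();

    -- ... run the LOAD DATA INFILE ... REPLACE statement here ...

    SELECT id, text, time
    FROM test_log
    WHERE time >= @load_start;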
