I have some very large files (millions of records) that I need to load into a database. They are of the form:
word1\tblahblahblah
word2\tblahblah
word3\tblahblah
word4
word5\tblahblah
...
My problem, is that I want to ignore the lines that have no second record (the 'blahblah's'), like word4.
I'm currently using the following query to load the file:
LOAD DATA LOCAL INFILE 'file'
IGNORE INTO TABLE tablename
COLUMNS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(col1, col2);
This has the functionality I want, except that it still accepts the null values. Is there a way to skip the word4 type lines and not add them to the DB?
Intuitively, I thought using a WHEN or WHERE clause would be perfect, but looking through documentation and other info online, I can't find any examples where this would work. Am I forced to filter these records out before hand, or can I do it within this query?
Any help is grealy appreciated!
I would do this by filtering the file with grep
or awk
and then piping it into MySQL (via /dev/stdin
). Something like this:
cat your_file.txt |
awk '/\t.+/' |
mysql -u your_username -pyour_password \
-e "LOAD DATA LOCAL INFILE '/dev/stdin' \
IGNORE INTO TABLE tablename \
COLUMNS TERMINATED BY '\t' \
LINES TERMINATED BY '\n' \
(col1, col2);" \
your_database_name
The regular expression given to awk
on the second line just matches any line that has a tab character followed by one or more of any character. You might want to tweak it to fit your needs.
Edit: One other possibility occurred to me. You could use SET
to set some magic value on columns that are blank and put a BEFORE INSERT
trigger on the table that will bail on a row when it sees that value. I don't have much experience with triggers but I think something like this should work:
CREATE TRIGGER skip_magic_rows
BEFORE INSERT ON tablename
FOR EACH ROW
BEGIN
IF NEW.col2 = 'IDSPISPOPD4815162342' THEN # Some unlikely magic string
# Trigger an error, which will cause the INSERT to fail†
# If you have MySQL < 5.5 this is kludgy -- see Note 1
DROP TABLE `Skipped row`
# OR
# In MySQL >= 5.5 you can send a signal--'45000' is a generic error
SIGNAL SQLSTATE '45000' SET message_text = 'Skipped row'; # See Note 2
END IF
END
;
†: Per the docs:
An error during either a BEFORE or AFTER trigger results in failure of the entire statement that caused trigger invocation.
Then...
LOAD DATA LOCAL INFILE 'file'
IGNORE INTO TABLE tablename
COLUMNS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(col1, @var2)
SET col2 = IF(@var2 IN (NULL, ''), 'IDSPISPOPD4815162342', @var2)
;
Hope it's helpful!
Note 1: Relevant blog post and comments
Note 2: Relevant SO thread
来源:https://stackoverflow.com/questions/7587660/filter-null-or-empty-input-using-load-data-infile-in-mysql