MySQL LOAD DATA INFILE: works, but unpredictable line terminator

Submitted by 只谈情不闲聊 on 2019-11-29 01:04:24

Question


MySQL has a nice CSV import function LOAD DATA INFILE.

I have a large dataset that needs to be imported from CSV on a regular basis, so this feature is exactly what I need. I've got a working script that imports my data perfectly.

...except that I don't know in advance what the end-of-line terminator will be.

My SQL code currently looks something like this:

LOAD DATA INFILE '{fileName}'
 INTO TABLE {importTable}
 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
 LINES TERMINATED BY '\n'
 IGNORE 1 LINES
( {fieldList} );

This works great for some import files.

However, the import data is coming from multiple sources. Some of them have the \n terminator; others have \r\n. I can't predict which one I'll have.

Is there a way using LOAD DATA INFILE to specify that my lines may be terminated with either \n or \r\n? How do I deal with this?


Answer 1:


I'd just pre-process it. A global search/replace to change \r\n to \n done from a command line tool as part of the import process should be simple and performant.
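
As a rough illustration of that pre-processing step, here is a minimal Python sketch that rewrites every \r\n as \n before the file is handed to LOAD DATA INFILE. The function name normalize_newlines and its parameters are hypothetical, not part of any tool mentioned here; the file is read in chunks so that large imports do not have to fit in memory.

```python
def normalize_newlines(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, rewriting every CRLF pair as a single LF."""
    carry = b""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            data = carry + chunk
            # Hold back a trailing CR: its matching LF may be in the next chunk.
            if data.endswith(b"\r"):
                data, carry = data[:-1], b"\r"
            else:
                carry = b""
            dst.write(data.replace(b"\r\n", b"\n"))
        # A lone CR at the very end of the file is written back unchanged.
        dst.write(carry)
```

After this step the import can always use LINES TERMINATED BY '\n', whichever source the file came from.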




Answer 2:


You can specify the line separator as '\n' and, where necessary, strip the trailing '\r' from the last field during loading.

For example:

Suppose we have the 'entries.txt' file. The line separator is '\r\n', and only after line ITEM2 | CLASS3 | DATE2 the separator is '\n':

COL1  | COL2   | COL3
ITEM1 | CLASS1 | DATE1
ITEM2 | CLASS3 | DATE2
ITEM3 | CLASS1 | DATE3
ITEM4 | CLASS2 | DATE4

CREATE TABLE statement:

CREATE TABLE entries(
  column1 VARCHAR(255) DEFAULT NULL,
  column2 VARCHAR(255) DEFAULT NULL,
  column3 VARCHAR(255) DEFAULT NULL
);

Our LOAD DATA INFILE query:

LOAD DATA INFILE 'entries.txt' INTO TABLE entries
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(column1, column2, @var)
SET column3 = TRIM(TRAILING '\r' FROM @var);

Show results:

SELECT * FROM entries;
+---------+----------+---------+
| column1 | column2  | column3 |
+---------+----------+---------+
| ITEM1   |  CLASS1  |  DATE1  |
| ITEM2   |  CLASS3  |  DATE2  |
| ITEM3   |  CLASS1  |  DATE3  |
| ITEM4   |  CLASS2  |  DATE4  |
+---------+----------+---------+



Answer 3:


I'm assuming you need to do this through MySQL only, not through a programming language. Before running LOAD DATA, convert the file to Windows format, \r\n (CR LF), for example with Notepad++ if you have it. Then run the LOAD DATA query, making sure it uses LINES TERMINATED BY '\r\n'.

Edit:

Editors are often unsuitable for converting larger files. For larger files, the following commands are commonly used on Windows and Linux.

1) To convert to Windows format on Windows:

TYPE [unix_file] | FIND "" /V > dos_file

2) To convert to Windows format on Linux:

unix2dos  [file]

Other commands are also available.

A Windows-format file can be converted to Unix format by simply removing all ASCII CR (\r) characters:

tr -d '\r' < inputfile > outputfile

To check which format a file uses:

grep -PL '\r$' myfile.txt # lists the file if it is in UNIX format (LF terminated)
grep -Pl '\r$' myfile.txt # lists the file if it is in Windows format (CRLF terminated)

On Linux/Unix, the file command also reports the type of end-of-line (EOL) used, so it can be used to check a file's format.




Answer 4:


You could also look into one of the data integration packages out there. Talend Open Studio has very flexible data input routines. For example you could process the file with one set of delimiters and catch the rejects and process them another way.




Answer 5:


If the first load has 0 rows, do the same statement with the other line terminator. This should be do-able with some basic counting logic.

At least it all stays in SQL, and if it works the first time you win. It could also cause less headache than re-scanning all the rows and removing a particular character.
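
The counting logic could be driven from a small wrapper like the Python sketch below. Everything named here is an assumption for illustration: run_load is a hypothetical callable that executes one SQL string and returns the number of rows loaded (for instance a thin wrapper around whatever database driver is in use), and the statement is a simplified copy of the one in the question, without the field list. The sketch tries '\r\n' first, on the assumption that an LF-only file then parses as a single line, which IGNORE 1 LINES skips, giving 0 rows.

```python
LOAD_TEMPLATE = (
    "LOAD DATA INFILE '{file_name}' INTO TABLE {table} "
    "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
    "LINES TERMINATED BY '{terminator}' "
    "IGNORE 1 LINES"
)


def load_with_fallback(run_load, file_name, table, terminators=(r"\r\n", r"\n")):
    """Try each terminator in turn until a load reports at least one row.

    run_load is a hypothetical callable: it runs one SQL string and
    returns the number of rows that were loaded.
    """
    for terminator in terminators:
        # file_name and table are interpolated directly, so they must be trusted values.
        sql = LOAD_TEMPLATE.format(
            file_name=file_name, table=table, terminator=terminator
        )
        rows = run_load(sql)
        if rows > 0:
            return terminator, rows
    return None, 0
```

The order of the terminators matters: trying '\n' first on a \r\n file would most likely load rows with a stray \r in the last field rather than 0 rows, so the retry would never be triggered.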




Answer 6:


Why not first just take a peek at how the lines end?

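// Peek at the first few lines and report which terminator characters they contain.
$handle = fopen('inputFile.csv', 'r');

$i = 0;
if ($handle) {
    while (($buffer = fgets($handle)) !== false) {

        // Look only at the tail of the line, where the terminator sits.
        $s = substr($buffer, -50);

        echo $s;
        echo preg_match('/\r/', $s) ? 'cr ' : '-- ';        // carriage return present?
        echo preg_match('/\n/', $s) ? 'nl<br>' : '--<br>';  // newline present?

        // Stop after a handful of lines.
        if ($i++ > 5)
            break;

    }

    fclose($handle);
}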
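
The same peek can be turned into a decision rather than a printout. The Python sketch below samples the start of the file, guesses the terminator by majority, and renders it as the escaped text that goes inside LINES TERMINATED BY '...'. Both function names and the sample size are hypothetical choices, and a file that mixes both endings is simply classified by whichever is more common in the sample.

```python
def detect_line_terminator(path, sample_size=64 * 1024):
    """Guess the line terminator from the first sample_size bytes of the file."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    crlf = sample.count(b"\r\n")
    lf_only = sample.count(b"\n") - crlf
    if crlf == 0 and lf_only == 0:
        return None  # no complete line found in the sample
    return "\r\n" if crlf >= lf_only else "\n"


def sql_terminator_literal(terminator):
    """Render a terminator as the escaped text used in LINES TERMINATED BY '...'."""
    return terminator.replace("\r", "\\r").replace("\n", "\\n")
```

The import script would call detect_line_terminator on each incoming file and substitute the result of sql_terminator_literal into the LOAD DATA INFILE statement.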



Answer 7:


You can use LINES STARTING BY to distinguish ordinary line endings inside the text from the start of a new row:

LOAD DATA LOCAL INFILE '/home/laptop/Downloads/field3-utf8.csv' 
IGNORE INTO TABLE Field FIELDS 
TERMINATED BY ';' 
OPTIONALLY ENCLOSED BY '^' 
LINES STARTING BY '^' 
TERMINATED BY '\r\n' 
(Id, Form_id, Name, Value)

For usual CSV files with " enclosing chars, it will be:

...
LINES STARTING BY '"' 
...


Source: https://stackoverflow.com/questions/10935219/mysql-load-data-infile-works-but-unpredictable-line-terminator
