Question
I am bulk importing data from a pipe-separated CSV file into SQL Server. The data is formatted like
A|B|CCCCCC\r\n
I have validated both that the file is in UTF-8 format and that lines are terminated with "\r\n" by viewing the CSV file in a hex editor.
The command is
BULK INSERT MyTable FROM 'C:\Path\File.csv'
WITH (FIRSTROW=1, MAXERRORS=0, BATCHSIZE=10000, FIELDTERMINATOR = '|',
ROWTERMINATOR = '\r\n')
The third column originally was defined as CHAR(6) as this field is always a code exactly 6 (ASCII) characters wide. That resulted in a truncation error during bulk insert.
I then widened the column to CHAR(8). The import worked, but
SELECT CAST(Col3 As VARBINARY(MAX))
indicates that the column data ends with 0x0D0A (or "\r\n", the row terminator)
Why is the row terminator being included in the imported data and how can I fix that?
Answer 1:
Long story short, SQL Server doesn't support UTF-8 and you just need \n as the row terminator.
It's actually a bit unclear what's going on because you didn't provide the table definition or the precise error messages. Having said all that, I could load the following data:
create table dbo.BCPTest (
col1 nchar(1) not null,
col2 nchar(1) not null,
col3 nchar(6) not null
)
/* This data can be saved as ASCII, UTF-16 with BOM or UTF-8 without BOM
(see comments below)
A|B|CCCCCC
D|E|FFFFFF
*/
BULK INSERT dbo.BCPTest FROM 'c:\testfile.csv'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n')
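To confirm that no terminator bytes ended up in the last column after a load like the one above, the column can be inspected the same way the question does. This is only a minimal sketch, assuming the dbo.BCPTest table defined above; the column aliases are made up for illustration.

```sql
-- Count rows whose col3 contains a carriage return or a line feed.
-- If the row terminator was fully consumed, this should find none.
SELECT COUNT(*) AS rows_with_terminator_chars
FROM dbo.BCPTest
WHERE col3 LIKE N'%' + NCHAR(13) + N'%'
   OR col3 LIKE N'%' + NCHAR(10) + N'%';

-- Show the raw bytes of col3. The column is nchar(6), so every character
-- takes two bytes; a stray CR would show up as 0D00 and a stray LF as 0A00.
SELECT col1,
       col2,
       col3,
       CAST(col3 AS varbinary(12)) AS col3_bytes
FROM dbo.BCPTest;
```

The same kind of check against the question's table would use CHAR(13) and CHAR(10) instead, since that column is CHAR rather than NCHAR.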
Comments:
- When I created and saved a file in Notepad as "UTF-8", it added the BOM bytes 0xEFBBBF, which is the standard UTF-8 BOM
- But SQL Server doesn't support UTF-8, it supports UTF-16 (official docs here) and it expects a BOM of 0xFFFE
- So I saved the file again in Notepad as "Unicode", and it added the 0xFFFE BOM; this loaded fine as shown above. Out of curiosity I also saved it (using Notepad++) as "UTF-8 without BOM" and I could load that file too
- Saving the file as ASCII also loads fine with the same table data types and BULK INSERT command
- The row terminator should be \n, not \r\n, because \n is interpreted as a "newline", i.e. SQL Server (and/or Windows) is being 'clever' by interpreting \n semantically instead of literally. This is most likely a result of the C handling of \r and \n, which doesn't require them to be interpreted literally.
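The last point above can be sketched as two variants of the same statement. As far as I can tell, this matches Microsoft's documentation on field and row terminators, which describes '\n' as being treated as a carriage return plus line feed pair during bulk import, and hexadecimal notation as the way to match a bare line feed. The second file name is hypothetical and only there for illustration.

```sql
-- File with Windows line endings (CR LF):
-- '\n' is treated as the CR LF pair, so the carriage return is consumed
-- together with the line feed and does not end up in the last column.
BULK INSERT dbo.BCPTest FROM 'c:\testfile.csv'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n');

-- File with bare LF line endings (hypothetical file name):
-- hexadecimal notation matches the single byte 0x0A literally.
BULK INSERT dbo.BCPTest FROM 'c:\testfile_lf.csv'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '0x0a');
```

Which variant applies depends on what the file actually contains, so checking the line endings in a hex editor first, as the question did, is still the safest starting point.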
Source: https://stackoverflow.com/questions/16312096/bulk-insert-includes-line-terminator