I am trying to insert a .csv file into a database with unix linebreaks. The command I am running is:
BULK INSERT table_name
FROM \'C:\\file.csv\'
WITH
(
It comes down to this. Unix uses LF (ctrl-J), MS-DOS/Windows uses CR/LF (ctrl-M/Ctrl-J).
When you use '\n' on Unix, it gets translated to a LF character. On MS-DOS/Windows it gets translated to CR/LF. When the your import runs on the Unix formatted file, it sees only a LF. Hence, its often easier to run the file through unix2dos first. But as you said in you original question, you don't want to do this (I'll assume there is a good reason why you can't).
Why can't you do:
(ROWTERMINATOR = CHAR(10))
Probably because when the SQL code is being parsed, it is not replacing the char(10) with the LF character (because it's already encased in single-quotes). Or perhaps its being interpreted as:
(ROWTERMINATOR =
)
What happens when you echo out the contents of @bulk_cmd?
Thanks to all who have answered but I found my preferred solution.
When you tell SQL Server ROWTERMINATOR='\n' it interprets this as meaning the default row terminator under Windows which is actually "\r\n" (using C/C++ notation). If your row terminator is really just "\n" you will have to use the dynamic SQL shown below.
DECLARE @bulk_cmd varchar(1000)
SET @bulk_cmd = 'BULK INSERT table_name
FROM ''C:\file.csv''
WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = '''+CHAR(10)+''')'
EXEC (@bulk_cmd)
Why you can't say BULK INSERT ...(ROWTERMINATOR = CHAR(10)) is beyond me. It doesn't look like you can evaluate any expressions in the WITH section of the command.
What the above does is create a string of the command and execute that. Neatly sidestepping the need to create an additional file or go through extra steps.
One option would be to use bcp, and set up a control file with '\n'
as the line break character.
Although you've indicated that you would prefer not to, another option would be to use unix2dos to pre-process the file into one with '\r\n'
line breaks.
Finally, you can use the FORMATFILE
option on BULK INSERT
. This will use a bcp control file to specify the import format.
I felt compelled to contribute as I was having the same issue, and I need to read 2 UNIX files from SAP at least a couple of times a day. Therefore, instead of using unix2dos, I needed something with less manual intervention and more automatic via programming.
As noted, the Char(10) works within the sql string. I didn't want to use an sql string, and so I used ''''+Char(10)+'''', but for some reason, this didn't compile.
What did work very slick was: with (ROWTERMINATOR = '0x0a')
Problem solved with Hex!
Hope this helps someone.
I confirm that the syntax
ROWTERMINATOR = '''+CHAR(10)+'''
works when used with an EXEC command.
If you have multiple ROWTERMINATOR characters (e.g. a pipe and a unix linefeed) then the syntax for this is:
ROWTERMINATOR = '''+CHAR(124)+''+CHAR(10)+'''
It's a bit more complicated than that! When you tell SQL Server ROWTERMINATOR='\n' it interprets this as meaning the default row terminator under Windows which is actually "\r\n" (using C/C++ notation). If your row terminator is really just "\n" you will have to use the dynamic SQL shown above. I have just spent the best part of an hour figuring out why \n doesn't really mean \n when used with BULK INSERT!