Bulk insert, SQL Server 2000, unix linebreaks

寵の児 提交于 2019-11-28 18:12:58

I felt compelled to contribute as I was having the same issue, and I need to read 2 UNIX files from SAP at least a couple of times a day. Therefore, instead of using unix2dos, I needed something with less manual intervention and more automatic via programming.

As noted, the Char(10) works within the sql string. I didn't want to use an sql string, and so I used ''''+Char(10)+'''', but for some reason, this didn't compile.

What did work very slick was: with (ROWTERMINATOR = '0x0a')

Problem solved with Hex!

Hope this helps someone.

Thanks to all who have answered but I found my preferred solution.

When you tell SQL Server ROWTERMINATOR='\n' it interprets this as meaning the default row terminator under Windows which is actually "\r\n" (using C/C++ notation). If your row terminator is really just "\n" you will have to use the dynamic SQL shown below.

DECLARE @bulk_cmd varchar(1000)
SET @bulk_cmd = 'BULK INSERT table_name
FROM ''C:\file.csv''
WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = '''+CHAR(10)+''')'
EXEC (@bulk_cmd)

Why you can't say BULK INSERT ...(ROWTERMINATOR = CHAR(10)) is beyond me. It doesn't look like you can evaluate any expressions in the WITH section of the command.

What the above does is create a string of the command and execute that. Neatly sidestepping the need to create an additional file or go through extra steps.

I confirm that the syntax

ROWTERMINATOR = '''+CHAR(10)+'''

works when used with an EXEC command.

If you have multiple ROWTERMINATOR characters (e.g. a pipe and a unix linefeed) then the syntax for this is:

ROWTERMINATOR = '''+CHAR(124)+''+CHAR(10)+'''

It's a bit more complicated than that! When you tell SQL Server ROWTERMINATOR='\n' it interprets this as meaning the default row terminator under Windows which is actually "\r\n" (using C/C++ notation). If your row terminator is really just "\n" you will have to use the dynamic SQL shown above. I have just spent the best part of an hour figuring out why \n doesn't really mean \n when used with BULK INSERT!

One option would be to use bcp, and set up a control file with '\n' as the line break character.

Although you've indicated that you would prefer not to, another option would be to use unix2dos to pre-process the file into one with '\r\n' line breaks.

Finally, you can use the FORMATFILE option on BULK INSERT. This will use a bcp control file to specify the import format.

Looks to me there are two general avenues that can be taken: some alternate way to read the CSV in the SQL script or convert the CSV beforehand with any of the numerous ways you can do that (bcp, unix2dos, if it is a one-time king of a thing, you can probably even use your code editor to fix the file for you).

But you will have to have an extra step!

If this SQL is launched from a program, you might want to convert the line endings in that program. In that case and you decide to code the conversion yourself, here is what you need to watch out for: 1. The line ending might be \n 2. or \r\n 3. or even \r (Mac!) 4. good grief, it could be that some lines have \r\n and others \n, any combination is possible unless you control where the CSV came from

OK, OK. Possibility 4 is farfetched. It happens in email, but that is another story.

I would think "ROWTERMINATOR = '\n'" would work. I would suggest opening the file in a tool that shows "hidden characters" to make sure the line is being terminated like you think. I use notepad++ for things like this.

It comes down to this. Unix uses LF (ctrl-J), MS-DOS/Windows uses CR/LF (ctrl-M/Ctrl-J).

When you use '\n' on Unix, it gets translated to a LF character. On MS-DOS/Windows it gets translated to CR/LF. When the your import runs on the Unix formatted file, it sees only a LF. Hence, its often easier to run the file through unix2dos first. But as you said in you original question, you don't want to do this (I'll assume there is a good reason why you can't).

Why can't you do:

(ROWTERMINATOR = CHAR(10))

Probably because when the SQL code is being parsed, it is not replacing the char(10) with the LF character (because it's already encased in single-quotes). Or perhaps its being interpreted as:

(ROWTERMINATOR =
     )

What happens when you echo out the contents of @bulk_cmd?

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!