How to bulk insert from CSV when some fields have new line character?

Submitted by 梦想与她 on 2019-12-22 08:17:53

Question


I have a CSV dump from another DB that looks like this (id, name, notes):

1001,John Smith,15 Main Street
1002,Jane Smith,"2010 Rockliffe Dr.
Pleasantville, IL
USA"
1003,Bill Karr,2820 West Ave.

The last field may contain line breaks and commas, in which case it is enclosed in double quotes. I need to preserve those line breaks and commas.

I use this code to import CSV into my table:

BULK INSERT CSVTest
FROM 'c:\csvfile.csv'
WITH
(
  FIELDTERMINATOR = ',',
  ROWTERMINATOR = '\n'
)

SQL Server 2005's BULK INSERT cannot tell that line breaks inside quotes are not row terminators.
How can I get around this?
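The rule BULK INSERT misses here fits in a few lines of code. The following is a minimal Java sketch, not part of the original question: it splits CSV text into records and treats a line break as a record terminator only when it falls outside a double-quoted field. The class name CsvRecords is hypothetical, and the sketch assumes RFC 4180-style quoting, where a literal quote inside a quoted field is written as "" (which flips the quote state twice, so it does no harm).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split CSV text into records, ending a record at a
// line break only when the scan is outside a double-quoted field.
final class CsvRecords {
    static List<String> split(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;          // every quote flips the state
                current.append(c);
            } else if ((c == '\n' || c == '\r') && !inQuotes) {
                // Outside quotes a line break ends the record; CRLF counts once.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);             // keeps line breaks inside quotes
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());   // last record without a trailing newline
        }
        return records;
    }
}
```

For the sample file above, CsvRecords.split returns three records, and the second one still contains its two embedded line breaks.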


UPDATE:
It looks like the only way to keep line breaks inside a field is to use a different row separator. So I want to mark every line break that separates rows by putting a pipe in front of it. How can I change my CSV to look like this?

1001,John Smith,15 Main Street|
1002,Jane Smith,"2010 Rockliffe Dr.
Pleasantville, IL
USA"|
1003,Bill Karr,2820 West Ave.|


Answer 1:


Bulk operations in SQL Server do not specifically support CSV, although they can import CSV files that are carefully formatted. My suggestion would be to enclose all field values in quotes; BULK INSERT might then accept carriage returns within a field value. If it does not, your next option might be an Integration Services (SSIS) package.

See "Preparing Data for Bulk Export or Import" in the SQL Server documentation for more.
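A minimal Java sketch of the first step, quoting every field before the file is written. The class name CsvQuoting is hypothetical, and the sketch assumes RFC 4180-style escaping, where a quote inside a field is doubled. It only prepares the data; whether BULK INSERT then keeps the embedded line breaks is the part the answer itself leaves open.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: enclose every field in double quotes, doubling any
// quote that appears inside the value (RFC 4180-style escaping).
final class CsvQuoting {
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Build one CSV line in which every field is quoted.
    static String toLine(List<String> fields) {
        return fields.stream()
                .map(CsvQuoting::quote)
                .collect(Collectors.joining(","));
    }
}
```

Quoting unconditionally, rather than only when a value contains a comma, quote or line break, keeps every row in the same shape, which is what the answer recommends.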




Answer 2:


You can preprocess the file with a script. For example, GNU sed can append a pipe to the end of every complete record while leaving the line breaks inside quoted fields alone:

$ more file
1001,John Smith,15 Main Street
1002,Jane Smith,"2010 Rockliffe Dr.
Pleasantville, IL
USA"
1003,Bill Karr,"2820
West Ave"

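$ sed ':a; /^[^"]*\("[^"]*"[^"]*\)*$/!{N;ba;}; s/$/|/' file
1001,John Smith,15 Main Street|
1002,Jane Smith,"2010 Rockliffe Dr.
Pleasantville, IL
USA"|
1003,Bill Karr,"2820
West Ave"|

The script keeps appending the next line (N) while the pattern space holds an odd number of double quotes, i.e. while a quoted field is still open, and only then puts the pipe at the end of the complete record. This also works when a quoted field spans more than three lines. Then you can bulk insert.

Edit:

On Windows, save the script :a; /^[^"]*\("[^"]*"[^"]*\)*$/!{N;ba;}; s/$/|/ in a file, say myformat.sed, and then run this on the command line:

c:\test> sed.exe -f myformat.sed myfile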




Answer 3:


You cannot import this unless the CSV is in a valid format. So you have to either fix the dump or fix the unwanted newline characters manually with search and replace.




Answer 4:


According to the source of all knowledge (Wikipedia), CSV uses newlines to separate records, so what you have is not valid CSV.

My suggestion is that you write a Perl program to process your file and add each record to the DB.

If you're not a Perl person, you could use a programming site or see if some kind SO person will write the parsing part of the program for you.
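For the parsing part, here is a minimal sketch in Java rather than Perl, since Java is the language used for the other program in this thread. It splits one complete record into its fields. The class name CsvFields is hypothetical; the sketch assumes a comma separator and RFC 4180-style quoting ("" inside a quoted field means one literal quote), and it does not report malformed input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a single CSV record: commas and line breaks inside
// double quotes are kept as data, and "" inside quotes becomes one quote.
final class CsvFields {
    static List<String> parse(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        field.append('"');   // escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    field.append(c);         // commas and line breaks kept as-is
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(field.toString()); // field separator
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());        // last field
        return fields;
    }
}
```

Each parsed record could then be added to the DB with a parameterized INSERT, which avoids any separator problems on the database side.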

Added:

Possible Solution

Since the OP says he can change the input file, I'd replace every newline that does not follow a " with a reserved character sequence, e.g. XXX.

This can be done as an automated replacement in many editors; on Windows, UltraEdit has regex find/replace.

Then import into the DBMS, since the file no longer contains embedded newlines.

Then use SQL's REPLACE function to change the XXX occurrences back into newlines.
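A minimal Java sketch of this placeholder idea, with one deliberate change: in the sample file the rows for 1001 and 1003 also end without a quote, so a rule based only on the preceding character would join them to the next row. The sketch therefore swaps in quote tracking and replaces only the line breaks that fall inside a quoted field. XXX is the answer's example placeholder and must not occur in the real data; the class name is hypothetical, and decode assumes the original line breaks were CRLF. In T-SQL the decode step would be something like REPLACE(notes, 'XXX', CHAR(13) + CHAR(10)).

```java
// Hypothetical helper for the placeholder approach. Unlike the answer's
// "newline not after a quote" rule, it tracks whether the scan is inside a
// quoted field, so row-ending newlines after unquoted fields are kept.
final class PlaceholderNewlines {
    static final String PLACEHOLDER = "XXX"; // must not occur in the real data

    // Replace line breaks (CRLF, LF or CR) inside quoted fields with the placeholder.
    static String encode(String csv) {
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                out.append(c);
            } else if (inQuotes && (c == '\r' || c == '\n')) {
                if (c == '\r' && i + 1 < csv.length() && csv.charAt(i + 1) == '\n') {
                    i++;                     // treat CRLF as a single break
                }
                out.append(PLACEHOLDER);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Same effect as the post-import SQL REPLACE, restoring CRLF line breaks.
    static String decode(String field) {
        return field.replace(PLACEHOLDER, "\r\n");
    }
}
```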




Answer 5:


If you have control over the contents of the CSV file, you could replace the in-field line breaks (CRLF) with something other than the row terminator (perhaps just CR or LF), then run a script after the import to turn them back into CRLF.

This is how MS Office products (Excel, Access) deal with this problem.
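A minimal Java sketch of this idea, assuming rows end with CRLF and that the loader is configured to split rows on CRLF only (check how your ROWTERMINATOR setting is interpreted). Inside a quoted field it keeps only the LF of each CRLF; lfToCrlf plays the role of the post-import script. The class and method names are hypothetical.

```java
// Hypothetical helper: inside quoted fields, turn each CRLF into a bare LF so
// that only the row-ending CRLFs remain in the file.
final class InFieldLineBreaks {
    static String crlfToLf(String csv) {
        StringBuilder out = new StringBuilder(csv.length());
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            // Drop the CR of a CRLF pair only while inside quotes; the LF is kept.
            if (inQuotes && c == '\r' && i + 1 < csv.length() && csv.charAt(i + 1) == '\n') {
                continue;
            }
            out.append(c);
        }
        return out.toString();
    }

    // Reverse mapping after the import: normalize to LF, then expand to CRLF.
    static String lfToCrlf(String field) {
        return field.replace("\r\n", "\n").replace("\n", "\r\n");
    }
}
```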




Answer 6:


OK, here's a small Java program that I ended up writing to solve the problem.
Comments, corrections and optimizations are welcome.

import java.io.*;

public class PreBulkInsert
{
    public static void main(String[] args)
    {
        if (args.length < 3)
        {
            System.out.println("Usage:");
            System.out.println("  java PreBulkInsert input_file output_file separator_character");
            System.exit(1);  // wrong arguments: exit with a non-zero status
        }

        String newRowSeparator = args[2];
        int lineCounter = 0;
        // true while the running count of double quotes is odd, i.e. the
        // current line ends inside a quoted field (an escaped "" toggles twice)
        boolean insideQuotes = false;

        // try-with-resources closes both files even if an exception is thrown
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]));
             BufferedWriter out = new BufferedWriter(new FileWriter(args[1])))
        {
            String str;
            while ((str = in.readLine()) != null)
            {
                // Flip the state once for every double quote on this line.
                int fromIndex = -1;
                while ((fromIndex = str.indexOf('"', fromIndex + 1)) > -1)
                    insideQuotes = !insideQuotes;

                // Outside quotes, this line ends a record: append the separator.
                if (!insideQuotes)
                    out.write(str + newRowSeparator + "\r\n");
                else
                    out.write(str + "\r\n");
                lineCounter++;
            }
        }
        catch (IOException e)
        {
            System.out.println(e.getMessage());
            System.exit(1);
        }
        System.out.println("Done! Total of " + lineCounter + " lines were processed.");
    }
}


Source: https://stackoverflow.com/questions/2512924/how-to-bulk-insert-from-csv-when-some-fields-have-new-line-character
