Replace double quotes (within qualifiers) in CSV for SSIS import

Posted by 大城市里の小女人 on 2019-12-22 00:34:18

Question


I have an SSIS package that imports data from a .csv file. The file uses double quotes (") as text qualifiers around each entry, but double quotes also appear inside the entries. Commas (,) are the column delimiter. I can't share the original data I'm working with, but here is an example of how the data arrives in the Flat File Source:

"ID-1","A "B"", C, D, E","Today"
"ID-2","A, B, C, D, E,F","Yesterday"
"ID-3","A and nothing else","Today"

As you can see, the second column can contain quotes (and commas), which breaks my SSIS import with an error pointing at this line. I'm not really familiar with regular expressions, but I've heard they might help in this case.

As I see it, I need to replace all double quotes (") with single quotes (') except...

  • ...the quote at the beginning of a line
  • ...the quote at the end of a line
  • ...quotes that are part of ","

Can anyone help me out with this? That would be great!

Thanks in advance!


Answer 1:


To replace double quotes with single quotes according to your specification, use the regex below. It also tolerates whitespace at the beginning and/or end of lines.

// Requires: using System.Text.RegularExpressions;
// subjectString holds the raw CSV text.
string pattern = @"(?<!^\s*|,)""(?!,""|\s*$)";
string resultString = Regex.Replace(subjectString, pattern, "'", RegexOptions.Multiline);

This is the explanation of the pattern:

// (?<!^\s*|,)"(?!,"|\s*$)
// 
// Options: ^ and $ match at line breaks
// 
// Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!^\s*|,)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «^\s*»
//       Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
//       Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
//          Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//    Or match regular expression number 2 below (the entire group fails if this one fails to match) «,»
//       Match the character “,” literally «,»
// Match the character “"” literally «"»
// Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!,"|\s*$)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «,"»
//       Match the characters “,"” literally «,"»
//    Or match regular expression number 2 below (the entire group fails if this one fails to match) «\s*$»
//       Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
//          Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//       Assert position at the end of a line (at the end of the string or before a line break character) «$»
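
For readers who want to try the same idea outside .NET, here is a minimal Python sketch. It is a port, not the answer's original code: Python's re module only accepts fixed-width lookbehinds, so the `^\s*` alternative is split into two separate lookbehinds and the tolerance for leading whitespace is dropped. The names INNER_QUOTE and soften_inner_quotes are made up for this illustration.

```python
import re

# Port of the pattern above. Python lookbehinds must be fixed-width, so
# "(?<!^\s*|,)" becomes "(?<!^)(?<!,)" and leading whitespace before the
# opening qualifier is NOT tolerated in this version (assumption).
INNER_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,"|\s*$)', re.MULTILINE)


def soften_inner_quotes(text: str) -> str:
    """Replace double quotes that are not qualifiers with single quotes."""
    return INNER_QUOTE.sub("'", text)


if __name__ == "__main__":
    sample = (
        '"ID-1","A "B"", C, D, E","Today"\n'
        '"ID-2","A, B, C, D, E,F","Yesterday"\n'
        '"ID-3","A and nothing else","Today"'
    )
    print(soften_inner_quotes(sample))
```

On the sample data from the question, only the first line changes: it becomes `"ID-1","A 'B'', C, D, E","Today"`. As with the original pattern, a quote inside the data that directly follows a comma is left alone, because it looks like an opening qualifier.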



Answer 2:


You can split the columns with this regex match pattern:

/(?:(?<=^")|(?<=",")).*?(?:(?="\s*$)|(?=","))/g

See this demo.
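
As a rough illustration of how this match pattern could be applied, the Python sketch below runs it line by line, replaces the double quotes inside each matched field, and then re-wraps the fields in qualifiers. The helper names split_fields and rebuild_line are hypothetical, and the sketch assumes every field in a line is qualified and non-empty.

```python
import re

# Same pattern as above, without the /.../g wrapper; findall() plays the
# role of the global flag. Each match is one field without its qualifiers.
FIELD = re.compile(r'(?:(?<=^")|(?<=",")).*?(?:(?="\s*$)|(?=","))')


def split_fields(line: str) -> list[str]:
    """Return the fields of one CSV line, qualifiers removed."""
    return FIELD.findall(line)


def rebuild_line(line: str) -> str:
    """Swap inner double quotes for single quotes and re-qualify the fields."""
    fields = [field.replace('"', "'") for field in split_fields(line)]
    return '"' + '","'.join(fields) + '"'


if __name__ == "__main__":
    line = '"ID-1","A "B"", C, D, E","Today"'
    print(split_fields(line))
    print(rebuild_line(line))
```

Splitting first keeps the replacement step trivial, at the cost of one pass per line instead of a single substitution over the whole file.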




Answer 3:


When loading a CSV that contains double quotes and commas, there is one limitation: extra double quotes are added and the data stays enclosed in double quotes, as you can check in the preview of the source file. To work around this, add a Derived Column task with the expression below:

REPLACE(REPLACE(REPLACE(RIGHT(SUBSTRING(TRIM(COL2),1,LEN(COL2) - 1),LEN(COL2) - 2)," ","@"),"\"\"","\""),"@"," ")

The RIGHT(SUBSTRING(...)) part strips the double quotes that enclose the data.

Try this and let me know if it helps.
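
To make the effect of that expression easier to follow, here is a small Python sketch of the same steps. It is an approximation rather than a translation: unwrap_qualified is a made-up name, Python's strip() removes all surrounding whitespace rather than only spaces, and the space/@ round trip is left out because it does not change the result for values that contain no @.

```python
def unwrap_qualified(col2: str) -> str:
    """Drop the enclosing qualifiers and collapse doubled quotes."""
    trimmed = col2.strip()            # TRIM(COL2), approximately
    inner = trimmed[1:-1]             # RIGHT(SUBSTRING(...)): drop first and last character
    return inner.replace('""', '"')   # REPLACE(..., "\"\"", "\"")


if __name__ == "__main__":
    print(unwrap_qualified('"A ""B"", C, D, E"'))
```

Note that the original expression would turn any @ already present in the data into a space, so a different placeholder may be needed for such values.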




Answer 4:


Use " as the text qualifier for the CSV destination. Before inserting values into the CSV destination, add a Derived Column expression:

REPLACE(REPLACE([Column1],",",""),"\"","")

This will retain " in your text field



Source: https://stackoverflow.com/questions/12320123/replace-double-quotes-within-qualifiers-in-csv-for-ssis-import
