A few conundrums with replacing quotes and special characters in CSV files

Posted by 删除回忆录丶 on 2021-01-27 18:24:43

Question


I am having a bit of a conundrum working with some CSV files that need to be cleansed and loaded into a database.

I am fairly adept with PowerShell, but weak with regular expressions and CSV column manipulation.

Here is the issue: the CSV file I am working with has a 'notes' field that can contain all sorts of characters. I need to remove the line feeds and quotes WITHIN that field, but leave the record-ending line feeds and the text-qualifying quotes where they are. I can remove line feeds and quotes throughout the whole file, but I cannot restrict the removal to the characters inside that one field.

I have tried to do this with regular expressions, but I am not having much luck. I am hoping someone here will be able to help with this!

Edit: here is the example data

"123"   ""  "2017-02-13 10:26:08" "123456789"   "2017-02-10"    "No"    "Yes"   "Yes"   "No"    "sample text 
<crlf> ""additional text""
<crlf> 
<crlf> "    "Y" <crlf>

This should simply be one line, with no <crlf> except at the end.


Answer 1:


The built-in Import-Csv cmdlet correctly imports multiline and quoted values.

Your file is tab-delimited so we'll specify "`t":

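# Import-Csv parses quoted, multiline fields, so each record arrives as one object
Import-Csv c:\file.csv -Delimiter "`t" | ForEach-Object {
    # Strip quotes inside the notes field, then collapse each run of CR/LF into one space
    $_.notes = $_.notes -replace '"', '' -replace '[\r\n]+', ' '
    $_
} | Export-Csv c:\output.csv -Delimiter "`t" -NoTypeInformation -Encoding UTF8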
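
As a quick sanity check of the pipeline above, here is a minimal, self-contained sketch that runs the same -replace chain against a small temporary file. The column names id, notes and flag and the sample record are made up for illustration (the data in the question shows no header row), so adjust them to match your real file.

```powershell
# Hypothetical sample data: a header row plus one record whose 'notes' field
# contains embedded CRLFs and doubled (escaped) quotes, as in the question.
$tab  = "`t"
$crlf = "`r`n"
$header = '"id"', '"notes"', '"flag"' -join $tab
$notes  = '"sample text ' + $crlf + '""additional text""' + $crlf + $crlf + '"'
$record = '"123"', $notes, '"Y"' -join $tab
$sample = $header + $crlf + $record + $crlf

# Write the sample to a temporary file so Import-Csv can parse it
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($path, $sample)

# Same cleanup as in the answer: drop quotes, then replace CR/LF runs with a space
$rows = @(Import-Csv -Path $path -Delimiter "`t" | ForEach-Object {
    $_.notes = $_.notes -replace '"', '' -replace '[\r\n]+', ' '
    $_
})
Remove-Item $path

$rows.Count        # number of records parsed
$rows[0].notes     # cleaned notes field, now on a single line
```

If your real file has no header row, Import-Csv also accepts a -Header parameter to supply the column names yourself.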


Source: https://stackoverflow.com/questions/42308539/a-few-conundrums-with-replacing-quotes-and-special-characters-in-csv-files
