Question
I'm finding myself somewhat stumped on a simple problem. I'm trying to remove fancy quoting from a bunch of text files. I have the following script, where I'm trying a number of different replacement methods, but without results.
Here's an example that downloads the data from GitHub and attempts to convert it.
$srcUrl="https://raw.github.com/gist/1129778/d4d899088ce7da19c12d822a711ab24e457c023f/gistfile1.txt"
$wc = New-Object net.WebClient
$wc.DownloadFile($srcUrl,"foo.txt")
$fancySingleQuotes = "[" + [string]::Join("",[char[]](0x2019, 0x2018)) + "]"
$c = Get-Content "foo.txt"
$c | % { `
$_ = $_.Replace("’","'")
$_ = $_.Replace("`“","`"")
$_.Replace("`”","`"")
} `
| Set-Content "foo2.txt"
What's the trick for this to work?
Answer 1:
UPDATE: Fixed my answer (manojlds's comments were correct; the $_ thing was a red herring). Here's a version that works, and I've updated it to incorporate your testing code:
$srcUrl="https://raw.github.com/gist/1129778/d4d899088ce7da19c12d822a711ab24e457c023f/gistfile1.txt"
$wc = New-Object net.WebClient
$wc.DownloadFile($srcUrl,"C:\Users\hartez\SO6968270\foo.txt")
$fancySingleQuotes = "[\u2019\u2018]"
$fancyDoubleQuotes = "[\u201C\u201D]"
$c = Get-Content "foo.txt" -Encoding UTF8
$c | % { `
$_ = [regex]::Replace($_, $fancySingleQuotes, "'")
[regex]::Replace($_, $fancyDoubleQuotes, '"')
} `
| Set-Content "foo2.txt"
The reason that manojlds's version wasn't working for you is that the file you're getting from GitHub was not being decoded as UTF-8 when it was read, so the curly quotes never arrived as the Unicode characters that the patterns look for. Reading it in with -Encoding UTF8 fixes the problem.
Answer 2:
The following works on the input and output that you gave:
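To illustrate the encoding point, here is a minimal sketch that decodes the three UTF-8 bytes of a right single quote (U+2019) in two ways. ISO-8859-1 is used only as a stand-in for a single-byte legacy encoding; that choice is an assumption for illustration, and the default that Get-Content actually applies depends on the PowerShell version and system settings.

```powershell
# UTF-8 byte sequence for U+2019 (right single quotation mark)
$bytes = [byte[]](0xE2, 0x80, 0x99)

# Decoded as UTF-8: a single character, U+2019
$asUtf8 = [System.Text.Encoding]::UTF8.GetString($bytes)

# Decoded with a single-byte encoding (ISO-8859-1 as a stand-in):
# three unrelated characters, none of which is a curly quote
$asSingleByte = [System.Text.Encoding]::GetEncoding('iso-8859-1').GetString($bytes)

# The pattern can only match when the text was decoded correctly
$asUtf8 -match '[\u2019\u2018]'        # True
$asSingleByte -match '[\u2019\u2018]'  # False
```

If the replacement silently does nothing, checking the length or code points of a line that should contain a curly quote is a quick way to see whether the file was decoded as expected.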
$c = Get-Content $file
$c | % { `
$_ = $_.Replace("’","'")
$_ = $_.Replace("`“","`"")
$_.Replace("`”","`"")
} `
| Set-Content $file
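One fragile spot in snippets like this is that the curly quotes are typed literally, so the result also depends on how the script file itself is saved. A minimal sketch that sidesteps this by building the characters from their code points; the function name ConvertTo-PlainQuote is hypothetical and not part of the answer above:

```powershell
function ConvertTo-PlainQuote {
    param([string]$Text)
    # U+2018 and U+2019 are the left and right single curly quotes
    $Text = $Text.Replace([string][char]0x2018, "'").Replace([string][char]0x2019, "'")
    # U+201C and U+201D are the left and right double curly quotes
    $Text.Replace([string][char]0x201C, '"').Replace([string][char]0x201D, '"')
}

# Sample input built from code points: curly double quotes around a curly apostrophe
$sample = [string][char]0x201C + "It" + [char]0x2019 + "s fine" + [char]0x201D
ConvertTo-PlainQuote $sample   # "It's fine"
```

In a pipeline this could be used as `Get-Content $file -Encoding UTF8 | % { ConvertTo-PlainQuote $_ }`, assuming the file is UTF-8.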
Answer 3:
Your last replace replaces a left fancy quote with a single quote. Is that what you want? It doesn't match your sample output. Try this:
$_ = $_.Replace("`“","`"")
$_.Replace("`”","`"")
Answer 4:
This article is very close to what I need. I was looking for something that would catch any non-ASCII character and found this article: "Notepad++, How to remove all non ascii characters with regex?". Its regex seems to work fine in PowerShell as well.
The regex they use that works in PowerShell is:
[^\x00-\x7F]+
This matches any run of non-ASCII characters (anything outside the range \x00-\x7F); you can hone the regex if you need to be more specific.
My input only had curly quotes as non-ASCII characters, so this simple substitution worked:
# Replace the non-ASCII (curly) quote with a standard single quote
$cq = $cq -replace "[^\x00-\x7F]+", "'"
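Because the pattern matches every non-ASCII character, not only quotes, it is worth checking what else it would touch. A small sketch with a made-up sample string (the characters are built from code points so the example does not depend on the script file's encoding):

```powershell
# Sample text: a curly apostrophe (U+2019) and an accented e (U+00E9)
$sample = "It" + [char]0x2019 + "s a caf" + [char]0xE9

# Every run of non-ASCII characters becomes a straight single quote
$result = $sample -replace '[^\x00-\x7F]+', "'"
$result   # It's a caf'
```

The accented letter is replaced along with the quote, so a narrower character class such as [\u2018\u2019] is the safer choice when the input may contain other non-ASCII text.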
Source: https://stackoverflow.com/questions/6968270/replacing-smart-quotes-in-powershell