Replacing “smart quotes” in PowerShell

Submitted by 六月ゝ 毕业季﹏ on 2020-08-17 05:15:06

Question


I'm finding myself somewhat stumped on a simple problem. I'm trying to remove fancy quoting from a bunch of text files. I have the following script, in which I've tried a number of different replacement methods, but without success.

Here's an example that downloads the data from GitHub and attempts to convert it.

$srcUrl="https://raw.github.com/gist/1129778/d4d899088ce7da19c12d822a711ab24e457c023f/gistfile1.txt"
$wc = New-Object net.WebClient
$wc.DownloadFile($srcUrl,"foo.txt")
$fancySingleQuotes = "[" + [string]::Join("",[char[]](0x2019, 0x2018)) + "]"

$c = Get-Content "foo.txt"
$c | % { `
        $_ = $_.Replace("’","'")
        $_ = $_.Replace("`“","`"")
        $_.Replace("`”","`"")       
    } `
    |  Set-Content "foo2.txt"

What's the trick for this to work?


Answer 1:


UPDATE: Fixed my answer (manojlds's comments were correct; the $_ thing was a red herring). Here's a version that works, updated to incorporate your testing code:

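    $srcUrl="https://raw.github.com/gist/1129778/d4d899088ce7da19c12d822a711ab24e457c023f/gistfile1.txt"
    $wc = New-Object net.WebClient
    $wc.DownloadFile($srcUrl,"C:\Users\hartez\SO6968270\foo.txt")

    # Character classes for the curly quotes; the .NET regex engine interprets \uXXXX
    $fancySingleQuotes = "[\u2019\u2018]"
    $fancyDoubleQuotes = "[\u201C\u201D]"

    # Read explicitly as UTF-8 (the relative path assumes the current directory is the download folder)
    $c = Get-Content "foo.txt" -Encoding UTF8

    $c | % {
        # Straighten single quotes first, then emit the line with double quotes straightened
        $_ = [regex]::Replace($_, $fancySingleQuotes, "'")
        [regex]::Replace($_, $fancyDoubleQuotes, '"')
    } |  Set-Content "foo2.txt"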

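The reason manojlds's version wasn't working for you is the encoding of the file you're getting from GitHub: it is UTF-8, and when Get-Content decodes it with a different default encoding (as Windows PowerShell does for a file without a byte order mark), the curly quotes come through as other characters and never match the Unicode characters in the regex. Reading the file in as UTF-8 fixes the problem.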
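
To illustrate the mismatch, here is a minimal sketch that decodes the UTF-8 bytes of a right curly quote in two ways. ISO-8859-1 is used only as a stand-in for a legacy single-byte code page (an assumption for the demo; the actual default depends on the system), and the variable names are made up for illustration.

```powershell
# UTF-8 bytes for U+2019 (right single quotation mark): E2 80 99
$bytes = [System.Text.Encoding]::UTF8.GetBytes([string][char]0x2019)

# Decoding with a single-byte encoding (ISO-8859-1 stands in for a legacy
# code page here) yields three unrelated characters instead of one quote.
$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$wrong  = $latin1.GetString($bytes)

# Decoding as UTF-8 recovers the original single character.
$right  = [System.Text.Encoding]::UTF8.GetString($bytes)

# The pattern from the answer only matches the correctly decoded text.
$pattern      = "[\u2019\u2018]"
$wrongMatches = [regex]::IsMatch($wrong, $pattern)
$rightMatches = [regex]::IsMatch($right, $pattern)
```

With the wrong decoding the regex has nothing to match, which is consistent with the replacement silently doing nothing.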




Answer 2:


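The following works on the input and output that you gave:

    $c = Get-Content $file
    $c | % {
        # The backtick escapes the curly double quote, which PowerShell
        # would otherwise treat as a string delimiter
        $_ = $_.Replace("’","'")
        $_ = $_.Replace("`“","`"")
        $_.Replace("`”","`"")
    } |  Set-Content $file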
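
Literal curly quotes in a script depend on how the script file itself is saved and parsed. As a hedged alternative, the same replacements can be written with code points so that the script stays pure ASCII. `ConvertTo-StraightQuote` is a hypothetical helper name, not something from the answers above.

```powershell
# Hypothetical helper: replaces curly quotes using code points, so the
# script file contains no non-ASCII characters at all.
function ConvertTo-StraightQuote {
    param([string]$Text)

    $Text = $Text.Replace([char]0x2018, [char]0x27)  # U+2018 left single  -> apostrophe
    $Text = $Text.Replace([char]0x2019, [char]0x27)  # U+2019 right single -> apostrophe
    $Text = $Text.Replace([char]0x201C, [char]0x22)  # U+201C left double  -> straight double quote
    $Text = $Text.Replace([char]0x201D, [char]0x22)  # U+201D right double -> straight double quote
    return $Text
}

# Made-up sample line, also built from code points
$sample = "It" + [char]0x2019 + "s " + [char]0x201C + "ok" + [char]0x201D
$result = ConvertTo-StraightQuote -Text $sample
```

It could be dropped into the pipeline above as `$c | % { ConvertTo-StraightQuote -Text $_ } | Set-Content $file`.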



Answer 3:


Your last replace replaces a left fancy quote with a single quote. Is that what you want? It doesn't match your sample output. Try this:

$_.Replace("`“","`"")
$_.Replace("`”","`"")



Answer 4:


This post is very close to what I needed. I was looking for something that would catch any non-ASCII character and found this question: "Notepad++, How to remove all non ascii characters with regex?". The regex from it seems to work fine in PowerShell as well.

The regex they use that works in PowerShell is:

[^\x00-\x7F]+

This matches any run of one or more non-ASCII characters (not just quotes); you can hone the regex if you need to be more specific.

The only non-ASCII characters in my input were the curly quote(s), so this simple substitution worked:

Replace the curly quote with a standard single quote:

$cq = $cq -replace "[^\x00-\x7F]+", "'"
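
A short sketch of the caveat with the broad pattern: it also rewrites legitimate non-ASCII characters, such as accented letters. The sample string below is made up for illustration and built from code points.

```powershell
# Made-up sample: "caf" + e-acute, then a word in curly single quotes
$sample = "caf" + [char]0xE9 + " " + [char]0x2018 + "ok" + [char]0x2019

# Broad pattern from the answer: every run of non-ASCII characters becomes '
$broad = $sample -replace "[^\x00-\x7F]+", "'"

# Narrower pattern: only the curly single quotes are replaced
$narrow = $sample -replace "[\u2018\u2019]", "'"
```

Here `$broad` loses the accented letter, while `$narrow` keeps it and only straightens the quotes.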



Source: https://stackoverflow.com/questions/6968270/replacing-smart-quotes-in-powershell
