Converting Unicode string to ASCII

忘掉有多难 2020-12-19 20:14

I have strings containing characters that are not found in ASCII, such as á, é, í, ó and ú, and I need a function to convert them into something acceptable, such as a, e, i, o and u.

2 Answers
  • 2020-12-19 21:01
    function Convert-DiacriticCharacters {
        param(
            [string]$inputString
        )
        # Decompose accented characters into a base letter plus combining mark(s)
        [string]$formD = $inputString.Normalize([System.Text.NormalizationForm]::FormD)
        $nonSpacingMark = [System.Globalization.UnicodeCategory]::NonSpacingMark
        $stringBuilder = New-Object System.Text.StringBuilder
        for ($i = 0; $i -lt $formD.Length; $i++) {
            $unicodeCategory = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($formD[$i])
            # Keep every character except the combining marks (the accents)
            if ($unicodeCategory -ne $nonSpacingMark) {
                [void]$stringBuilder.Append($formD[$i])
            }
        }
        # Recompose what is left back into canonical composed form
        $stringBuilder.ToString().Normalize([System.Text.NormalizationForm]::FormC)
    }
    

    The resulting function strips diacritics like this:

    PS C:\> Convert-DiacriticCharacters "Ångström"
    Angstrom
    PS C:\> Convert-DiacriticCharacters "Ó señor"
    O senor
    

    Copied from: http://cosmoskey.blogspot.nl/2009/09/powershell-function-convert.html
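To see why filtering out NonSpacingMark characters works, here is a small check (not part of the original answer) of what FormD does to a single precomposed letter. It builds é from its code point U+00E9 so the script itself stays pure ASCII; the variable names are illustrative only.

```powershell
# Build "e with acute" from its code point (U+00E9) so this script stays pure ASCII.
$composed = [string][char]0x00E9

# FormD splits the precomposed letter into a base letter plus a combining accent.
$decomposed = $composed.Normalize([System.Text.NormalizationForm]::FormD)

# Print each resulting UTF-16 code unit with its Unicode category.
foreach ($ch in $decomposed.ToCharArray()) {
    $category = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($ch)
    'U+{0:X4} {1}' -f ([int]$ch), $category
}
```

This should print `U+0065 LowercaseLetter` followed by `U+0301 NonSpacingMark`. The second code unit is the combining acute accent, which is exactly what the loop in the function above skips.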

  • 2020-12-19 21:14

    Porting this answer from a C#/.NET question roughly into PowerShell seems to work:

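    function Remove-Diacritics
    {
        Param([string]$Text)

        # Decompose, then keep every character that is not a combining mark.
        # The .Where{} intrinsic method requires PowerShell 4.0 or later.
        $chars = $Text.Normalize([System.Text.NormalizationForm]::FormD).GetEnumerator().Where{
            [System.Char]::GetUnicodeCategory($_) -ne [System.Globalization.UnicodeCategory]::NonSpacingMark
        }

        # Join the remaining characters and recompose them
        (-join $chars).Normalize([System.Text.NormalizationForm]::FormC)
    }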
    

    For example:

    PS C:\> Remove-Diacritics 'abcdeéfg'
    abcdeefg
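Neither function guarantees pure ASCII output: letters such as ß, æ, ø and Ł have no canonical decomposition, so they pass through both functions unchanged. Below is a minimal sketch of one way to close that gap, assuming a lossy approximation is acceptable. `ConvertTo-AsciiApprox` and its small mapping table are hypothetical and not from either answer; the regex `\p{Mn}` matches the same NonSpacingMark category that both answers filter on.

```powershell
# Hypothetical helper: approximate an arbitrary string using 7-bit ASCII only.
function ConvertTo-AsciiApprox {
    param([string]$Text)

    # Step 1: decompose (FormD), drop combining marks (category Mn), recompose.
    # Same idea as both answers, written as a regex replacement.
    $formD = $Text.Normalize([System.Text.NormalizationForm]::FormD)
    $result = ($formD -creplace '\p{Mn}', '').Normalize([System.Text.NormalizationForm]::FormC)

    # Step 2: letters with no canonical decomposition survive step 1 unchanged,
    # so map a few by hand. Illustrative list only; extend it as needed.
    # Keys are regex escapes, which keeps this file pure ASCII.
    $map = [ordered]@{
        '\u00DF' = 'ss'                    # sharp s
        '\u00E6' = 'ae'; '\u00C6' = 'AE'   # ash
        '\u00F8' = 'o';  '\u00D8' = 'O'    # o with stroke
        '\u0142' = 'l';  '\u0141' = 'L'    # l with stroke
    }
    foreach ($entry in $map.GetEnumerator()) {
        # -creplace is case-sensitive, so upper- and lowercase map separately
        $result = $result -creplace $entry.Key, $entry.Value
    }

    # Step 3: anything still outside 7-bit ASCII becomes '?'.
    $result -creplace '[^\x00-\x7F]', '?'
}
```

For example, `ConvertTo-AsciiApprox 'Straße'` should return `Strasse` and `ConvertTo-AsciiApprox 'Łódź'` should return `Lodz`, while a character with no mapping, such as `€`, comes back as `?`. Replacing with `?` mirrors what a strict ASCII encoder does; dropping the character instead is an equally valid choice.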
    