PowerShell out-file: prevent encoding changes

迷失自我 2020-12-08 14:19

I'm currently working on a search-and-replace operation that I'm trying to automate using PowerShell. Unfortunately, I realized yesterday that we have different file en

1 answer
  • 2020-12-08 14:53

    Out-File uses a default encoding unless you override it with the -Encoding parameter.
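
    As a minimal sketch of that override (the temp-file name is only an illustration, not something from the question), the following writes a string with an explicit -Encoding value and then inspects the leading bytes to see which byte order mark was written:

    ```powershell
    # Hypothetical scratch file; any writable path works.
    $demoPath = Join-Path ([IO.Path]::GetTempPath()) 'outfile-encoding-demo.txt'

    # Ask for UTF-16 little-endian explicitly instead of relying on the default.
    'hello' | Out-File -FilePath $demoPath -Encoding Unicode

    # Read the raw bytes back and format the first two as hex.
    $bytes  = [IO.File]::ReadAllBytes($demoPath)
    $bomHex = '{0:X2} {1:X2}' -f $bytes[0], $bytes[1]
    $bomHex

    Remove-Item $demoPath
    ```

    With -Encoding Unicode the file starts with FF FE, the UTF-16 little-endian byte order mark.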

    What I've done to solve this is to detect the original file's encoding by reading its byte order mark (BOM), and then to pass that encoding as the -Encoding parameter value.

    Here's an example that processes a list of text file paths: it gets each file's original encoding, processes the content, and writes it back to the file with the original encoding.

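    function Get-FileEncoding {
        param ( [string] $FilePath )
    
        # Read the first four bytes. "-Encoding Byte" is Windows PowerShell syntax;
        # PowerShell 6 and later use -AsByteStream instead.
        [byte[]] $byte = Get-Content -Encoding Byte -ReadCount 4 -TotalCount 4 -Path $FilePath
    
        # Test the four-byte UTF-32 marks before the two-byte UTF-16 marks,
        # because the UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
        if ( $byte[0] -eq 0xef -and $byte[1] -eq 0xbb -and $byte[2] -eq 0xbf )
            { $encoding = 'UTF8' }
        elseif ($byte[0] -eq 0xff -and $byte[1] -eq 0xfe -and $byte[2] -eq 0 -and $byte[3] -eq 0)
            { $encoding = 'UTF32' }
        elseif ($byte[0] -eq 0 -and $byte[1] -eq 0 -and $byte[2] -eq 0xfe -and $byte[3] -eq 0xff)
            { $encoding = 'BigEndianUTF32' }
        elseif ($byte[0] -eq 0xfe -and $byte[1] -eq 0xff)
            { $encoding = 'BigEndianUnicode' }
        elseif ($byte[0] -eq 0xff -and $byte[1] -eq 0xfe)
            { $encoding = 'Unicode' }
        elseif ($byte[0] -eq 0x2b -and $byte[1] -eq 0x2f -and $byte[2] -eq 0x76)
            { $encoding = 'UTF7' }
        else
            { $encoding = 'ASCII' }   # No BOM found: this is a fallback guess, not a detection.
        return $encoding
    }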
    
    foreach ($textFile in $textFiles) {
        $encoding = Get-FileEncoding $textFile
        $content = Get-Content -Path $textFile -Encoding $encoding
        # Process content here...
        $content | Set-Content -Path $textFile -Encoding $encoding
    }
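
    One way to check that the read, process, and write pattern above really preserves the encoding is to compare the file's leading bytes before and after the rewrite. A small self-contained sketch, assuming a scratch file in the temp directory and a known encoding rather than a detected one (the file name, the sample text, and the Get-LeadingBytesHex helper are made up for illustration):

    ```powershell
    # Scratch file for the demonstration (hypothetical name).
    $demoPath = Join-Path ([IO.Path]::GetTempPath()) 'encoding-roundtrip-demo.txt'
    $encoding = 'BigEndianUnicode'

    # Hypothetical helper: the first two bytes of a file as a hex string.
    function Get-LeadingBytesHex {
        param ( [string] $Path )
        $bytes = [IO.File]::ReadAllBytes($Path)
        return '{0:X2} {1:X2}' -f $bytes[0], $bytes[1]
    }

    # Create the file with a known encoding and record its byte order mark.
    'This is my text' | Set-Content -Path $demoPath -Encoding $encoding
    $before = Get-LeadingBytesHex $demoPath

    # Same pattern as the loop: read, process, write back with the same encoding.
    $content = Get-Content -Path $demoPath -Encoding $encoding
    $content -replace 'my', 'your' | Set-Content -Path $demoPath -Encoding $encoding

    # Record the byte order mark again and read the rewritten text.
    $after  = Get-LeadingBytesHex $demoPath
    $result = Get-Content -Path $demoPath -Encoding $encoding

    "$before -> $after : $result"
    Remove-Item $demoPath
    ```

    If the encoding was preserved, $before and $after are both FE FE's big-endian counterpart, FE FF, and $result holds the replaced text.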
    

    Update: Here is an example of getting the original file encoding using the StreamReader class. The example reads the first few characters of the file so that the CurrentEncoding property is set from the result of the reader's internal BOM detection routine.

    http://msdn.microsoft.com/en-us/library/9y86s1a9.aspx

    The detectEncodingFromByteOrderMarks parameter detects the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the UTF8Encoding is used. See the Encoding.GetPreamble method for more information.

    http://msdn.microsoft.com/en-us/library/system.text.encoding.getpreamble.aspx

    $text = @" 
    This is
    my text file
    contents.
    "@
    
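    #Example path (an assumption; point this at your own file). .NET methods need a full path.
    $filePath = Join-Path ([IO.Path]::GetTempPath()) 'encoding-test.txt'
    
    #Create text file.
    [IO.File]::WriteAllText($filePath, $text, [System.Text.Encoding]::BigEndianUnicode)
    
    #Create a stream reader to get the file's encoding and contents.
    $sr = New-Object System.IO.StreamReader($filePath, $true)
    [char[]] $buffer = new-object char[] 3
    [void] $sr.Read($buffer, 0, 3)   #Reading triggers BOM detection; discard the returned count.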
    $encoding = $sr.CurrentEncoding
    $sr.Close()
    
    #Show the detected encoding.
    $encoding
    
    #Update the file contents.
    $content = [IO.File]::ReadAllText($filePath, $encoding)
    $content2 = $content -replace "my" , "your"
    
    #Save the updated contents to file.
    [IO.File]::WriteAllText($filePath, $content2, $encoding)
    
    #Display the result.
    Get-Content $filePath
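
    The quoted documentation also says that the reader falls back to UTF-8 when the file has no byte order mark. A short sketch of both cases, assuming two scratch files in the temp directory (the Get-DetectedEncoding helper and the file names are hypothetical, introduced only for this illustration):

    ```powershell
    # Hypothetical helper: returns the encoding StreamReader settles on for a file.
    function Get-DetectedEncoding {
        param ( [string] $Path )

        $reader = New-Object System.IO.StreamReader($Path, $true)
        try {
            [void] $reader.Peek()          # forces a buffer read so BOM detection runs
            return $reader.CurrentEncoding
        }
        finally {
            $reader.Close()                # release the file handle in every case
        }
    }

    $tempDir = [IO.Path]::GetTempPath()
    $withBom = Join-Path $tempDir 'bom-demo.txt'
    $noBom   = Join-Path $tempDir 'nobom-demo.txt'

    # One file with a UTF-16 big-endian BOM, one with raw bytes and no BOM.
    [IO.File]::WriteAllText($withBom, 'abc', [System.Text.Encoding]::BigEndianUnicode)
    [IO.File]::WriteAllBytes($noBom, [byte[]] (0x61, 0x62, 0x63))

    $bomCodePage   = (Get-DetectedEncoding $withBom).CodePage   # 1201 = UTF-16 big-endian
    $noBomCodePage = (Get-DetectedEncoding $noBom).CodePage     # 65001 = UTF-8 fallback

    Remove-Item $withBom, $noBom
    ```

    The fallback means a BOM-less file in a legacy code page is reported as UTF-8, so this approach only identifies encodings that actually announce themselves with a BOM.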
    