PowerShell out-file: prevent encoding changes

Submitted by 六眼飞鱼酱① on 2019-11-28 19:16:19
Andy Arismendi

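Out-File uses a default encoding unless it is overridden with the -Encoding parameter. In Windows PowerShell that default is Unicode (UTF-16 LE), so rewriting a file with Out-File can silently change its encoding.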
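A minimal sketch of overriding the default, assuming a writable temp directory (the file name outfile-demo.txt is only for illustration). With -Encoding the caller picks the encoding, and the byte order mark Out-File wrote can be checked by reading the raw bytes back.

```powershell
# Hypothetical demo file in the temp directory.
$path = Join-Path ([IO.Path]::GetTempPath()) 'outfile-demo.txt'

# -Encoding overrides Out-File's default; here UTF-16 big-endian is requested explicitly.
'hello' | Out-File -FilePath $path -Encoding BigEndianUnicode

# Read the raw bytes back; the first two are the BOM Out-File wrote.
$bytes = [IO.File]::ReadAllBytes($path)
'{0:X2} {1:X2}' -f $bytes[0], $bytes[1]   # -> FE FF (the UTF-16 BE BOM)
```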

What I've done to solve this is to detect the original file's encoding by reading its byte order mark (BOM) and passing the result as the -Encoding parameter value.

Here's an example that processes a list of text file paths: for each file it gets the original encoding, processes the content, and writes it back to the file with that same encoding.

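function Get-FileEncoding {
    param ( [string] $FilePath )

    # Read the first 4 bytes, enough for every BOM checked below.
    # (-Encoding Byte is Windows PowerShell syntax; PowerShell 6+ uses -AsByteStream.)
    [byte[]] $byte = Get-Content -Encoding Byte -ReadCount 4 -TotalCount 4 -Path $FilePath

    # An empty file has no BOM; treat it like any other file without one.
    if ( $null -eq $byte ) { return 'ASCII' }

    if ( $byte[0] -eq 0xef -and $byte[1] -eq 0xbb -and $byte[2] -eq 0xbf )
        { $encoding = 'UTF8' }
    elseif ( $byte[0] -eq 0xff -and $byte[1] -eq 0xfe -and $byte[2] -eq 0 -and $byte[3] -eq 0 )
        { $encoding = 'UTF32' }             # UTF-32 LE; test before UTF-16 LE (same first 2 bytes)
    elseif ( $byte[0] -eq 0 -and $byte[1] -eq 0 -and $byte[2] -eq 0xfe -and $byte[3] -eq 0xff )
        { $encoding = 'BigEndianUTF32' }    # UTF-32 BE
    elseif ( $byte[0] -eq 0xfe -and $byte[1] -eq 0xff )
        { $encoding = 'BigEndianUnicode' }  # UTF-16 BE
    elseif ( $byte[0] -eq 0xff -and $byte[1] -eq 0xfe )
        { $encoding = 'Unicode' }           # UTF-16 LE
    elseif ( $byte[0] -eq 0x2b -and $byte[1] -eq 0x2f -and $byte[2] -eq 0x76 )
        { $encoding = 'UTF7' }
    else
        { $encoding = 'ASCII' }             # no BOM: could really be ASCII, ANSI or BOM-less UTF-8

    return $encoding
}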

# $textFiles is assumed to hold the paths of the files to process.
foreach ($textFile in $textFiles) {
    $encoding = Get-FileEncoding $textFile
    $content = Get-Content -Path $textFile -Encoding $encoding
    # Process content here...
    $content | Set-Content -Path $textFile -Encoding $encoding
}
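Get-Content -Encoding Byte only exists in Windows PowerShell; PowerShell 6 and later replaced it with -AsByteStream. A sketch that sidesteps the difference reads the first bytes through a .NET FileStream instead. Get-BomEncoding is a hypothetical name, and the BOM-to-name mapping (including BigEndianUTF32) is an assumption about how you'd want each BOM labelled for -Encoding.

```powershell
# Hypothetical helper: returns an -Encoding name based on the file's BOM.
# Reads at most 4 bytes through a FileStream, so no Get-Content byte switch is needed.
function Get-BomEncoding {
    param ( [string] $FilePath )

    $stream = [IO.File]::OpenRead($FilePath)
    try {
        $b = New-Object byte[] 4
        $n = $stream.Read($b, 0, 4)   # number of bytes actually read (0-4)
    }
    finally {
        $stream.Dispose()
    }

    # Longer BOMs first: UTF-32 LE starts with the same FF FE as UTF-16 LE.
    if ($n -ge 4 -and $b[0] -eq 0xFF -and $b[1] -eq 0xFE -and $b[2] -eq 0 -and $b[3] -eq 0) { return 'UTF32' }
    if ($n -ge 4 -and $b[0] -eq 0 -and $b[1] -eq 0 -and $b[2] -eq 0xFE -and $b[3] -eq 0xFF) { return 'BigEndianUTF32' }
    if ($n -ge 3 -and $b[0] -eq 0xEF -and $b[1] -eq 0xBB -and $b[2] -eq 0xBF) { return 'UTF8' }
    if ($n -ge 3 -and $b[0] -eq 0x2B -and $b[1] -eq 0x2F -and $b[2] -eq 0x76) { return 'UTF7' }
    if ($n -ge 2 -and $b[0] -eq 0xFE -and $b[1] -eq 0xFF) { return 'BigEndianUnicode' }
    if ($n -ge 2 -and $b[0] -eq 0xFF -and $b[1] -eq 0xFE) { return 'Unicode' }

    # No recognised BOM: ASCII, ANSI or BOM-less UTF-8 are all possible.
    return 'ASCII'
}

# Usage: write a UTF-16 big-endian file (File.WriteAllText emits the FE FF BOM) and detect it.
$path = Join-Path ([IO.Path]::GetTempPath()) 'bom-demo.txt'
[IO.File]::WriteAllText($path, 'hello', [Text.Encoding]::BigEndianUnicode)
Get-BomEncoding $path   # -> BigEndianUnicode
```

One design note: in PowerShell 6+ the name UTF8 writes no BOM when passed to Set-Content, so on those versions you would map EF BB BF to utf8BOM instead if the BOM itself must survive.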

Update: Here is an example of getting the original file's encoding with the StreamReader class. The example reads the first 3 characters of the file so that the CurrentEncoding property is set by StreamReader's internal BOM detection routine, which runs on the first read.


The detectEncodingFromByteOrderMarks parameter detects the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the UTF8Encoding is used. See the Encoding.GetPreamble method for more information.
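To see that fallback behaviour in isolation, here is a small sketch (Get-StreamReaderEncoding and the temp file names are hypothetical) that lets StreamReader inspect two files, one without a BOM and one written as UTF-16 big-endian, and prints the code page of the detected encoding. It calls Peek() rather than Read(), on the assumption that any first buffer fill is enough to run the detection.

```powershell
# Hypothetical helper: lets StreamReader run its BOM detection and returns the result.
function Get-StreamReaderEncoding {
    param ( [string] $FilePath )
    # Second argument $true = detectEncodingFromByteOrderMarks.
    $sr = New-Object System.IO.StreamReader($FilePath, $true)
    try {
        [void] $sr.Peek()          # the first buffer fill performs BOM detection
        return $sr.CurrentEncoding
    }
    finally {
        $sr.Close()                # release the file handle
    }
}

$dir   = [IO.Path]::GetTempPath()
$noBom = Join-Path $dir 'sr-nobom.txt'
$beBom = Join-Path $dir 'sr-bebom.txt'

# A BOM-less UTF-8 file and a UTF-16 big-endian file (which gets an FE FF BOM).
[IO.File]::WriteAllText($noBom, 'plain text', (New-Object System.Text.UTF8Encoding($false)))
[IO.File]::WriteAllText($beBom, 'plain text', [Text.Encoding]::BigEndianUnicode)

(Get-StreamReaderEncoding $noBom).CodePage   # 65001: no BOM, so the UTF-8 fallback is reported
(Get-StreamReaderEncoding $beBom).CodePage   # 1201: UTF-16 big-endian detected from the BOM
```

Note that the BOM-less case reports UTF-8 even if the file was really written in an ANSI code page; the detection only looks at the BOM.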


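# Example path (hypothetical): a file in the temp directory. An absolute path avoids
# .NET resolving a relative path against a different working directory than PowerShell's.
$filePath = Join-Path ([IO.Path]::GetTempPath()) 'encoding-test.txt'

$text = @"
This is
my text file
"@
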
#Create text file.
[IO.File]::WriteAllText($filePath, $text, [System.Text.Encoding]::BigEndianUnicode)

#Create a stream reader to get the file's encoding and contents.
$sr = New-Object System.IO.StreamReader($filePath, $true)
[char[]] $buffer = new-object char[] 3
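[void] $sr.Read($buffer, 0, 3)   # the first read triggers BOM detection
$encoding = $sr.CurrentEncoding
# Close the reader; otherwise it still holds the file open and the write below fails.
$sr.Close()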

#Show the detected encoding.
$encoding.EncodingName

#Update the file contents.
$content = [IO.File]::ReadAllText($filePath, $encoding)
$content2 = $content -replace "my" , "your"

#Save the updated contents to file.
[IO.File]::WriteAllText($filePath, $content2, $encoding)

#Display the result.
Get-Content $filePath
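The steps above can be folded into one reusable function. This is a sketch, not part of the original answer: Update-FileText and its -Transform scriptblock parameter are hypothetical names, and it uses StreamReader.ReadToEnd() so the file is read only once and closed before it is rewritten.

```powershell
# Hypothetical helper: rewrites a file through a transform while keeping its encoding.
function Update-FileText {
    param (
        [string] $FilePath,
        [scriptblock] $Transform
    )

    # Let StreamReader detect the encoding from the BOM and read the full text.
    $sr = New-Object System.IO.StreamReader($FilePath, $true)
    try {
        $text = $sr.ReadToEnd()
        $encoding = $sr.CurrentEncoding
    }
    finally {
        $sr.Close()   # the file must not be open when it is rewritten
    }

    # Apply the caller's transform, then write back with the detected encoding.
    $newText = & $Transform $text
    [IO.File]::WriteAllText($FilePath, $newText, $encoding)
}

# Usage: a UTF-16 BE file stays UTF-16 BE after the edit.
$path = Join-Path ([IO.Path]::GetTempPath()) 'update-demo.txt'
[IO.File]::WriteAllText($path, "This is`r`nmy text file", [Text.Encoding]::BigEndianUnicode)

Update-FileText -FilePath $path -Transform { param($t) $t -replace 'my', 'your' }
[IO.File]::ReadAllText($path)
```

One caveat applies to this sketch and to the example above alike: for a file without a BOM, CurrentEncoding is only the UTF-8 fallback, and on .NET Framework (Windows PowerShell) that is Encoding.UTF8, whose preamble makes WriteAllText add a UTF-8 BOM the file did not have before.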