Read UTF-8 files correctly with PowerShell

慢半拍i 2020-12-29 09:58

The situation is as follows:

  • A PowerShell script creates a file with UTF-8 encoding
  • The user may or may not edit the file, possibly losing the BOM, but should
3 Answers
  • 2020-12-29 10:14

    If the file is supposed to be UTF-8, why not decode it as UTF-8 explicitly when you read it?

    Get-Content -Path test.txt -Encoding UTF8
    
  • 2020-12-29 10:15

    JPBlanc is right: if you want the file read as UTF-8, specify that when you read it.

    On a side note, you're losing formatting with the [String]+[String] concatenation, and your regex match doesn't work. Have a look at the changes below: the regex search, the way $newMsgs is built, and the way the data is written back to the file.

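    # Read existing data, if any. Decode explicitly as UTF-8;
    # -Raw (PowerShell 3.0+) returns the file as one string instead of an array of lines.
    $data = ""
    $startRev = 1
    if (Test-Path test.txt)
    {
        $data = Get-Content -Path test.txt -Encoding UTF8 -Raw
        # -match against a single string stores the first match in $Matches
        if ($data -match "\br([0-9]+)\b") {
            $startRev = [int]$Matches[1] + 1
        }
    }
    Write-Host "Next revision is $startRev"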
    
    # Define example data to add
    $startRev = $startRev + 10
    $newMsgs = @"
    2014-04-01 - r$startRev`r`n`r`n
        Line 1`r`n
        Line 2`r`n`r`n
    "@
    
    # Write new data back
    $newMsgs, $data | Out-File test.txt -Encoding UTF8
    
  • 2020-12-29 10:17

    Get-Content doesn't seem to handle UTF-8 files without a BOM at all if you omit the -Encoding parameter. [System.IO.File]::ReadLines seems to be an alternative. Examples:

    PS C:\temp\powershellutf8> $a = Get-Content .\utf8wobom.txt
    PS C:\temp\powershellutf8> $b = Get-Content .\utf8wbom.txt
    PS C:\temp\powershellutf8> $a2 = Get-Content .\utf8wbom.txt -Encoding UTF8
    PS C:\temp\powershellutf8> $a
    ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ  <== This doesn't seem to be right at all
    PS C:\temp\powershellutf8> $b
    ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ
    PS C:\temp\powershellutf8> $a2
    ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ
    PS C:\temp\powershellutf8>
    PS C:\temp\powershellutf8> $c = [IO.File]::ReadLines('.\utf8wbom.txt');
    PS C:\temp\powershellutf8> $c
    ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ
    PS C:\temp\powershellutf8> $d = [IO.File]::ReadLines('.\utf8wobom.txt');
    PS C:\temp\powershellutf8> $d
    ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ <== Works!
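
    The same idea can be wrapped in a small helper. This is only a sketch, not part of the original answers: the function name Read-Utf8Text and the demo file names are made up for illustration. It assumes that [System.IO.File]::ReadAllText, when given a UTF-8 encoding, decodes the file as UTF-8 and drops a leading BOM if there is one, and it resolves the path first because .NET methods use the process working directory rather than the current PowerShell location.

    ```powershell
    # Hypothetical helper (name chosen for this example): read a UTF-8 text file
    # whether or not it starts with a BOM.
    function Read-Utf8Text {
        param([Parameter(Mandatory = $true)][string]$Path)

        # .NET resolves relative paths against the process working directory,
        # which can differ from the PowerShell location, so resolve it first.
        $fullPath = Convert-Path -LiteralPath $Path

        # The $false argument only affects writing (no BOM is emitted);
        # when reading, a BOM is still detected and removed if present.
        $utf8 = New-Object System.Text.UTF8Encoding $false
        [System.IO.File]::ReadAllText($fullPath, $utf8)
    }

    # Usage sketch: write the same text with and without a BOM, then read both back.
    $dir    = [System.IO.Path]::GetTempPath()
    $noBom  = Join-Path $dir 'utf8wobom-demo.txt'   # demo file name, an assumption
    $bom    = Join-Path $dir 'utf8wbom-demo.txt'    # demo file name, an assumption
    # Build the non-ASCII characters from code points so the example does not
    # depend on the encoding of the script file itself.
    $sample = 'ABC' + [char]0xC5 + [char]0xC4 + [char]0xD6
    [System.IO.File]::WriteAllText($noBom, $sample, (New-Object System.Text.UTF8Encoding $false))
    [System.IO.File]::WriteAllText($bom,   $sample, (New-Object System.Text.UTF8Encoding $true))

    $fromNoBom = Read-Utf8Text $noBom
    $fromBom   = Read-Utf8Text $bom
    ```

    Both reads should give back the same string, so the script no longer depends on whether the user's editor kept the BOM.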
    