Using PowerShell to write a file in UTF-8 without the BOM

你离开我真会死。 提交于 2019-11-25 23:25:16

问题


Out-File seems to force the BOM when using UTF-8:

$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding \"UTF8\" $MyPath

How can I write a file in UTF-8 with no BOM using PowerShell?


回答1:


Using .NET's UTF8Encoding class and passing $False to the constructor seems to work:

$MyFile = Get-Content $MyPath
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($MyPath, $MyFile, $Utf8NoBomEncoding)



回答2:


The proper way as of now is to use a solution recommended by @Roman Kuzmin in comments to @M. Dudley answer:

[IO.File]::WriteAllLines($filename, $content)

(I've also shortened it a bit by stripping unnecessary System namespace clarification - it will be substituted automatically by default.)




回答3:


I figured this wouldn't be UTF, but I just found a pretty simple solution that seems to work...

Get-Content path/to/file.ext | out-file -encoding ASCII targetFile.ext

For me this results in a utf-8 without bom file regardless of the source format.




回答4:


Note: This answer applies to Windows PowerShell; by contrast, in the cross-platform PowerShell Core edition, UTF-8 without BOM is the default encoding.

To complement M. Dudley's own simple and pragmatic answer (and ForNeVeR's more concise reformulation):

For convenience, here's advanced function Out-FileUtf8NoBom, a pipeline-based alternative that mimics Out-File, which means:

  • you can use it just like Out-File in a pipeline.
  • input objects that aren't strings are formatted as they would be if you sent them to the console, just like with Out-File.

Example:

(Get-Content $MyPath) | Out-FileUtf8NoBom $MyPath

Note how (Get-Content $MyPath) is enclosed in (...), which ensures that the entire file is opened, read in full, and closed before sending the result through the pipeline. This is necessary in order to be able to write back to the same file (update it in place).
Generally, though, this technique is not advisable for 2 reasons: (a) the whole file must fit into memory and (b) if the command is interrupted, data will be lost.

A note on memory use:

  • M. Dudley's own answer requires that the entire file contents be built up in memory first, which can be problematic with large files.
  • The function below improves on this only slightly: all input objects are still buffered first, but their string representations are then generated and written to the output file one by one.

Source code of Out-FileUtf8NoBom (also available as an MIT-licensed Gist):

<#
.SYNOPSIS
  Outputs to a UTF-8-encoded file *without a BOM* (byte-order mark).

.DESCRIPTION
  Mimics the most important aspects of Out-File:
  * Input objects are sent to Out-String first.
  * -Append allows you to append to an existing file, -NoClobber prevents
    overwriting of an existing file.
  * -Width allows you to specify the line width for the text representations
     of input objects that aren't strings.
  However, it is not a complete implementation of all Out-String parameters:
  * Only a literal output path is supported, and only as a parameter.
  * -Force is not supported.

  Caveat: *All* pipeline input is buffered before writing output starts,
          but the string representations are generated and written to the target
          file one by one.

.NOTES
  The raison d'être for this advanced function is that, as of PowerShell v5,
  Out-File still lacks the ability to write UTF-8 files without a BOM:
  using -Encoding UTF8 invariably prepends a BOM.

#>
function Out-FileUtf8NoBom {

  [CmdletBinding()]
  param(
    [Parameter(Mandatory, Position=0)] [string] $LiteralPath,
    [switch] $Append,
    [switch] $NoClobber,
    [AllowNull()] [int] $Width,
    [Parameter(ValueFromPipeline)] $InputObject
  )

  #requires -version 3

  # Make sure that the .NET framework sees the same working dir. as PS
  # and resolve the input path to a full path.
  [System.IO.Directory]::SetCurrentDirectory($PWD) # Caveat: .NET Core doesn't support [Environment]::CurrentDirectory
  $LiteralPath = [IO.Path]::GetFullPath($LiteralPath)

  # If -NoClobber was specified, throw an exception if the target file already
  # exists.
  if ($NoClobber -and (Test-Path $LiteralPath)) {
    Throw [IO.IOException] "The file '$LiteralPath' already exists."
  }

  # Create a StreamWriter object.
  # Note that we take advantage of the fact that the StreamWriter class by default:
  # - uses UTF-8 encoding
  # - without a BOM.
  $sw = New-Object IO.StreamWriter $LiteralPath, $Append

  $htOutStringArgs = @{}
  if ($Width) {
    $htOutStringArgs += @{ Width = $Width }
  }

  # Note: By not using begin / process / end blocks, we're effectively running
  #       in the end block, which means that all pipeline input has already
  #       been collected in automatic variable $Input.
  #       We must use this approach, because using | Out-String individually
  #       in each iteration of a process block would format each input object
  #       with an indvidual header.
  try {
    $Input | Out-String -Stream @htOutStringArgs | % { $sw.WriteLine($_) }
  } finally {
    $sw.Dispose()
  }

}



回答5:


Starting from version 6 powershell supports the UTF8NoBOM encoding both for set-content and out-file and even uses this as default encoding.

So in the above example it should simply be like this:

$MyFile | Out-File -Encoding UTF8NoBOM $MyPath



回答6:


When using Set-Content instead of Out-File, you can specify the encoding Byte, which can be used to write a byte array to a file. This in combination with a custom UTF8 encoding which does not emit the BOM gives the desired result:

# This variable can be reused
$utf8 = New-Object System.Text.UTF8Encoding $false

$MyFile = Get-Content $MyPath -Raw
Set-Content -Value $utf8.GetBytes($MyFile) -Encoding Byte -Path $MyPath

The difference to using [IO.File]::WriteAllLines() or similar is that it should work fine with any type of item and path, not only actual file paths.




回答7:


This script will convert, to UTF-8 without BOM, all .txt files in DIRECTORY1 and output them to DIRECTORY2

foreach ($i in ls -name DIRECTORY1\*.txt)
{
    $file_content = Get-Content "DIRECTORY1\$i";
    [System.IO.File]::WriteAllLines("DIRECTORY2\$i", $file_content);
}



回答8:


    [System.IO.FileInfo] $file = Get-Item -Path $FilePath 
    $sequenceBOM = New-Object System.Byte[] 3 
    $reader = $file.OpenRead() 
    $bytesRead = $reader.Read($sequenceBOM, 0, 3) 
    $reader.Dispose() 
    #A UTF-8+BOM string will start with the three following bytes. Hex: 0xEF0xBB0xBF, Decimal: 239 187 191 
    if ($bytesRead -eq 3 -and $sequenceBOM[0] -eq 239 -and $sequenceBOM[1] -eq 187 -and $sequenceBOM[2] -eq 191) 
    { 
        $utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False) 
        [System.IO.File]::WriteAllLines($FilePath, (Get-Content $FilePath), $utf8NoBomEncoding) 
        Write-Host "Remove UTF-8 BOM successfully" 
    } 
    Else 
    { 
        Write-Warning "Not UTF-8 BOM file" 
    }  

Source How to remove UTF8 Byte Order Mark (BOM) from a file using PowerShell




回答9:


If you want to use [System.IO.File]::WriteAllLines(), you should cast second parameter to String[] (if the type of $MyFile is Object[]), and also specify absolute path with $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), like:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Set-Variable MyFile
[System.IO.File]::WriteAllLines($ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), [String[]]$MyFile, $Utf8NoBomEncoding)

If you want to use [System.IO.File]::WriteAllText(), sometimes you should pipe the second parameter into | Out-String | to add CRLFs to the end of each line explictly (Especially when you use them with ConvertTo-Csv):

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Out-String | Set-Variable tmp
[System.IO.File]::WriteAllText("/absolute/path/to/foobar.csv", $tmp, $Utf8NoBomEncoding)

Or you can use [Text.Encoding]::UTF8.GetBytes() with Set-Content -Encoding Byte:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Out-String | % { [Text.Encoding]::UTF8.GetBytes($_) } | Set-Content -Encoding Byte -Path "/absolute/path/to/foobar.csv"

see: How to write result of ConvertTo-Csv to a file in UTF-8 without BOM




回答10:


One technique I utilize is to redirect output to an ASCII file using the Out-File cmdlet.

For example, I often run SQL scripts that create another SQL script to execute in Oracle. With simple redirection (">"), the output will be in UTF-16 which is not recognized by SQLPlus. To work around this:

sqlplus -s / as sysdba "@create_sql_script.sql" |
Out-File -FilePath new_script.sql -Encoding ASCII -Force

The generated script can then be executed via another SQLPlus session without any Unicode worries:

sqlplus / as sysdba "@new_script.sql" |
tee new_script.log



回答11:


Change multiple files by extension to UTF-8 without BOM:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach($i in ls -recurse -filter "*.java") {
    $MyFile = Get-Content $i.fullname 
    [System.IO.File]::WriteAllLines($i.fullname, $MyFile, $Utf8NoBomEncoding)
}



回答12:


For whatever reason, the WriteAllLines calls were still producing a BOM for me, with the BOMless UTF8Encoding argument and without it. But the following worked for me:

$bytes = gc -Encoding byte BOMthetorpedoes.txt
[IO.File]::WriteAllBytes("$(pwd)\BOMthetorpedoes.txt", $bytes[3..($bytes.length-1)])

I had to make the file path absolute for it to work. Otherwise it wrote the file to my Desktop. Also, I suppose this only works if you know your BOM is 3 bytes. I have no idea how reliable it is to expect a given BOM format/length based on encoding.

Also, as written, this probably only works if your file fits into a powershell array, which seems to have a length limit of some value lower than [int32]::MaxValue on my machine.




回答13:


This one works for me (use "Default" instead of "UTF8"):

$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding "Default" $MyPath

The result is ASCII without BOM.




回答14:


Could use below to get UTF8 without BOM

$MyFile | Out-File -Encoding ASCII



回答15:


Had the same issue. That did the trick for me:

$MyFile | Out-File -Encoding Oem $MyPath

While opening the file with Visual Studio Code or Notepad++ it shows as UTF-8

Finnaly this just "appear" to work. When open with some editor it shows as UTF-8 no BOM. But this is not true at all. Use the solution at the top of the tread. This one work for real



来源:https://stackoverflow.com/questions/5596982/using-powershell-to-write-a-file-in-utf-8-without-the-bom

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!