How to Remove Special/Bad Characters from XML Using PowerShell

Posted by 好久不见 on 2019-12-20 04:19:00

Question


I have an XML file and I want to remove the characters that cause "invalid hexadecimal character" errors. Below are the invalid characters:

I don't know what STX means, and when I tried copying it to my clipboard and pasting it into MS Word, it showed some other value.

How can I write a PowerShell script to remove the above from my XML file?


Answer 1:


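The following regex removes any characters that are invalid in XML. It uses a negated character class covering the whole set of characters that XML 1.0 allows. In .NET regex, \x takes exactly two hex digits, so code points above U+00FF are written with \u escapes. Characters above U+FFFF are stored as surrogate pairs, so the surrogate range \uD800-\uDFFF is kept as well (which means lone surrogates are not removed):

$rPattern = '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]'
$xmlText -replace $rPattern,''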
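
As a quick check, here is a minimal sketch of the replacement on a made-up sample string. It assumes the character class is written with \u escapes for code points above U+00FF and keeps surrogates so that supplementary characters survive. The sample contains STX (U+0002, the ASCII "start of text" control character); the names $sample, $pattern and $clean are hypothetical.

```powershell
# Hypothetical sample: element text containing STX (U+0002), an ASCII control character
$sample = "<item>bad" + [char]0x02 + "value</item>"

# Negated class: anything outside the XML 1.0 character ranges is matched.
# Surrogates (\uD800-\uDFFF) are allowed so characters above U+FFFF are not stripped.
$pattern = '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]'

# Show the code points that would be removed (prints U+0002 for this sample)
[regex]::Matches($sample, $pattern) | ForEach-Object { 'U+{0:X4}' -f [int][char]$_.Value }

# Strip them
$clean = $sample -replace $pattern, ''
$clean   # <item>badvalue</item>
```

Listing the matched code points first can help confirm which characters are actually in the file before anything is deleted.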

This can easily be turned into a simple function:

function Repair-XmlString
{
  [CmdletBinding()]
  param(
    [Parameter(Mandatory=$true,Position=0)]
    [string]$inXML
  )

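  # Match all characters that do NOT belong in an XML document.
  # \x takes exactly two hex digits in .NET, so larger code points use \u;
  # surrogates (\uD800-\uDFFF) are kept so characters above U+FFFF survive.
  $rPattern = '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]'

  # Replace said characters with an empty string and return
  return [System.Text.RegularExpressions.Regex]::Replace($inXML,$rPattern,"")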
}

Then do:

Repair-XmlString (Get-Content path\to\file.xml -Raw) |Set-Content path\to\file.xml 
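
One thing to watch is encoding. In Windows PowerShell 5.1, Set-Content writes in the system ANSI code page unless -Encoding is given, which can mangle a UTF-8 document. The sketch below is one possible way around that, assuming the file is UTF-8: it reads and writes through System.IO.File with an explicit encoding, then loads the result to confirm it parses. The temp file name and sample content are made up for the demonstration.

```powershell
# Hypothetical demo file in the temp directory; substitute the real (absolute) path
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'repair-demo.xml'
$utf8 = New-Object System.Text.UTF8Encoding $false   # UTF-8 without a BOM

# Sample content: one non-ASCII letter (U+00E9) and one invalid STX (U+0002)
$content = "<root>caf" + [char]0xE9 + [char]0x02 + "</root>"
[System.IO.File]::WriteAllText($path, $content, $utf8)

# Read, strip characters outside the XML 1.0 ranges, write back in the same encoding
$pattern = '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]'
$text    = [System.IO.File]::ReadAllText($path, $utf8)
$fixed   = [regex]::Replace($text, $pattern, '')
[System.IO.File]::WriteAllText($path, $fixed, $utf8)

# Loading throws an XmlException if an invalid character is still present
$doc = New-Object System.Xml.XmlDocument
$doc.Load($path)
$doc.DocumentElement.InnerText
```

.NET methods resolve relative paths against the process working directory rather than the current PowerShell location, so an absolute path is the safer choice here.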


Source: https://stackoverflow.com/questions/45706565/how-to-remove-special-bad-characters-from-xml-using-powershell
