Question
I have an XML file and I want to remove the characters that cause hexadecimal-character errors from it. Below are the invalid characters:
I don't know what STX means, and when I tried copying it to my clipboard and pasting it into MS Word, it showed some other value.
How can I write a PowerShell script to remove the above from my XML file?
Answer 1:
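The following regex removes any character that is not allowed in an XML 1.0 document by negating a character class that lists the valid ranges. .NET's \x escape takes exactly two hex digits, so the larger code points are written with \u. Surrogate code units (\uD800-\uDFFF) are kept so that characters above U+FFFF, which .NET stores as surrogate pairs, survive:
$rPattern = '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]'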
$xmlText -replace $rPattern,''
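To make the effect concrete, here is a minimal sketch run against a made-up XML fragment (the sample string and the variable names $stx and $clean are assumptions for illustration, not from the question). STX is the control character U+0002 ("Start of Text"), which XML 1.0 does not allow. Because .NET's \x escape reads exactly two hex digits, the sketch writes the larger code points with \u:

```powershell
# Hypothetical sample: an XML fragment containing STX (U+0002)
$stx = [char]0x02
$xmlText = "<note>Hello${stx}World</note>"

# Keep tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, and surrogate code units
$rPattern = '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]'

# Everything outside the allowed set is replaced with an empty string
$clean = $xmlText -replace $rPattern, ''
$clean   # <note>HelloWorld</note>
```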
This can easily be turned into a simple function:
function Repair-XmlString
{
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$true,Position=0)]
        [string]$inXML
    )
    # Match all characters that do NOT belong in an XML document.
    # \x takes exactly two hex digits in .NET, so larger code points use \u;
    # surrogates (\uD800-\uDFFF) are kept so characters above U+FFFF survive.
    $rPattern = '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]'
    # Replace said characters with [String]::Empty and return
    return [System.Text.RegularExpressions.Regex]::Replace($inXML,$rPattern,"")
}
Then do:
Repair-XmlString (Get-Content path\to\file.xml -Raw) |Set-Content path\to\file.xml
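Before overwriting the file, it may help to see which characters would be stripped and where they sit. The following is only a sketch: $text is a hypothetical sample standing in for the file contents, and $pattern and $found are illustrative names. It lists each offending character's index and code point, so STX shows up as U+0002.

```powershell
# Hypothetical sample containing STX (U+0002) and US (U+001F)
$text = "<a>x$([char]0x02)y$([char]0x1F)</a>"

# Same idea as above: anything outside the XML 1.0 character ranges
$pattern = '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]'

# Report the position and code point of every match
$found = [regex]::Matches($text, $pattern) | ForEach-Object {
    [pscustomobject]@{
        Index     = $_.Index
        CodePoint = 'U+{0:X4}' -f [int][char]$_.Value
    }
}
$found
```

For a real file, $text could be filled with Get-Content path\to\file.xml -Raw, as in the command above.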
Source: https://stackoverflow.com/questions/45706565/how-to-remove-special-bad-characters-from-xml-using-powershell