Trying to import XML children from one file to another

為{幸葍}努か posted on 2019-12-01 09:21:33
Ansgar Wiechers

As mklement0 pointed out, your XML documents have namespaces, so you need a namespace manager when selecting nodes with XPath expressions. Using dot-access for selecting the nodes gets you around namespace management, but since dot-access doesn't always work the way one might expect, I'd still recommend sticking with SelectNodes() and using proper namespace managers.

$uri = 'http://www.microsoft.com/schemas/PowerShell/Plaster/v1'

[xml]$ManifestFile = Get-Content 'C:\path\to\old.xml'
$nm1 = New-Object Xml.XmlNamespaceManager $ManifestFile.NameTable
$nm1.AddNamespace('ns1', $uri)

[xml]$NewManifestFile = Get-Content 'C:\path\to\new.xml'
$nm2 = New-Object Xml.XmlNamespaceManager $NewManifestFile.NameTable
$nm2.AddNamespace('ns2', $uri)

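# Look up the destination <parameters> element once, outside the loop.
$parent = $NewManifestFile.SelectSingleNode('//ns2:parameters', $nm2)

$ManifestFile.SelectNodes('//ns1:parameter', $nm1) | ForEach-Object {
    # Import each <parameter> into the destination document before appending it.
    $newnode = $NewManifestFile.ImportNode($_, $true)
    $parent.AppendChild($newnode) | Out-Null
}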

$NewManifestFile.Save('C:\path\to\new.xml')

There are several problems with your current approach:

  • You're not importing the elements from the source document into the destination document, even though that is a prerequisite for inserting them into the destination document's DOM.

  • You're using .SelectSingleNode() to select the source-document nodes, even though - I presume - you meant to use .SelectNodes() to select all <parameter> elements.

  • You're missing namespace management for the documents, which is a prerequisite for successful XPath queries via .SelectSingleNode() / .SelectNodes().

    • However, given that namespace management is complex, the solution below employs workarounds. If you do want to deal with namespaces - which is the proper way to do it - see Ansgar Wiechers' helpful answer.
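To illustrate the first point, here is a minimal sketch using two small in-memory documents. They are made up for illustration and deliberately have no namespaces, so plain XPath works. Appending a node that still belongs to another document fails, while appending an imported copy succeeds.

```powershell
# Two throwaway in-memory documents (illustrative only, not the real manifests).
$src = [xml]'<root><parameters><parameter name="a" /></parameters></root>'
$dst = [xml]'<root><parameters /></root>'

$node   = $src.SelectSingleNode('/root/parameters/parameter')
$target = $dst.SelectSingleNode('/root/parameters')

# Appending a node that is owned by a different document throws.
$failed = $false
try {
    $null = $target.AppendChild($node)
} catch {
    $failed = $true
}

# ImportNode() creates a copy owned by the destination document;
# $true requests a deep copy (attributes and descendants included).
$copy = $dst.ImportNode($node, $true)
$null = $target.AppendChild($copy)

# Inspect the result.
$dst.OuterXml
```

Note that ImportNode() only creates the copy; it still has to be attached with AppendChild() or a similar method.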

Here's a fixed, annotated solution:

$ManifestFile = [xml](Get-Content -Raw ./PlasterManifest.xml)
$NewManifestFile = [xml](Get-Content -Raw $PlasterMetadata.Path)

# Get the <parameters> element in the *source* doc.
# Note that PowerShell's dot notation-based access to the DOM does
# NOT require namespace management.
$ParametersRoot = $ManifestFile.plasterManifest.parameters

# Get the parent of the <parameter> elements in the *destination* doc.
# Note: Ideally we'd also use dot notation in order to avoid the need for namespace
#       management, but since the target <parameters> element is *empty*, 
#       PowerShell represents it as a *string* rather than as an XML element.
#       Instead, we use .GetElementsByTagName() to locate the element and rely
#       on the knowledge that there is only *one* in the whole document.
$NewParametersRoot = $NewManifestFile.GetElementsByTagName('parameters')[0]

# Import the source element's subtree into the destination document, so it can
# be inserted into the DOM later.
$ImportedParametersRoot = $NewManifestFile.ImportNode($ParametersRoot, $True)

# For simplicity, replace the entire <parameters> element, which
# obviates the need for a loop.
# Note the need to call .ReplaceChild() on the .DocumentElement property,
# not on the document object itself.
$null = $NewManifestFile.DocumentElement.ReplaceChild($ImportedParametersRoot, $NewParametersRoot)

# Save the modified destination document.
$NewManifestFile.Save($PlasterMetadata.Path)

Optional background information:

  • The .SelectSingleNode() / .SelectNodes() methods, because they accept XPath queries, are the most flexible and powerful methods for locating elements (nodes) of interest in an XML document, but they do require explicit namespace handling if the input document declares namespaces (such as xmlns="http://www.microsoft.com/schemas/PowerShell/Plaster/v1" in your case):

    • Note: If a given input document declares namespaces and you neglect to handle them as described below, queries with unqualified element names (e.g., parameters) simply find nothing (.SelectSingleNode() returns $null, .SelectNodes() returns an empty collection), and queries with namespace-qualified (namespace-prefixed) names (e.g., plaster:parameters) fail with an error.

    • Namespace handling involves these steps (note that a given document may have multiple namespace declarations, but for simplicity the instructions assume only one):

      • Instantiate a namespace manager and associate it with the input document['s name table].

      • Associate the namespace's URI with a symbolic identifier (prefix). If the namespace declaration in the input document is for the default namespace - xmlns - you cannot use that as your symbolic identifier (the name xmlns is reserved) and must choose one of your own.

      • Then, when you call .SelectSingleNode() / .SelectNodes(), you must use this symbolic identifier as an element-name prefix in your query strings and pass the namespace manager as the second argument; e.g., if your (self-chosen) symbolic identifier is plaster and you're looking for element parameters anywhere in the document, you'd use query string '//plaster:parameters'

      • Ansgar Wiechers' helpful answer demonstrates all that.

    • Consider PowerShell's Select-Xml cmdlet as an alternative: as a high-level wrapper around .SelectNodes() it too supports XPath queries, but makes namespace management easier - see the bottom section of this answer.
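As a rough sketch of the namespace-handling steps, assuming a small stand-in document that only borrows the Plaster namespace URI (the prefix plaster is self-chosen, and the variable names are illustrative):

```powershell
$uri = 'http://www.microsoft.com/schemas/PowerShell/Plaster/v1'

# Stand-in document with a default namespace (not the real manifest).
$doc = [xml]('<plasterManifest xmlns="{0}"><parameters><parameter name="a" /></parameters></plasterManifest>' -f $uri)

# Without namespace handling, an unqualified query finds nothing.
$miss = $doc.SelectSingleNode('//parameters')

# Step 1: create a namespace manager tied to the document's name table.
$nsm = New-Object System.Xml.XmlNamespaceManager $doc.NameTable

# Step 2: map a self-chosen prefix to the namespace URI.
$nsm.AddNamespace('plaster', $uri)

# Step 3: use the prefix in the query and pass the manager along.
$hit = $doc.SelectSingleNode('//plaster:parameters', $nsm)

# Select-Xml takes the prefix-to-URI mapping as a hashtable instead
# and returns wrapper objects whose .Node property holds the match.
$viaCmdlet = Select-Xml -Xml $doc -XPath '//plaster:parameter' -Namespace @{ plaster = $uri }
```

The prefix used in the query does not have to match anything in the document; only the URI it is mapped to matters.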

  • By contrast, PowerShell's dot notation is always namespace-agnostic and the .GetElementsByTagName() method can be, so they require no explicit namespace handling.

    • Caveat: While this reduces complexity, you should only use it if you know that proper namespace handling is not a necessity for processing the input document correctly.

    • PowerShell's dot notation:

      • PowerShell conveniently maps the XML document's DOM - the hierarchy of nodes in the input document - onto a nested object with properties, allowing you to drill down into the document with regular dot notation; e.g., the equivalent of XPath query '/root/elem' would be $xmlDoc.root.elem
        However, this implies that you can only use this notation to access elements whose path in the hierarchy you already know - queries are not supported (though an XPath-enabled Select-Xml cmdlet exists).

      • This mapping ignores namespace qualifiers (prefixes), so you must use the mere element name, without any namespace prefix; e.g., if the input document has a plaster:parameters element, you must refer to it as just parameters.

      • As convenient as dot notation is, it comes with pitfalls, the most notable of which is that quasi-leaf elements - those that either have no child nodes at all or only non-element child nodes such as a text node - are returned as strings, not elements, which makes it difficult to modify them.
        In short: the mapping between the XML DOM and PowerShell's object model isn't - and cannot be - exact.

    • .GetElementsByTagName() method:

      • Returns a collection of all elements with the specified tag name across all levels of the hierarchy: in the entire document when invoked on the document object, or among all descendants when invoked on an element.
        As such, it doesn't allow for sophisticated selection of target elements, and the documentation recommends using .SelectSingleNode() / .SelectNodes() instead.

      • While you can pass a namespace URI as the second argument, it isn't required; if you don't, you must specify the element (tag) name literally, exactly as it occurs in the document, including its namespace qualifier, if present.
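The two namespace-agnostic approaches can be compared with a short sketch. It assumes a made-up stand-in document that borrows the Plaster namespace URI; the <content> and <file> elements are there only to provide a non-empty element for contrast.

```powershell
$uri = 'http://www.microsoft.com/schemas/PowerShell/Plaster/v1'

# Stand-in document: one empty element and one element with a child.
$doc = [xml]('<plasterManifest xmlns="{0}"><parameters /><content><file source="x" /></content></plasterManifest>' -f $uri)

# Dot notation: an element that has child elements comes back as an XmlElement ...
$content = $doc.plasterManifest.content

# ... but an empty element comes back as a string, so it cannot be modified this way.
$viaDot = $doc.plasterManifest.parameters

# GetElementsByTagName() returns actual elements; [0] picks the first match.
$byName = $doc.GetElementsByTagName('parameters')[0]

# Optional second argument: match on local name plus namespace URI.
$byNameAndNs = $doc.GetElementsByTagName('parameters', $uri)[0]

# Compare the types that each approach returns.
$content.GetType().Name, $viaDot.GetType().Name, $byName.GetType().Name
```

This is why the solution above falls back to .GetElementsByTagName() for the empty destination element.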
