Remove XML namespaces with XML::LibXML

Posted by 眉间皱痕 on 2019-12-24 01:18:56

Question


I'm converting an XML document into HTML. One of the things that needs to happen is the removal of namespaces, which cannot legally be declared in HTML (unless it's the XHTML namespace on the root tag). I have found posts from 5-10 years ago about how difficult this is to do with XML::LibXML and libxml2, but little that is recent. Here's an example:

use XML::LibXML;
use XML::LibXML::XPathContext;
use feature 'say';

my $xml = <<'__EOI__';
<myDoc>
  <par xmlns:bar="www.bar.com">
    <bar:foo/>
  </par>
</myDoc>
__EOI__

my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml);

my $bar_foo = do{
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs('bar', 'www.bar.com');
    ($xpc->findnodes('//bar:foo'))[0];
};
$bar_foo->setNodeName('foo');
$bar_foo->setNamespace('','');
say $bar_foo->nodeName; #prints 'bar:foo'. Dang!

my @namespaces = $doc->findnodes('//namespace::*');
for my $ns (@namespaces){
    # $ns->delete; #can't find any such method for namespaces
}
say $doc->toStringHTML;

In this code I tried a few things that didn't work. First, I tried setting the name of the bar:foo element to an unprefixed foo (the documentation says that method is namespace-aware, but apparently it is not). Then I tried setting the element's namespace to null, and that didn't work either. Finally, I looked through the docs for a method that deletes namespaces, with no luck. The final output string still has everything I want to remove (namespace declarations and prefixes).

Does anyone have a way to remove namespaces, setting elements and attributes to the null namespace?


Answer 1:


Here's my own gymnasticsy answer. If there is no better way, it will do. I sure wish there were a better way...

The replace_without_ns function just copies an element without its namespace. Any child elements that still need a namespace get the declaration on themselves instead. The code below moves the entire document into the null namespace:

use strict;
use warnings;
use XML::LibXML;

my $xml = <<'__EOI__';
<myDoc xmlns="foo">
  <par xmlns:bar="www.bar.com" foo="bar">
    <bar:foo stuff="junk">
      <baz bar:thing="stuff"/>
      fooey
      <boof/>
    </bar:foo>
  </par>
</myDoc>
__EOI__

my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml);

# remove namespaces for the whole document
for my $el($doc->findnodes('//*')){
    if($el->getNamespaces){
        replace_without_ns($el);
    }
}

# replaces the given element with an identical one without the namespace
# also does this with attributes
sub replace_without_ns {
    my ($el) = @_;
    # new element has same name, minus namespace
    my $new = XML::LibXML::Element->new( $el->localname );
    # copy attributes (minus namespace declarations)
    for my $att($el->attributes){
        # anchored so that only xmlns and xmlns:* are skipped
        if($att->nodeName !~ /^xmlns(?::|$)/){
            $new->setAttribute($att->localname, $att->value);
        }
    }
    #move children
    for my $child($el->childNodes){
        $new->appendChild($child);
    }

    # if working with the root element, we have to set the new element
    # to be the new root
    my $doc = $el->ownerDocument;
    if( $el->isSameNode($doc->documentElement) ){
        $doc->setDocumentElement($new);
        return;
    }
    #otherwise just paste the new element in place of the old element
    $el->parentNode->insertAfter($new, $el);
    $el->unbindNode;
    return;
}

print $doc->toStringHTML;
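
One way to check that a document really ended up in the null namespace is to count the elements and attributes whose namespace URI is still non-empty. The following is only a sketch: count_namespaced_nodes is a hypothetical helper name, and the two small documents and the urn:b URI are made up for illustration. It does not detect leftover declarations that no node uses.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML): counts the elements and
# attributes that still belong to a namespace.
sub count_namespaced_nodes {
    my ($doc) = @_;
    my @nodes = $doc->findnodes(
        q{//*[namespace-uri() != ''] | //@*[namespace-uri() != '']}
    );
    return scalar @nodes;
}

# Illustrative inputs: one document with a prefixed element and attribute,
# and one that is already free of namespaces.
my $with_ns    = XML::LibXML->load_xml(string => '<a xmlns:b="urn:b"><b:c b:d="1"/></a>');
my $without_ns = XML::LibXML->load_xml(string => '<a><c d="1"/></a>');

printf "with namespaces:    %d\n", count_namespaced_nodes($with_ns);
printf "without namespaces: %d\n", count_namespaced_nodes($without_ns);
```

Calling the helper on $doc after the removal loop above should return 0 if every element and attribute was moved to the null namespace.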



Answer 2:


Here's a simple solution using an XSLT stylesheet:

use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

my $xml = <<'__EOI__';
<myDoc xmlns="foo">
  <par xmlns:bar="www.bar.com" foo="bar">
    <bar:foo stuff="junk">
      <baz bar:thing="stuff"/>
      fooey
      <boof/>
    </bar:foo>
  </par>
</myDoc>
__EOI__

my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml);

my $xslt    = XML::LibXSLT->new();
my $xsl_doc = $parser->parse_string(<<'XSL');
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="node()|@*"/>
    </xsl:element>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
XSL

my $stylesheet = $xslt->parse_stylesheet($xsl_doc);
my $result     = $stylesheet->transform($doc);
print $stylesheet->output_as_bytes($result);

Note that if you want to copy comments or processing instructions, further adjustments are needed.
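
As a sketch of one such adjustment: XSLT's built-in rules drop comments and processing instructions, so a third template that copies them can be added. The input document, its comment, the target processing instruction, and the urn:example URIs below are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Illustrative input containing a comment and a processing instruction.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<myDoc xmlns="urn:example:foo">
  <!-- keep me -->
  <?target data?>
  <par xmlns:bar="urn:example:bar" bar:thing="stuff"/>
</myDoc>
XML

my $xsl_doc = XML::LibXML->load_xml(string => <<'XSL');
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- elements: recreate under the local name only -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="node()|@*"/>
    </xsl:element>
  </xsl:template>
  <!-- attributes: recreate under the local name only -->
  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
  <!-- comments and processing instructions: copy unchanged -->
  <xsl:template match="comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
XSL

my $stylesheet = XML::LibXSLT->new->parse_stylesheet($xsl_doc);
my $result     = $stylesheet->transform($doc);
my $out        = $stylesheet->output_as_bytes($result);
print $out;
```

Text nodes are already handled by the built-in template rule, which copies their content, so no extra template is needed for them.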



Source: https://stackoverflow.com/questions/17756926/remove-xml-namespaces-with-xmllibxml
