using perl XML::LibXML to parse

坚强是说给别人听的谎言 提交于 2019-12-12 12:24:39

问题


I am using perl's XML::LibXML module to parse an XML response from a device. It appears that the only way I can successfully get my data is by modifying the XML response from the device. Here is my XML response from the device:

<chassis-inventory xmlns="http://xml.juniper.net/junos/10.3D0/junos-chassis">

<chassis junosstyle="inventory">

<name>Chassis</name>

<serial-number>JN111863EAFF</serial-number>

<description>VJX1000</description>

<chassis-module>

<name>Midplane</name>

</chassis-module>

<chassis-module>

<name>System IO</name>

</chassis-module>

<chassis-module>

<name>Routing Engine</name>

<description>VJX1000</description>

<chassis-re-disk-module>

<name>ad0</name>

<disk-size>1953</disk-size>

<model>QEMU HARDDISK</model>

<serial-number>QM00001</serial-number>

<description>Hard Disk</description>

</chassis-re-disk-module>

</chassis-module>

<chassis-module>

<name>FPC 0</name>

<chassis-sub-module>

<name>PIC 0</name>

</chassis-sub-module>

</chassis-module>

<chassis-module>

<name>Power Supply 0</name>

</chassis-module>

</chassis>

</chassis-inventory>

Here is the perl code I am using to parse and find the serial number for example:

#!/bin/env perl
use strict;
use warnings;
use XML::LibXML;
my $f = ("/var/working/xmlstuff");
sub yeah {
my $ff;
my $f = shift;
open(my $fff,$f);
while(<$fff>) {
$_ =~ s/^\s+$//; 
$_ =~ s/^(<\S+)\s.*?=.*?((?:\/)?>)/$1$2/g;
$ff .= $_;
}
close($fff);
return $ff
}
my $tparse = XML::LibXML->new();
my $ss = $tparse->load_xml( string => &yeah($f));
print map $_->to_literal,$ss->findnodes('/chassis-inventory/chassis/serial-number');

If I do not use the regex substitution nothing is loaded for the script to parse. I can understand the stripping of newlines, but why do I have to remove the attributes from the XML response, so it only works if these lines:

<chassis-inventory xmlns="http://xml.juniper.net/junos/10.3D0/junos-chassis">

<chassis junosstyle="inventory">

Become this:

<chassis-inventory>
<chassis>
  1. Is this a problem with the XML response or with the XML::LibXML module?

  2. Is there a way to have it ignore the fact that there is empty lines in the file without using a regex substitution?

Thanks for the help.


回答1:


The reason your XPATH expression is failing is because of the namespace; you need to search in context to that. Here's an explanation from the XML::libXML documentation:

NOTE ON NAMESPACES AND XPATH:

A common mistake about XPath is to assume that node tests consisting of an element name with no prefix match elements in the default namespace. This assumption is wrong - by XPath specification, such node tests can only match elements that are in no (i.e. null) namespace.

So, for example, one cannot match the root element of an XHTML document with $node->find('/html') since '/html' would only match if the root element had no namespace, but all XHTML elements belong to the namespace http://www.w3.org/1999/xhtml. (Note that xmlns="..." namespace declarations can also be specified in a DTD, which makes the situation even worse, since the XML document looks as if there was no default namespace).

To deal with this, register the namespace, then search your document using the namespace. Here's an example that should work for you:

#!/bin/env perl
use strict;
use warnings;
use XML::LibXML;

my $xml = XML::LibXML->load_xml( location => '/var/working/xmlstuff');
my $xpc = XML::LibXML::XPathContext->new($xml);
$xpc->registerNs('x', 'http://xml.juniper.net/junos/10.3D0/junos-chassis');

foreach my $node ($xpc->findnodes('/x:chassis-inventory/x:chassis/x:serial-number')) {

    print $node->textContent() . "\n";
}


来源:https://stackoverflow.com/questions/7041719/using-perl-xmllibxml-to-parse

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!