Question
The following is the XML file that I want to parse:
<?xml version="1.0" encoding="UTF-8"?>
<topic id="yerus5" xmlns:ditaarch="http://dita.oasis-open.org/architecture/2005/">
<title/>
<shortdesc/>
<body>
<p><b>CCU_CNT_ADDR: (Address=0x004 Reset=32'h1)</b><table id="table_r5b_1xj_ts">
<tgroup cols="4">
<colspec colnum="1" colname="col1"/>
<colspec colnum="2" colname="col2"/>
<colspec colnum="3" colname="col3"/>
<colspec colnum="4" colname="col4"/>
<tbody>
<row>
<entry>Field</entry>
<entry>OFFSET</entry>
<entry>R/W Access</entry>
<entry>Description</entry>
</row>
<row>
<entry>reg2sm_cnt</entry>
<entry>15:0</entry>
<entry>R/W</entry>
<entry>Count Value to increment in the extenral memory at the specified location.
Default Value of 1. A Count value of 0 will clear the counter value</entry>
</row>
<row>
<entry>ccu2bus_endianess</entry>
<entry>24</entry>
<entry>R/W</entry>
<entry>Endianess of the data structure bit</entry>
</row>
<row>
<entry>ccu_lane_sel</entry>
<entry>25</entry>
<entry>R/W</entry>
<entry>ccu_lane_sel bit. Indicates the lane selection bit of the 32-bit location to
update</entry>
</row>
<row>
<entry>ccu_rdinvalid</entry>
<entry>26</entry>
<entry>R/W</entry>
<entry>ccu_rdinvalid bit. Indicates if the read value from the bus needs to be stored
or not.</entry>
</row>
</tbody>
</tgroup>
</table></p>
</body>
</topic>
After running the following code:
#!/usr/bin/perl
# use module
use XML::Simple;
use Data::Dumper;
# create object
$xml = new XML::Simple(); #(KeyAttr=>[]);
# read XML file
$data = $xml->XMLin("test.xml");
# access XML data
print Dumper($data);
# dereference hash ref
# foreach $b (@{$p->{b}})
# {
# }
foreach $body (@{$data->{body}})
{
    foreach $p (@{$body->{p}})
    {
        foreach $table (@{$p->{table}})
        {
            foreach $tgroup (@{$table->{tgroup}})
            {
                foreach $tbody (@{$tgroup->{tbody}})
                {
                    foreach $row (@{$tbody->{row}})
                    {
                        foreach $entry ((@{$row->{entry}})->[3])
                        {
                            print $entry,"\n";
                        }
                    }
                }
            }
        }
    }
}
I am getting this error: Not an ARRAY reference at ppfe.pl line 28. (That line is: foreach $body (@{$data->{body}}) )
I want to access the data in every <entry></entry> element, but the code above accesses only the 'Description' column. How can I do that?
With reference to the question above, I am also not able to extract the details that belong to each <b></b> text. The following is the sample output I want:
Name: CCU_CNT_ADDR: (Address=0x004 Reset=32'h1)
Field: reg2sm_cnt
OFFSET: 15:0
Access: R/W
Description: Count Value to increment in the extenral memory at the specified location. Default Value of 1. A Count value of 0 will clear the counter value
Field: ccu2bus_endianess
OFFSET: 24
Access: R/W
Description: Endianess of the data structure bit
...
Name: CCU_STAT_ADDR: (Address=0x008 Reset=32'h0)
Field: fifo_cnt
...
Answer 1:
Don't use XML::Simple. Even XML::Simple says "don't use XML::Simple":
The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces.
Try instead something like this:
use strict;
use warnings;
use XML::Twig;
XML::Twig->new(
    'twig_handlers' => {
        'entry' => sub { print $_->text, "\n" }
    }
)->parsefile('your_file.xml');
This will print the text content of all the entry elements, which appears to be what you're trying to do.
XML::Twig has two really handy mechanisms. The first is a twig_handler, which finds and processes nodes matching a spec while the file is being parsed. This works 'as you go', which is particularly useful when handling large XML, or if you want to edit it before processing.
The second lets you 'handle' the data after parsing:
my $twig = XML::Twig->new( 'pretty_print' => 'indented_a' )->parsefile('your_xml_file');
foreach my $element ( $twig->get_xpath("//entry") )
{
    print $element->text, "\n";
}
Or you could use a full path to the node, as you're doing above:
my @entries = $twig->root->get_xpath("body/p/table/tgroup/tbody/row/entry");
In response to your question, though:
Above code only is only accessing the 'Description' column. How to do that?
That's because you're doing this:
foreach $entry ((@{$row->{entry}})->[3])
That is, you're asking for only the 4th element of the entry array, which is Description.
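To make both points concrete, here is a minimal sketch in plain Perl. The $row and $data structures are hand-built, hypothetical stand-ins for what XML::Simple returns, not real parser output. It shows the difference between indexing one entry and dereferencing the whole array, and it also illustrates the likely cause of the "Not an ARRAY reference" error: by default XML::Simple represents an element that occurs only once (such as <body>) as a hash reference rather than a one-element array, unless the ForceArray option is set.

```perl
use strict;
use warnings;

# Hypothetical row, shaped like one <row> with four <entry> children.
my $row = { entry => [ 'reg2sm_cnt', '15:0', 'R/W', 'Count value' ] };

# Index [3] selects only the fourth entry (the Description column).
my @only_fourth = ( $row->{entry}[3] );

# Dereferencing the whole array yields every entry in the row.
my @all_entries = @{ $row->{entry} };
print "$_\n" for @all_entries;

# Hypothetical top level with a single <body>: it is a hash ref, so
# @{ $data->{body} } would die with "Not an ARRAY reference".
my $data = { body => { p => {} } };

# Check the reference type before dereferencing as an array.
my @bodies = ref( $data->{body} ) eq 'ARRAY'
    ? @{ $data->{body} }
    : ( $data->{body} );
print scalar(@bodies), " body element(s)\n";
```

The reference-type check is only a workaround for staying with XML::Simple; the XML::Twig approaches in this answer avoid the problem because they return element lists directly.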
With reference to the comments - I'd suggest you convert your 'entries' into a hash outside the XML data structure.
Like this:
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;
my @headers;
my $column_to_show = 'Field';
sub process_row {
    my %entries;
    my ( $twig, $row ) = @_;
    my @row_entries = map { $_->text } $row->children;
    if (@headers) {
        @entries{@headers} = @row_entries;
        print $column_to_show, " => ", $entries{$column_to_show}, "\n";
    }
    else {
        @headers = @row_entries;
    }
}
my $twig = XML::Twig->new(
    'pretty_print' => 'indented_a',
    twig_handlers  => { 'row' => \&process_row }
)->parsefile('your_file.xml');
What this does is:
- Fire that handler on each row element.
- Extract the entry subelements (and their text) into an array, @row_entries.
- Use the "header" row to turn that into a hash.
- Print the hash value that matches a specific key, $column_to_show.
Depending on whether you're doing any more with the data than print it, you can turn that into a hash of arrays or similar.
Or you could just print $row_entries[3] instead, of course ;).
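As one possible way to keep the data rather than print it, the sketch below collects every data row into an array of hashes keyed by the header row (a variation on the "hash of arrays or similar" idea, not the only option). The embedded XML is a trimmed, hypothetical stand-in for the table in the question, and the names @rows and %record are introduced here purely for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed-down, hypothetical stand-in for the table in the question.
my $xml = <<'XML';
<topic><body><p><table><tgroup cols="2"><tbody>
<row><entry>Field</entry><entry>OFFSET</entry></row>
<row><entry>reg2sm_cnt</entry><entry>15:0</entry></row>
<row><entry>ccu_lane_sel</entry><entry>25</entry></row>
</tbody></tgroup></table></p></body></topic>
XML

my ( @headers, @rows );

my $twig = XML::Twig->new(
    twig_handlers => {
        row => sub {
            my ( $t, $row ) = @_;
            my @cells = map { $_->text } $row->children('entry');
            if (@headers) {
                # Data row: pair each cell with its column header.
                my %record;
                @record{@headers} = @cells;
                push @rows, \%record;
            }
            else {
                # First row seen: treat it as the header row.
                @headers = @cells;
            }
        },
    },
);
$twig->parse($xml);

# Each record can now be addressed by column name.
for my $record (@rows) {
    print join( ', ', map { "$_=$record->{$_}" } @headers ), "\n";
}
```

With the records stored this way, later code can look up any column by name, for example $rows[0]{OFFSET}, instead of relying on a positional index.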
Answer 2:
It is always swifter and more accurate to use a proper XML parsing module that will allow you to access the XML data using XPath expressions.
Here's a solution using XML::Twig.
I wasn't sure what you meant about the bold fields in <b>...</b>, as there is only one in the example data you show, but I've accessed that using the XPath //body/p/b and printed it at the start of the output.
The rest of the output is the values of the <entry> elements in each <row>, which I access using //table/tgroup/tbody/row. The contents of the first row are used as field names to label subsequent values.
use strict;
use warnings;
use 5.010;
use open qw/ :std :encoding(UTF-8) /;
use XML::Twig;
use List::Util qw/ max /;
use List::MoreUtils qw/ pairwise /;
my $twig = XML::Twig->new;
$twig->parsefile('topic.xml');
say $twig->findvalues('//body/p/b');
say '';
my (@fields, $size);
for my $row ( $twig->findnodes('//table/tgroup/tbody/row') ) {
    unless ( @fields ) {
        @fields = map "$_:", $row->findvalues('entry');
        $size = max map length, @fields;
        next;
    }
    my @values = $row->findvalues('entry');
    say for pairwise { sprintf '%-*s %s', $size, $a, $b } @fields, @values;
    say '---';
}
output
CCU_CNT_ADDR: (Address=0x004 Reset=32'h1)
Field: reg2sm_cnt
OFFSET: 15:0
R/W Access: R/W
Description: Count Value to increment in the extenral memory at the specified location.
Default Value of 1. A Count value of 0 will clear the counter value
---
Field: ccu2bus_endianess
OFFSET: 24
R/W Access: R/W
Description: Endianess of the data structure bit
---
Field: ccu_lane_sel
OFFSET: 25
R/W Access: R/W
Description: ccu_lane_sel bit. Indicates the lane selection bit of the 32-bit location to
update
---
Field: ccu_rdinvalid
OFFSET: 26
R/W Access: R/W
Description: ccu_rdinvalid bit. Indicates if the read value from the bus needs to be stored
or not.
---
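If the real file contains several <p> blocks, each with its own <b> title and table, the same idea can be applied per block. The following is only a sketch under that assumption: the second <p> block, including the name EXAMPLE_ADDR and the field example_field, is invented here for illustration and does not come from the question's data.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical two-register document. The second <p> is invented
# sample data, added only to exercise the per-block loop.
my $xml = <<'XML';
<topic><body>
<p><b>CCU_CNT_ADDR: (Address=0x004 Reset=32'h1)</b><table><tgroup cols="2"><tbody>
<row><entry>Field</entry><entry>OFFSET</entry></row>
<row><entry>reg2sm_cnt</entry><entry>15:0</entry></row>
</tbody></tgroup></table></p>
<p><b>EXAMPLE_ADDR: (Address=0x000 Reset=32'h0)</b><table><tgroup cols="2"><tbody>
<row><entry>Field</entry><entry>OFFSET</entry></row>
<row><entry>example_field</entry><entry>0</entry></row>
</tbody></tgroup></table></p>
</body></topic>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);

my @lines;
for my $p ( $twig->findnodes('//body/p') ) {
    # The <b> child holds the register name for this block.
    push @lines, 'Name: ' . $p->first_child_text('b');

    my @fields;    # header row, reset for every table
    for my $row ( $p->findnodes('table/tgroup/tbody/row') ) {
        my @values = map { $_->text } $row->children('entry');
        if ( !@fields ) {
            @fields = @values;    # first row supplies the labels
            next;
        }
        push @lines, map { "$fields[$_]: $values[$_]" } 0 .. $#fields;
    }
}
print "$_\n" for @lines;
```

Resetting @fields inside the outer loop means each table uses its own header row, so tables with different columns would still be labeled correctly.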
Source: https://stackoverflow.com/questions/31540145/in-perl-xmlsimple-is-not-able-to-dereference-multi-dimensional-associative-ar