Question
I've been googling for some time now to find information on using a Perl XML parser. Being quite a newbie, though, I couldn't fully understand the documentation or the tutorials.
Just a few words about what I’d need the parser for (nothing exceptional, as you'll see):
In a first step, I would like to read in an XML file and transform it into a LaTeX document. In a second step, I would like to extract certain pieces of information.
For example:
<body>
<head>Title</head>
<poem>
<l>xyz</l>
<l>xyz</l>
</poem>
</body>
This sample "XML" should be transformed into something like:
\begin{document}
\chapter{Title}
\begin{verse}
xyz\\
xyz
\end{verse}
\end{document}
Furthermore, I would like to put certain pieces of information (e.g. the text between the <l>...</l> tags) into an array or hash, perhaps together with the number of preceding </l>s.
I suppose tasks like these can easily be done with a parser. The problem is that I have only a very vague idea of how to initialize and customize, for example, the XML::Parser module.
I'd be very thankful if anyone could help.
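As a rough illustration of the XML::Parser part of the question, here is a minimal sketch. It assumes the CPAN module XML::Parser (which needs the expat library) is installed. The parser is customized by passing a Handlers hash of callbacks (Start, Char, End) to XML::Parser->new and then calling parse on a string; the function name xml_to_latex and the hash keys text and preceding are hypothetical. The sketch produces the LaTeX shown above and, in the same pass, collects each <l> line together with the number of </l> tags before it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Parser;

# Convert the sample markup to LaTeX and, in the same pass, collect every
# <l> line together with the number of </l> tags that precede it.
# xml_to_latex and the hash keys text/preceding are hypothetical names.
sub xml_to_latex {
    my ($xml) = @_;
    my @latex;          # output lines
    my @lines;          # extracted lines: { text => ..., preceding => ... }
    my @verse;          # lines of the poem currently being read
    my $text     = '';  # character data of the current element
    my $closed_l = 0;   # number of </l> tags seen so far

    my $parser = XML::Parser->new(
        Handlers => {
            # Start receives (expat object, element name, attribute pairs...)
            Start => sub {
                my ($expat, $name) = @_;
                $text = '';
                if ($name eq 'body') {
                    push @latex, '\begin{document}';
                } elsif ($name eq 'poem') {
                    push @latex, '\begin{verse}';
                    @verse = ();
                }
            },
            # Char may be called several times per text node, so append
            Char => sub {
                my ($expat, $string) = @_;
                $text .= $string;
            },
            End => sub {
                my ($expat, $name) = @_;
                if ($name eq 'head') {
                    push @latex, "\\chapter{$text}";
                } elsif ($name eq 'l') {
                    push @lines, { text => $text, preceding => $closed_l };
                    push @verse, $text;
                    $closed_l++;
                } elsif ($name eq 'poem') {
                    # "\\" after every verse line except the last
                    push @latex, join("\\\\\n", @verse), '\end{verse}';
                } elsif ($name eq 'body') {
                    push @latex, '\end{document}';
                }
            },
        },
    );
    $parser->parse($xml);
    return (join("\n", @latex) . "\n", \@lines);
}

my $sample = <<'XML';
<body>
<head>Title</head>
<poem>
<l>xyz</l>
<l>xyz</l>
</poem>
</body>
XML

my ($latex, $lines) = xml_to_latex($sample);
print $latex;
print "line $_->{preceding}: $_->{text}\n" for @$lines;
```

Resetting a single text buffer on every start tag is enough for this flat structure; documents with nested mixed content would need a stack of buffers instead.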
Answer 1:
Another way to handle XML in Perl is XML::XSH2, which lets you script the conversion with XPath expressions:
use XML::XSH2;
xsh << 'end_xsh';
open 8023786.xml ;
cd body ;
echo '\begin{document}' ;
for poem {
echo :s '\chapter{' preceding-sibling::head[1] '}' ;
echo '\begin{verse}' ;
for l echo :s text() xsh:if(following-sibling::*, '\\', '') ;
echo '\end{verse}' ;
}
echo '\end{document}' ;
end_xsh
Answer 2:
The "best" way to transform XML into LaTeX would be to use XSLT.
STRONG SUGGESTION:
1) Familiarize yourself with the basics of processing XML in Perl.
Alternatively, use a different language if you are more comfortable with something other than Perl; there are good XML libraries for most languages.
I'd strongly recommend working through all three chapters in this tutorial:
XML For Perl Developers
2) Familiarize yourself with the basics of using XSLT stylesheets. For example:
Investigating XSLT: The XML Transformation Language
3) Investigate some ready-made XML-to-LaTeX XSLT stylesheets. For example:
XML to LaTeX
... or ...
Transforming XHTML to LaTeX
... or ...
XSLT MathML Library
PS: I hasten to add that the XSLT approach is language- and platform-agnostic. You can use it from any language (Perl, Java, Python, etc.) and on any platform (Windows, Linux, macOS, etc.).
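To make the XSLT route concrete, here is a minimal sketch of applying a stylesheet from Perl. It assumes the CPAN modules XML::LibXML and XML::LibXSLT are installed. The stylesheet is a hypothetical one written only for the question's sample markup (it is not one of the ready-made stylesheets linked above), and apply_stylesheet is likewise a made-up name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical stylesheet for the sample markup; method="text" emits plain LaTeX.
my $xsl = <<'XSL';
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- drop whitespace-only text nodes so they are not copied to the output -->
  <xsl:strip-space elements="*"/>
  <xsl:template match="body">
    <xsl:text>\begin{document}&#10;</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>\end{document}&#10;</xsl:text>
  </xsl:template>
  <xsl:template match="head">
    <xsl:text>\chapter{</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>}&#10;</xsl:text>
  </xsl:template>
  <xsl:template match="poem">
    <xsl:text>\begin{verse}&#10;</xsl:text>
    <xsl:apply-templates select="l"/>
    <xsl:text>\end{verse}&#10;</xsl:text>
  </xsl:template>
  <xsl:template match="l">
    <xsl:value-of select="."/>
    <!-- "\\" line break after every verse line except the last -->
    <xsl:if test="following-sibling::l"><xsl:text>\\</xsl:text></xsl:if>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
XSL

# Parse the stylesheet and the input, run the transformation, return the text.
sub apply_stylesheet {
    my ($xml_string) = @_;
    my $style_doc  = XML::LibXML->load_xml(string => $xsl);
    my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);
    my $source     = XML::LibXML->load_xml(string => $xml_string);
    my $result     = $stylesheet->transform($source);
    return $stylesheet->output_as_chars($result);
}

my $sample = <<'XML';
<body>
<head>Title</head>
<poem>
<l>xyz</l>
<l>xyz</l>
</poem>
</body>
XML

print apply_stylesheet($sample);
```

This stylesheet does not escape LaTeX special characters such as %, & or _, so real input would need an extra escaping step.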
Answer 3:
For complete control over XML translation, implement a finite-state machine using SAX. Perl has XML::SAX with several parser backends (e.g. XML::SAX::ExpatXS, XML::LibXML::SAX). Here is one possible solution, which uses the pure-Perl backend XML::SAX::PurePerl:
#!/usr/bin/env perl
package XML::SAX::Handler::XML2LaTeX;
use strict;
use warnings qw(all);
use feature qw(say);
use base qw(XML::SAX::Base);
# Handler state: text of the element being read, and the <l> lines of the current poem.
sub new {
    my ($class) = @_;
    return bless {
        data => '',
        line => [],
    } => $class;
}
# Opening tags: reset the text buffer and emit the LaTeX environment starts.
sub start_element {
    my ($self, $el) = @_;
    $self->{data} = '';
    my $name = $el->{Name};
    if ($name eq 'body') {
        say '\begin{document}';
    } elsif ($name eq 'poem') {
        say '\begin{verse}';
        $self->{line} = [];
    }
    return;
}
# Closing tags: use the collected text and emit the LaTeX closing commands.
sub end_element {
    my ($self, $el) = @_;
    my $data = $self->{data};
    my $name = $el->{Name};
    if ($name eq 'body') {
        say '\end{document}';
    } elsif ($name eq 'head') {
        say "\\chapter{$data}";
    } elsif ($name eq 'poem') {
        # "\\" after every verse line except the last
        say join "\\\\\n", @{$self->{line}};
        say '\end{verse}';
    } elsif ($name eq 'l') {
        push @{$self->{line}}, $data;
    }
    return;
}
# Text may arrive in several chunks, so append rather than assign.
sub characters {
    my ($self, $data) = @_;
    $self->{data} .= $data->{Data};
    return;
}
1;
package main;
use strict;
use warnings qw(all);
use XML::SAX::PurePerl;
my $handler = XML::SAX::Handler::XML2LaTeX->new;
my $parser  = XML::SAX::PurePerl->new(Handler => $handler);
$parser->parse_file(\*DATA);
__DATA__
<body>
<head>Title</head>
<poem>
<l>xyz</l>
<l>xyz</l>
</poem>
</body>
Source: https://stackoverflow.com/questions/8023786/getting-started-with-xmlparser