What's the best XML parser for Perl?

Submitted by 大城市里の小女人 on 2019-11-27 07:34:54

I think you are using a pretty good one. XML::LibXML, Matt Sergeant and Christian Glahn's Perl interface to Daniel Veillard's libxml2, is one of the faster XML parsers that I know of.

Dotan Dimet

It really depends on your needs, as people have said. To parse XML files that were ~100 MB in size (gene annotations from TAIR, one file per chromosome), I used mirod's XML::Twig module, which lets you set callbacks to parse the elements that interest you, presenting each sub-document as an XML::Simple tree. It combines the benefits of a SAX parser (scanning the file as a stream) with those of a DOM parser (working more easily with the interesting pieces).
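To illustrate the callback approach described above, here is a minimal sketch using XML::Twig's twig_handlers. The element names (annotations, gene, name) and the inline sample data are made up for illustration and are not the actual TAIR format. For a large file on disk you would call parsefile() instead of parse().

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data; a real annotation file would be read with parsefile().
my $xml = <<'XML';
<annotations>
  <gene id="G1"><name>alpha</name></gene>
  <note>not interesting</note>
  <gene id="G2"><name>beta</name></gene>
</annotations>
XML

my @genes;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a <gene> element has been completely parsed.
        gene => sub {
            my ($t, $elt) = @_;
            push @genes, {
                id   => $elt->att('id'),
                name => $elt->first_child_text('name'),
            };
            # Release the memory held by everything parsed so far,
            # so the whole document never has to sit in memory at once.
            $t->purge;
        },
    },
);

$twig->parse($xml);

printf "%s: %s\n", $_->{id}, $_->{name} for @genes;
```

The purge call is what keeps memory use flat on big inputs. If you would rather work with a plain hash inside the handler, $elt->simplify returns an XML::Simple-style structure for that one element.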

If you need speed, power or features, XML::LibXML is the way to go. If you're after ease of use, though, XML::Simple is a viable alternative.
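For comparison, a small sketch of what working with XML::LibXML can look like. The document structure (library, book, title) is invented for the example. It parses a string into a DOM and selects nodes with XPath, which is where much of the "power" comes from.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document, used only for illustration.
my $xml = <<'XML';
<library>
  <book year="2003"><title>Perl Basics</title></book>
  <book year="2011"><title>XML in Practice</title></book>
</library>
XML

# Parse the string into a DOM tree; load_xml dies on malformed input.
my $doc = XML::LibXML->load_xml(string => $xml);

# Select books published after 2005 with an XPath expression,
# then pull out the text of each <title> child.
my @titles = map { $_->findvalue('./title') }
             $doc->findnodes('/library/book[@year > 2005]');

print "$_\n" for @titles;
```

To read from a file instead, load_xml also accepts location => $path. Wrapping the call in eval is a reasonable way to handle input that may not be well-formed.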

In my experience, XML::Simple is best for quick-and-dirty parsing of XML. We use it for parsing data from third parties that do not always conform to the XML standard. XML::Simple throws informative errors and gets you up and running extremely quickly.

Zvika

(Actually it's not an answer, but a comment - however, I cannot comment...)

XML::Simple has been mentioned here.
(I know this is from a few years ago, but it came up in Google today...)

However, its site (http://metacpan.org/pod/XML::Simple) now says:

STATUS OF THIS MODULE

The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces. In particular, XML::LibXML is highly recommended.

The major problems with this module are the large number of options and the arbitrary ways in which these options interact - often with unexpected results.

Patches with bug fixes and documentation fixes are welcome, but new features are unlikely to be added.

singingfish

You could also look at XML::Liberal which uses LibXML underneath.

I think you should give XML::MyXML a try, too. It's very easy to use.

I'll offer one that SHOULD NOT be used: XML::Parser.

It automatically expands entities to their UTF-8 equivalents, and the option to disable this behavior does not work on the most characteristic of all entities, &amp;.

Additionally, its XMLDecl-parser will interpret and display the standalone attribute in the <?xml ... ?> block as "standalone"="1", which is absolutely incorrect -- it should be "standalone"="yes".
