Efficient Parser for large XMLs

Submitted by 安稳与你 on 2019-12-01 08:15:20

Question


I have very large XML files to process. I want to convert them to readable PDFs with colors, borders, images, tables, and fonts. My machine does not have a lot of resources, so my application needs to use memory and CPU as efficiently as possible.

I did some modest research to settle on a technology, but I could not decide which programming language and API best fit my requirements. I believe DOM is not an option because it consumes a lot of memory, but would Java with a SAX parser meet my requirements?

Some people also recommended Python for XML parsing. Is it that good?

I would appreciate your kind advice.


Answer 1:


SAX is a very good parser, but it is showing its age.

Oracle has since released a newer parser, StAX (Streaming API for XML), designed to parse XML files efficiently:

http://docs.oracle.com/cd/E17802_01/webservices/webservices/docs/1.6/tutorial/doc/SJSXP2.html

The linked page also compares the available parsers, including their memory utilization and feature sets.
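To illustrate the pull-parsing style StAX offers, here is a minimal sketch (the class name, the `countItems` helper, and the sample XML are invented for this example): the reader pulls one event at a time, so memory use stays flat regardless of document size.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxDemo {
    // Count <item> elements by streaming events; no tree is built,
    // so only the current event is held in memory.
    static int countItems(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "item".equals(reader.getLocalName())) {
                count++;
            }
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><item/><item/><item/></catalog>";
        System.out.println(countItems(xml));  // prints 3
    }
}
```

For a real file you would pass a `FileInputStream` to `createXMLStreamReader` instead of a `StringReader`; the pull model also lets you stop reading early, which SAX's push callbacks cannot do as cleanly.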

Thanks, Pavan




Answer 2:


Yes, I think SAX will work for you. DOM is not good for large XML files because it keeps the whole document in memory. You can see a comparison I wrote on my blog here.
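For reference, a minimal SAX sketch in Java (the class name, helper, and sample XML are made up for illustration): the parser pushes callbacks to a handler as it streams through the document, so memory use depends on the handler's own state, not the file size.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDemo {
    // Collect element names in document order via streaming callbacks.
    static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                names.add(qName);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<a><b/><c/></a>"));  // prints [a, b, c]
    }
}
```

To generate a PDF from this, the handler would forward styled content to a PDF library as elements arrive, rather than accumulating the whole document first.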




Answer 3:


Not sure if you're interested in using Perl, but if you're open to it, the following are all good options: LibXML, LibXSLT, and XML-Twig, which is good for files too large to fit in memory (as is LibXML::Reader). SAX is available too, of course, but it can be slow. Most people recommend the first two options. Finally, CPAN is an amazing resource with a very active community.




Answer 4:


If you want the benefits of DOM without its memory overhead, vtd-xml is the best bet; here is the proof:

http://recipp.ipp.pt/bitstream/10400.22/1847/1/ART_BrunoOliveira_2013.pdf



Source: https://stackoverflow.com/questions/17017966/efficient-parser-for-large-xmls
