Sort multigigabyte xml file

江枫思渺然 submitted on 2020-01-24 00:43:09

Question


How can I sort all tags in a multi-gigabyte XML file alphabetically, with identical tags further sorted by their attributes? All the methods suggested in related questions fail on data this large.

I'm looking for existing tools for Windows or Linux.


Answer 1:


If you are doing the sorting with XSLT, you can restrict the stylesheet to the streaming-safe subset of XSLT and run it on a streaming-enabled processor such as Saxon. In streaming mode Saxon does not need to hold the whole input document in memory, so it can handle gigabytes of input XML.

The Saxon website has very detailed documentation about streaming XSLT templates.




Answer 2:


The original goal was to compare two extremely large XML files that contained similar data in a different order. I ended up splitting each XML file into logical chunks: each file contained thousands of processed documents, so I used the csplit utility to write every document to a separate file. I then compared each pair of equally sized documents from the two XML files (luckily, no two documents within one XML file had the same size).

Not a perfect solution, but it worked within reasonable time and space constraints.
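The pairing step can be sketched roughly as follows. This is only an illustration in Python, not the script actually used: it assumes the chunks from each XML file already sit in two separate directories (for example as written by csplit), that no two chunks in one directory have the same size, and that a byte-for-byte comparison is good enough. The function names `index_by_size` and `compare_chunk_dirs` are made up for this example.

```python
import filecmp
from pathlib import Path


def index_by_size(directory):
    """Map file size in bytes to path for every regular file in `directory`."""
    by_size = {}
    for path in Path(directory).iterdir():
        if path.is_file():
            size = path.stat().st_size
            if size in by_size:
                # The approach relies on sizes being unique within one XML file.
                raise ValueError(f"two chunks of size {size} in {directory}")
            by_size[size] = path
    return by_size


def compare_chunk_dirs(dir_a, dir_b):
    """Pair chunks from two directories by size and compare each pair.

    Returns (unmatched_sizes, different_pairs): sizes present on one side
    only, and same-size pairs whose contents differ.
    """
    a, b = index_by_size(dir_a), index_by_size(dir_b)
    unmatched = sorted(set(a) ^ set(b))
    different = [
        (a[size], b[size])
        for size in sorted(set(a) & set(b))
        # shallow=False forces a content comparison instead of trusting os.stat().
        if not filecmp.cmp(a[size], b[size], shallow=False)
    ]
    return unmatched, different
```

Calling `compare_chunk_dirs("chunks_a", "chunks_b")` (hypothetical directory names) gives the sizes that have no partner and the pairs that need a closer look. If elements inside a document can also appear in a different order, the byte comparison would have to be replaced by something that normalizes each small chunk first.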



Source: https://stackoverflow.com/questions/9095653/sort-multigigabyte-xml-file
