XML streaming with XProc

Submitted by 孤者浪人 on 2019-12-06 11:05:51

Question


I'm experimenting with XProc, the XML pipeline language, and XML Calabash (http://xmlcalabash.com/). I'd like to find an example of streaming large XML documents. For example, given the following huge XML document:

<Books>
 <Book>
   <title>Book-1</title>
 </Book>
 <Book>
   <title>Book-2</title>
 </Book>
 <Book>
   <title>Book-3</title>
 </Book>

<!-- many many.... -->
 <Book>
   <title>Book-N</title>
 </Book>
</Books>

how should I loop (in a streaming fashion) over documents x to N, each of the form

<Books>
 <Book>
   <title>Book-x</title>
 </Book>
</Books>

and process each one with an XSLT stylesheet? Is this possible with XProc?


Answer 1:


You should have a look at QuiXProc (http://code.google.com/p/quixproc), an implementation of XProc based on Calabash that adds streaming and parallel processing. Hope this helps.




Answer 2:


Here is how you could do it in XProc. This pipeline would stream with QuiXProc:

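<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Read the input document -->
  <p:load href="hugedocument.xml"/>
  <!-- One iteration per Book element; each Book becomes its own document -->
  <p:for-each>
    <p:iteration-source select="/Books/Book"/>
    <!-- Transform the current Book with book.xsl -->
    <p:xslt>
      <p:input port="stylesheet">
        <p:document href="book.xsl"/>
      </p:input>
      <p:input port="parameters">
        <p:empty/>
      </p:input>
    </p:xslt>
  </p:for-each>
  <!-- Collect all results under a single Books root and save them -->
  <p:wrap-sequence wrapper="Books"/>
  <p:store href="hugedocument.res.xml"/>
</p:declare-step>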



Answer 3:


I remember a recent discussion about streaming on the XProc Dev list. It seems that Calabash does not attempt streaming; see Norman Walsh's message here.

Saxon SA supports streaming for XSLT and XQuery. For details, see http://www.saxonica.com/documentation/sourcedocs/serial.html




Answer 4:


Yes, much as I'd like to support streaming, my real goals for XML Calabash were completeness and correctness.

I have some ideas for reworking the internals of XML Calabash to use more of the push/pull streaming features of Saxon, but there are a lot of other things on my "todo" list too :-/




Answer 5:


EMC's Calumet (http://developer.emc.com/xmltech) doesn't do streaming either. So far the main focus has been compliance with the XProc specification, together with integration with our other XML-related tools, such as the xDB native XML database. Support for streaming is on my radar, although right now I can't say when I will be able to get to it.




Answer 6:


Even though most XProc processors don't stream data between steps, your case won't necessarily fail (for instance, by exhausting memory). It depends on what you want to do with the results of the XSLT step.

If you are gathering the results to build one big output file, then yes, this may be a problem. But in that case you might be better off with a streaming solution (SAX, StAX, Joost, ...) anyway.

If you will be storing the result of each XSLT transformation separately, the problem is much smaller. You would only need to make sure you have enough memory to load the initial document and to process each individual document. I'm not sure how well Saxon underneath XML Calabash would behave, but I expect a size of up to 50 megabytes shouldn't be a big issue.
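Following that approach, here is a minimal sketch that stores each transformed Book as its own file instead of wrapping all results into one big document. It assumes the same hugedocument.xml and book.xsl used by the pipeline earlier in this thread. The output names book-1.xml, book-2.xml, ... are hypothetical. Each Book is also re-wrapped in a Books element with p:wrap, so the stylesheet sees the shape shown in the question:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:load href="hugedocument.xml"/>
  <p:for-each>
    <!-- Each selected Book becomes a separate document in the loop -->
    <p:iteration-source select="/Books/Book"/>
    <!-- Re-wrap the Book element: <Books><Book>...</Book></Books> -->
    <p:wrap wrapper="Books" match="/*"/>
    <p:xslt>
      <p:input port="stylesheet">
        <p:document href="book.xsl"/>
      </p:input>
      <p:input port="parameters">
        <p:empty/>
      </p:input>
    </p:xslt>
    <!-- Store each result right away; hypothetical names book-1.xml, book-2.xml, ... -->
    <p:store>
      <p:with-option name="href" select="concat('book-', p:iteration-position(), '.xml')"/>
    </p:store>
  </p:for-each>
</p:declare-step>

Because p:store has no primary output, the loop should not accumulate any results. However, p:load still reads the whole input. With a non-streaming processor such as XML Calabash, the input document itself must still fit in memory.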

Cheers



Source: https://stackoverflow.com/questions/878591/xml-streaming-with-xproc