eXist-db / XSLT / Saxon collection() slow as molasses (or errors out with memory limit)

Posted by 泪湿孤枕 on 2020-01-06 05:36:08

Question


Following on from this question, I arrived at one entirely unsatisfactory solution for accessing an eXist-db collection() from an XSLT 2.0 stylesheet that is loaded by an eXist-db XQuery transformation function:

The XSLT file declares a variable:

 <xsl:variable name="coll" select="collection('xmldb:exist:///db/apps/deheresi/data/collection_ms609.xml')"/>

This points to a catalog XML file that I created (per the Saxon documentation) in order to load the actual collection. It looks like this:

<collection stable="true">
  <doc href="xmldb:exist:///db/apps/deheresi/data/ms609_0001.xml"/>
  <doc href="xmldb:exist:///db/apps/deheresi/data/ms609_0002.xml"/>
  ...
  ...
  <doc href="xmldb:exist:///db/apps/deheresi/data/ms609_0709.xml"/>
  <doc href="xmldb:exist:///db/apps/deheresi/data/ms609_0710.xml"/>
</collection>

This allows the XSLT file to use a key that needs to search across all these files:

<xsl:key name="correspkey" match="tei:seg[@type='dep_event' and @corresp]" use="@corresp"/>

<xsl:variable name="correspvar" select="self::seg[@type='dep_event' and @corresp]/@corresp"/>

<xsl:value-of select="$coll/(key('correspkey',$correspvar) except $correspvar)/@id" separator=", "/>

As it stands, with 50 documents in the catalog I get a result in 2 minutes; with all 710 I get a Java GC error after 4 minutes.

I have set indexes on the relevant nodes in eXist-db, but this does nothing for performance. It seems to me that Saxon is working 'outside' eXist-db's optimisations and treating eXist-db as a simple file system.

(For what it's worth, setting href="/db/apps/deheresi/data/ms609_0001.xml" does not let Saxon see the documents.)

I suspect all of this is why the eXist-DB documentation is non-existent.

In short, I am looking for a way to run intensive searches across a collection from within an XSLT 2.0 stylesheet that eXist-db runs through the XQuery transform() function.

If anything, I hope this post helps future searchers encountering the same problem.


Answer 1:


The general architectural principle is: try to move the searching closer to the data. In this case that means using eXist to find the documents of interest, rather than extracting every possible candidate document from eXist and then asking Saxon to do the searching. Select the actual documents of interest in an eXist XQuery, and then pass the list of these documents to Saxon in a stylesheet parameter.
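
The approach described above could be sketched roughly as follows. This is an untested outline, not a definitive implementation: the collection path, file names, key and the @id attribute are taken from the question, while the choice of input document, the parameter name doc-uris, the matches/match wrapper elements and the inline stylesheet are assumptions made for illustration. It also assumes that document-uri() returns the /db/... path for stored documents and that Saxon can resolve xmldb:exist:/// URIs through doc(), as the catalog in the question suggests. Because transform:transform() passes parameters as strings, the document list travels as a space-separated string of URIs.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Collection path taken from the question. :)
declare variable $data := "/db/apps/deheresi/data";

(: Hypothetical choice of the document being transformed. :)
let $input := doc($data || "/ms609_0001.xml")

(: The @corresp values that the stylesheet will need to look up. :)
let $wanted := distinct-values($input//tei:seg[@type = 'dep_event']/@corresp)

(: Let eXist, with its indexes, find the matching segs across the collection. :)
let $hits := collection($data)//tei:seg[@type = 'dep_event'][@corresp = $wanted]

(: Reduce the hits to the distinct documents that contain them.
   Assumption: document-uri() yields a path such as /db/apps/.../ms609_0001.xml. :)
let $uris := distinct-values($hits ! ("xmldb:exist://" || document-uri(root(.))))

(: Parameters are passed as strings, so join the URIs with spaces.
   The parameter name "doc-uris" is hypothetical. :)
let $params :=
    <parameters>
        <param name="doc-uris" value="{string-join($uris, ' ')}"/>
    </parameters>

(: Minimal hypothetical stylesheet: it loads only the pre-selected documents
   instead of the whole 710-document catalog. No attribute value templates
   are used, because curly braces are special inside XQuery constructors. :)
let $xslt :=
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:tei="http://www.tei-c.org/ns/1.0">
        <xsl:param name="doc-uris" as="xs:string" select="''"/>
        <xsl:variable name="coll"
            select="for $u in tokenize($doc-uris, '\s+')[.] return doc($u)"/>
        <xsl:key name="correspkey"
            match="tei:seg[@type='dep_event' and @corresp]" use="@corresp"/>
        <xsl:template match="/">
            <matches>
                <xsl:for-each select="//tei:seg[@type='dep_event' and @corresp]">
                    <xsl:variable name="this" select="."/>
                    <match>
                        <xsl:value-of
                            select="$coll/(key('correspkey', $this/@corresp) except $this)/@id"
                            separator=", "/>
                    </match>
                </xsl:for-each>
            </matches>
        </xsl:template>
    </xsl:stylesheet>

return transform:transform($input, $xslt, $params)
```

With this arrangement Saxon only builds key indexes for the handful of documents that eXist has already identified, so memory use should scale with the number of hits rather than with the size of the collection. If only the @id values are needed, a further option under the same assumptions is to compute them directly in the XQuery and pass the finished string to the stylesheet.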



Source: https://stackoverflow.com/questions/52945207/exist-db-xslt-saxon-collection-slow-as-molasses-or-errors-out-with-memory
