Massive differences in speed with virtually identical XQuery (exist-db) interogations

感情迁移 提交于 2019-12-13 00:38:13

问题


I've been porting an SQL database over to exist over the last few weeks, and while I so far managed to get over all previous hindrances, I've now run into something for which neither the official documentation, searching online, nor common sense was able to help.

In short, I have a pretty big collection (about 90k entries, spread over 20 files), with most individual entries looking something like this (this is a massive simplification so I can get the point across):

<document>
 <document_id>Numerical Value</document_id>
 <page_id>Some other numerical value</page_id>
</document>

I then pass a value through php, lets call it $val The strange part is that when I run the standard query

$p in collection("/db/folder_location")//documentset/document[xs:integer(document_id) eq $val]

No matter what value I pass, it returns all the results in a matter of seconds. If I slightly modify it, however, making it:

$p in collection("/db/folder_location")//documentset/document[xs:integer(page_id) eq $val]

It either takes over 30 seconds to return the values or simply stays locked in a running query and never returns anything. Of all the 30 queries I already converted, this is the only time I ran into this problem and could not find a workaround.


回答1:


To address the query performance problem, I would suggest some changes to your query and/or the addition of a range index on the document_id and page_id elements.

Your query casts all document_id and page_id elements as xs:integer. This is an inefficient operation given a large dataset. Consider (a) removing this type casting, (b) reversing it (cast $val as xs:string), or (c) adding a range index on these two elements, with type="xs:integer". This latter option will let you remove the casting in your predicate (allowing you to change it to document[document_id eq $val] and document[page_id eq $val]), and the index should greatly speed the lookup.

To add a range index for your query, create a collection configuration document like this:

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <range>
            <create qname="document_id" type="xs:integer"/>
            <create qname="page_id" type="xs:integer"/>
        </range>
    </index>
</collection>

If your data is in /db/folder_location, then store this document as collection.xconf in /db/system/config/db/folder_location. Then reindex your collection with xmldb:reindex("/db/system/config/db/folder_location"). As the documentation on range indexes states, with this index definition in place:

indexes will be used automatically for general or value comparisons as well as string functions like fn:contains, fn:starts-with, fn:ends-with.

For more on range indexes in eXist, see https://exist-db.org/exist/apps/doc/newrangeindex.xml. For query optimization techniques, see https://exist-db.org/exist/apps/doc/tuning.xml. For indexes in general in eXist, see https://exist-db.org/exist/apps/doc/indexing.xml.



来源:https://stackoverflow.com/questions/49174006/massive-differences-in-speed-with-virtually-identical-xquery-exist-db-interoga

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!