Java XPath (Apache JAXP implementation) performance

两盒软妹~` 提交于 2019-11-26 23:49:56

I have debugged and profiled my test-case and Xalan/JAXP in general. I managed to identify the big major problem in

org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName()

It can be seen that every one of the 10k test XPath evaluations led to the classloader trying to lookup the DTMManager instance in some sort of default configuration. This configuration is not loaded into memory but accessed every time. Furthermore, this access seems to be protected by a lock on the ObjectFactory.class itself. When the access fails (by default), then the configuration is loaded from the xalan.jar file's

META-INF/service/org.apache.xml.dtm.DTMManager

configuration file. Every time!:

Fortunately, this behaviour can be overridden by specifying a JVM parameter like this:

-Dorg.apache.xml.dtm.DTMManager=
  org.apache.xml.dtm.ref.DTMManagerDefault

or

-Dcom.sun.org.apache.xml.internal.dtm.DTMManager=
  com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault

The above works, as this will allow to bypass the expensive work in lookUpFactoryClassName() if the factory class name is the default anyway:

// Code from com.sun.org.apache.xml.internal.dtm.ObjectFactory
static String lookUpFactoryClassName(String factoryId,
                                     String propertiesFilename,
                                     String fallbackClassName) {
  SecuritySupport ss = SecuritySupport.getInstance();

  try {
    String systemProp = ss.getSystemProperty(factoryId);
    if (systemProp != null) { 

      // Return early from the method
      return systemProp;
    }
  } catch (SecurityException se) {
  }

  // [...] "Heavy" operations later

So here's a performance improvement overview for 10k consecutive XPath evaluations of //SomeNodeName against a 90k XML file (measured with System.nanoTime():

measured library        : Xalan 2.7.0 | Xalan 2.7.1 | Saxon-HE 9.3 | jaxen 1.1.3
--------------------------------------------------------------------------------
without optimisation    :     10400ms |      4717ms |              |     25500ms
reusing XPathFactory    :      5995ms |      2829ms |              |
reusing XPath           :      5900ms |      2890ms |              |
reusing XPathExpression :      5800ms |      2915ms |      16000ms |     25000ms
adding the JVM param    :      1163ms |       761ms |        n/a   |

note that the benchmark was a very primitive one. it may well be that your own benchmark will show that saxon outperforms xalan

I have filed this as a bug to the Xalan guys at Apache:

https://issues.apache.org/jira/browse/XALANJ-2540

Not a solution, but a pointer to the main problem: The slowest part of the process for evaluating an xpath in relation to an arbitrary node is the time it takes the DTM manager to find the node handle:

http://javasourcecode.org/html/open-source/jdk/jdk-6u23/com/sun/org/apache/xml/internal/dtm/ref/dom2dtm/DOM2DTM.html#getHandleOfNode%28org.w3c.dom.Node%29

If the node in question is at the end of the Document, it can end up walking the entire tree to find the node in question, for each and every query.

This explains why the hack to orphan out the target node works. There should be a way to cache these lookups, but at this point I can't see how.

To answer your question, vtd-xml is way faster than Jaxen or Xalan) (I would say on average 10x, and 60x has been reported...

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!