How to use Saxon's built-in catalog feature

Posted by 别来无恙 on 2020-01-11 10:35:29

Question


I downloaded SaxonHE9-4-0-6J and want to process XHTML from the command line. However, Saxon tries to load the DTD from the W3C site, which makes even the simplest command take far too long.

I have an XML catalog, which I use successfully with xmllint by setting an environment variable that points to the catalog file, but I have no idea how to make Saxon use it. Googling reveals a whole history of changes (and thus confusion) regarding the use of catalogs with Saxon, and nothing I found solved the problem.

I downloaded resolver.jar and added it to my CLASSPATH, but I can't make Saxon use it. After trying various combinations, I followed http://www.saxonica.com/documentation/sourcedocs/xml-catalogs.xml and used just the -catalog option, like:

-catalog:path-to-my-catalog

(I tried both URIs and regular paths), without setting the -r, -x, or -y switches, but Saxon doesn't see it. I get this error:

Query processing failed: Failed to load Apache catalog resolver library

resolver.jar is on my classpath, and I can run it from the command line:

C:\temp>java org.apache.xml.resolver.apps.resolver
Usage: resolver [options] keyword

Where:

-c catalogfile  Loads a particular catalog file.
-n name         Sets the name.
-p publicId     Sets the public identifier.
-s systemId     Sets the system identifier.
-a              Makes the system URI absolute before resolution
-u uri          Sets the URI.
-d integer      Set the debug level.
keyword         Identifies the type of resolution to perform:
                doctype, document, entity, notation, public, system,
                or uri.

On the other hand, the Saxon archive itself already includes the XHTML DTD and various other DTDs, so there must be a simple way out of this frustration.

How can I use Saxon on the command line and instruct it to use local DTDs?


Answer 1:


From the saxonica link in your question:

When the -catalog option is used on the command line, this overrides the internal resolver used in Saxon (from 9.4) to redirect well-known W3C references (such as the XHTML DTD) to Saxon's local copies of these resources. Because both these features rely on setting the XML parser's EntityResolver, it is not possible to use them in conjunction.

This sounds to me like Saxon automatically uses local copies of the well-known W3C DTDs, but if you specify -catalog, it does not use the internal resolver and you have to specify these explicitly in your catalog.
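
To make the EntityResolver mechanism mentioned in the quote concrete, here is a minimal Java sketch that uses only the JDK's built-in parser, not Saxon or the Apache resolver. The class name LocalDtdResolverSketch, the in-memory LOCAL_DTDS map, and the parseFoo helper are hypothetical names made up for illustration. The map stands in for what a catalog does: it maps a public identifier to a local copy of the DTD, so the parser never has to fetch the system identifier.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

final class LocalDtdResolverSketch {

    // Hypothetical in-memory "catalog": public identifier -> DTD text.
    static final Map<String, String> LOCAL_DTDS = new HashMap<>();
    static {
        LOCAL_DTDS.put("-//TEST//Dan Test//EN",
                "<!ELEMENT test (foo)>\n<!ELEMENT foo (#PCDATA)>\n");
    }

    // Records the last public identifier served locally (for demonstration only).
    static String lastResolvedPublicId;

    static final EntityResolver RESOLVER = (publicId, systemId) -> {
        String dtd = LOCAL_DTDS.get(publicId);
        if (dtd == null) {
            // Returning null tells the parser to fall back to its default
            // behavior, i.e. to open the system identifier itself.
            return null;
        }
        lastResolvedPublicId = publicId;
        InputSource source = new InputSource(new StringReader(dtd));
        source.setSystemId(systemId);
        return source;
    };

    // Parses the document with the resolver installed and returns the text of <foo>.
    static String parseFoo(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(RESOLVER);
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement()
                  .getElementsByTagName("foo").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE test PUBLIC \"-//TEST//Dan Test//EN\" \"dir_that_doesnt_exist/test.dtd\">\n"
            + "<test>\n    <foo>Success!</foo>\n</test>";
        System.out.println(parseFoo(xml));
        System.out.println("Served locally: " + lastResolvedPublicId);
    }
}
```

A parser holds a single EntityResolver at a time, and installing a new one replaces the previous one. That is consistent with the documentation's statement that -catalog and the built-in redirection cannot be used together.";
String out = LocalDtdResolverSketch.parseFoo(xml);
if (!"Success!".equals(out)) throw new AssertionError("unexpected text: " + out);
if (!"-//TEST//Dan Test//EN".equals(LocalDtdResolverSketch.lastResolvedPublicId)) throw new AssertionError("resolver was not consulted");
</test>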


Here's a working example of using a catalog with Saxon...

File/directory structure of my example

C:/so_test/lib
C:/so_test/lib/catalog.xml
C:/so_test/lib/resolver.jar
C:/so_test/lib/saxon9he.jar
C:/so_test/lib/test.dtd
C:/so_test/test.xml

XML DTD (so_test/lib/test.dtd)

<!ELEMENT test (foo)>
<!ELEMENT foo (#PCDATA)>

XML Instance (so_test/test.xml)

Note that the system identifier points to a location that doesn't exist, so the document can only be parsed successfully if the catalog is actually being used.

<!DOCTYPE test PUBLIC "-//TEST//Dan Test//EN" "dir_that_doesnt_exist/test.dtd">
<test>
    <foo>Success!</foo>
</test>

XML Catalog (so_test/lib/catalog.xml)

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <group prefer="public" xml:base="file:///C:/so_test/lib">
        <public publicId="-//TEST//Dan Test//EN" uri="lib/test.dtd"/>
    </group>
</catalog>
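
The combination of xml:base="file:///C:/so_test/lib" and uri="lib/test.dtd" can look like it should yield lib/lib/test.dtd. It does not, because the base has no trailing slash, so its last segment is replaced during relative resolution. The sketch below shows this with java.net.URI. It assumes the catalog resolver applies the same standard relative-reference rules, which matches the resolved location reported in the -t output further down. CatalogBaseSketch and resolvedPath are hypothetical names.

```java
import java.net.URI;

final class CatalogBaseSketch {

    // Resolves a catalog entry's uri attribute against an xml:base value
    // using standard relative-reference resolution, and returns the path part.
    static String resolvedPath(String xmlBase, String uri) {
        return URI.create(xmlBase).resolve(uri).getPath();
    }

    public static void main(String[] args) {
        // Base without trailing slash: "lib" is replaced by the relative reference.
        System.out.println(resolvedPath("file:///C:/so_test/lib", "lib/test.dtd"));
        // prints /C:/so_test/lib/test.dtd

        // Base with trailing slash: the relative reference is appended.
        System.out.println(resolvedPath("file:///C:/so_test/lib/", "test.dtd"));
        // prints /C:/so_test/lib/test.dtd

        // Mixing the two styles would point one directory too deep.
        System.out.println(resolvedPath("file:///C:/so_test/lib/", "lib/test.dtd"));
        // prints /C:/so_test/lib/lib/test.dtd
    }
}
```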

Command Line

Note the -dtd option, which enables DTD validation.

C:\so_test>java -cp lib/saxon9he.jar;lib/resolver.jar net.sf.saxon.Query -s:"test.xml" -qs:"<results>{data(/test/foo)}</results>" -catalog:"lib/catalog.xml" -dtd

Results

<results>Success!</results>

If I make the XML instance invalid:

<!DOCTYPE test PUBLIC "-//TEST//Dan Test//EN" "dir_that_doesnt_exist/test.dtd">
<test>
    <x/>
    <foo>Success!</foo>
</test>

and run the same command line as above, here is the result:

Recoverable error on line 4 column 6 of test.xml:
  SXXP0003: Error reported by XML parser: Element type "x" must be declared.
Recoverable error on line 6 column 8 of test.xml:
  SXXP0003: Error reported by XML parser: The content of element type "test" must match "(foo)".
Query processing failed: The XML parser reported two validation errors
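
For anyone who wants to reproduce the validation behavior outside Saxon, the following is a rough Java sketch using the JDK's own validating parser. It is not how Saxon is implemented. DtdValidationSketch and validate are hypothetical names, and the DTD is served from a string by a tiny resolver instead of a real catalog. The exact wording of the collected messages depends on the parser and locale, so the sketch only prints whatever the parser reports.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class DtdValidationSketch {

    static final String PUBLIC_ID = "-//TEST//Dan Test//EN";
    static final String DTD = "<!ELEMENT test (foo)>\n<!ELEMENT foo (#PCDATA)>\n";
    static final String DOCTYPE =
        "<!DOCTYPE test PUBLIC \"" + PUBLIC_ID + "\" \"dir_that_doesnt_exist/test.dtd\">\n";

    // Parses with DTD validation switched on and returns the recoverable
    // (validity) errors reported by the parser; an empty list means valid.
    static List<String> validate(String xml) throws Exception {
        List<String> errors = new ArrayList<>();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Stand-in for the catalog: serve the DTD locally by public identifier.
        builder.setEntityResolver((publicId, systemId) ->
            PUBLIC_ID.equals(publicId) ? new InputSource(new StringReader(DTD)) : null);
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String valid = DOCTYPE + "<test>\n    <foo>Success!</foo>\n</test>";
        String invalid = DOCTYPE + "<test>\n    <x/>\n    <foo>Success!</foo>\n</test>";
        System.out.println("valid:   " + validate(valid));
        System.out.println("invalid: " + validate(invalid));
    }
}
```

Validity errors are reported through ErrorHandler.error and are recoverable, which is why the parser can report more than one of them before processing is abandoned.";
String invalid = DtdValidationSketch.DOCTYPE + "<test>\n    <x/>\n    <foo>Success!</foo>\n</test>";
if (!DtdValidationSketch.validate(valid).isEmpty()) throw new AssertionError("valid document reported errors");
if (DtdValidationSketch.validate(invalid).isEmpty()) throw new AssertionError("invalid document reported no errors");
</test>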

Hopefully this example will help you figure out what to change with your setup.

Also, using the -t option gives you additional information, such as which catalog was loaded and whether the public identifier was resolved:

Loading catalog: file:///C:/so_test/lib/catalog.xml
Saxon-HE 9.4.0.6J from Saxonica
Java version 1.6.0_35
Analyzing query from {<results>{data(/test/foo)}</results>}
Analysis time: 122.70132 milliseconds
Processing file:/C:/so_test/test.xml
Using parser org.apache.xml.resolver.tools.ResolvingXMLReader
Building tree for file:/C:/so_test/test.xml using class net.sf.saxon.tree.tiny.TinyBuilder
Resolved public: -//TEST//Dan Test//EN
        file:/C:/so_test/lib/test.dtd
Tree built in 0 milliseconds
Tree size: 5 nodes, 8 characters, 0 attributes
<?xml version="1.0" encoding="UTF-8"?><results>Success!</results>Execution time: 19.482079ms
Memory used: 20648808

Additional Information

Saxon distributes the Apache version of Xerces, so use the resolver.jar found in the Apache Xerces distribution.




Answer 2:


Daniel Haley has answered better than I could about how to use an explicit catalog with Saxon.

As for using built-in copies of the well-known DTDs, Saxon 9.4 will indeed do this automatically by default if it recognizes the system ID or public ID of the required resource. If it's still going to the W3C site, the first thing we need to discover is the precise form of the DOCTYPE you are using.

The error message about failure to load the Apache catalog resolver actually means that Saxon has been unable to load the class org.apache.xml.resolver.CatalogManager. I wonder if you're using a version of the resolver that doesn't include this class? I can't think of any other explanation.
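
One way to narrow this down is to check whether that class is visible to the JVM that actually runs Saxon. The sketch below is a small diagnostic, not part of Saxon; ResolverClassCheck and isLoadable are hypothetical names. Run it with exactly the same classpath settings as the failing Saxon command. One possible cause, which is an assumption and not confirmed anywhere in this thread: when java is started with -jar, both the CLASSPATH environment variable and -cp are ignored, and an explicit -cp on the command line overrides CLASSPATH.

```java
final class ResolverClassCheck {

    // Returns true if the named class can be found by this class's class loader.
    // The class is located but not initialized.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ResolverClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the answer above.
        String name = "org.apache.xml.resolver.CatalogManager";
        System.out.println(name + " loadable: " + isLoadable(name));
        // Shows the classpath this JVM is really using.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If the class is reported as not loadable, passing both jars explicitly with -cp and naming the main class (net.sf.saxon.Query), as in the command line of the first answer, avoids relying on the environment.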



Source: https://stackoverflow.com/questions/14165765/how-to-use-saxon-built-in-catalog-feature
