Hello World Saxon with Java

Submitted by 感情迁移 on 2019-12-11 18:07:24

Question


Using the JAR files for Saxon-HE and TagSoup installed through apt, parsing HTML is a one-liner:

thufir@dur:~/saxon$ 
thufir@dur:~/saxon$ java -cp /usr/share/java/Saxon-HE-9.8.0.14.jar:/usr/share/java/tagsoup-1.2.1.jar net.sf.saxon.Query -x:org.ccil.cowan.tagsoup.Parser -qs:doc\(\'http://books.toscrape.com/\'\) 
<?xml version="1.0" encoding="UTF-8"?><!--[if lt IE 7]>      <html lang="en-us" class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]--><!--[if IE 7]>         <html lang="en-us" class="no-js lt-ie9 lt-ie8"> <![endif]--><!--[if IE 8]>         <html lang="en-us" class="no-js lt-ie9"> <![endif]--><!--[if gt IE 8]><!--><html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml" class="no-js" lang="en-us"><!--<![endif]--><head><title>
    All products | Books to Scrape - Sandbox
..        
        <!-- Version: N/A -->

thufir@dur:~/saxon$ 
thufir@dur:~/saxon$ 

How would I do that from Java? In particular, what imports are required from Saxon for this execution? Perhaps using Saxon and the JAXP interface?

See also:

http://codingwithpassion.blogspot.com/2011/03/saxon-xslt-java-example.html


Answer 1:


You will find many simple examples of invoking transformations using Saxon from Java in the saxon-resources download available on both the saxonica.com and sourceforge.net web sites.

It's difficult to know exactly what you want here, because your command line example isn't using Saxon to do anything useful other than invoking the TagSoup parser and serializing the result. The simplest way to do that from Java is with a JAXP identity transformation, which runs just as well with the built-in XSLT transformer in the JDK as with Saxon:

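import javax.xml.transform.*;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;
TransformerFactory factory = TransformerFactory.newInstance();
XMLReader xmlReader = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser"); // TagSoup as the SAX parser
Source input = new SAXSource(xmlReader, new InputSource("http://books.toscrape.com/"));
Result output = new StreamResult(System.out);
factory.newTransformer().transform(input, output); // no stylesheet: identity transformation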
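
For anyone who wants to try the identity transformation without TagSoup or network access, here is a minimal sketch of the same idea as a complete class. The class name IdentityTransformDemo and the helper names copy and copyString are made up for illustration, and the demo feeds a small in-memory XHTML string to the JDK's built-in SAX parser instead of fetching the live page. The output method is set to xml explicitly so the serializer does not switch to HTML output when the root element is html.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical demo class; the name and helper methods are illustrative only.
class IdentityTransformDemo {

    // Runs a JAXP identity transformation: whatever the reader parses
    // is serialized as XML into a string.
    static String copy(XMLReader reader, InputSource input) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.METHOD, "xml");
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(reader, input), new StreamResult(out));
        return out.toString();
    }

    // Convenience wrapper for the demo: uses the JDK's own namespace-aware
    // SAX parser. To parse real-world HTML, pass a TagSoup XMLReader to
    // copy(...) instead, as in the snippet above.
    static String copyString(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            return copy(reader, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("identity transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>All products</title></head>"
                + "<body><p>x</p></body></html>";
        System.out.println(copyString(page));
    }
}
```

Keeping the XMLReader as a parameter of copy means the parser is the only thing that changes when TagSoup is on the classpath; the transformer code stays the same.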

If you want to add some XSLT or XQuery processing then of course that's perfectly possible (I would always use the s9api API for Saxon, but you can also use JAXP or XQJ), but the details depend on exactly what you want to do.
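
As one possible illustration of the JAXP route mentioned above, the sketch below applies a small XSLT 1.0 stylesheet that pulls out the page title. It is not the s9api approach, just the plain JAXP one that also runs on the JDK's built-in transformer. The class name XsltTitleDemo, the method extractTitle, and the sample input are assumptions made for the example. The stylesheet binds a prefix to the XHTML namespace because the TagSoup output shown in the question places elements in http://www.w3.org/1999/xhtml.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; names and sample data are illustrative only.
class XsltTitleDemo {

    // XSLT 1.0 stylesheet: emit the whitespace-normalized <title> as plain text.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
            + " xmlns:h=\"http://www.w3.org/1999/xhtml\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:value-of select=\"normalize-space(/h:html/h:head/h:title)\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Compiles the stylesheet and runs it over the given XHTML string.
    static String extractTitle(String xhtml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>\n"
                + "    All products | Books to Scrape - Sandbox\n"
                + "</title></head><body/></html>";
        System.out.println(extractTitle(page));
    }
}
```

To run this against the live page, the input StreamSource would be replaced by a SAXSource wrapping the TagSoup reader, as in the identity transformation. With the Saxon JAR on the classpath, TransformerFactory.newInstance() is expected to pick up Saxon, though that depends on how the classpath and JAXP lookup are configured.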



Source: https://stackoverflow.com/questions/54031200/hello-world-saxon-with-java
