TagSoup and XPath

Submitted by 核能气质少年 on 2019-12-06 09:07:54

Question


I'm trying to use TagSoup with XPath (JAXP). I know how to obtain a SAX parser (an XMLReader) from TagSoup, but I can't find a way to create a DocumentBuilder that uses that SAX parser. How do I do that?

Thank you.

EDIT: Sorry for being so general, but the Java XML API is such a pain.

EDIT2:

Problem solved:

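import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.*;
import org.ccil.cowan.tagsoup.Parser;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public static void main(String[] args) throws XPathExpressionException, IOException,
        SAXNotRecognizedException, SAXNotSupportedException,
        TransformerFactoryConfigurationError, TransformerException {

    XPathFactory xpathFac = XPathFactory.newInstance();
    XPath xpath = xpathFac.newXPath();

    InputStream input = new FileInputStream("/tmp/g.html");

    // TagSoup's Parser is a SAX XMLReader. Namespaces are switched off so that
    // an unprefixed XPath such as //span can match the parsed elements.
    XMLReader reader = new Parser();
    reader.setFeature(Parser.namespacesFeature, false);
    // A transformer created without a stylesheet performs an identity transform.
    Transformer transformer = TransformerFactory.newInstance().newTransformer();

    // Copy the SAX events produced by TagSoup into a DOM tree.
    DOMResult result = new DOMResult();
    transformer.transform(new SAXSource(reader, new InputSource(input)), result);

    Node htmlNode = result.getNode();
    NodeList nodes = (NodeList) xpath.evaluate("//span", htmlNode, XPathConstants.NODESET);
    System.out.println(nodes.getLength());
}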
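
The SAX-to-DOM bridge used above (identity Transformer, SAXSource in, DOMResult out) can be tried without TagSoup on the classpath. The following is only a minimal sketch under stated assumptions: it swaps in the JDK's built-in SAX reader as a stand-in for TagSoup's Parser, so it accepts well-formed markup only, and the class name SaxToDomXPath and the helper countMatches are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SaxToDomXPath {

    // Hypothetical helper: turns SAX events into a DOM tree, then counts XPath matches.
    static int countMatches(String markup, String expression) throws Exception {
        // Assumption: the JDK's own SAX reader stands in for TagSoup's Parser here,
        // so the input must be well-formed XML rather than arbitrary HTML.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // A transformer created without a stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();

        // The reader parses the markup and the transformer builds a DOM from its events.
        DOMResult result = new DOMResult();
        InputSource source = new InputSource(new StringReader(markup));
        identity.transform(new SAXSource(reader, source), result);

        // Plain JAXP XPath evaluation against the resulting DOM node.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression, result.getNode(), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String markup = "<html><body><span>a</span><p><span>b</span></p></body></html>";
        System.out.println(countMatches(markup, "//span"));
    }
}
```

Replacing the stand-in reader with new Parser() from TagSoup, with Parser.namespacesFeature set to false as in the code above, should give the same pipeline for real-world HTML.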

EDIT3:

Link that helped me: http://www.jezuk.co.uk/cgi-bin/view/jez?id=2643


Answer 1:


"Java XML API is such a pain"

Indeed it is. Consider moving to XSLT 2.0 / XPath 2.0 and using Saxon's s9api interface instead. It would look roughly like this:

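import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.transform.Source;
import javax.xml.transform.sax.SAXSource;
import net.sf.saxon.s9api.*;
import org.ccil.cowan.tagsoup.Parser;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// false: do not request the licensed (schema-aware) Saxon features
Processor proc = new Processor(false);

InputStream input = new FileInputStream("/tmp/g.html");
XMLReader reader = new Parser();
reader.setFeature(Parser.namespacesFeature, false);
// SAXSource takes the reader plus an InputSource wrapping the stream
Source source = new SAXSource(reader, new InputSource(input));

DocumentBuilder builder = proc.newDocumentBuilder();
XdmNode doc = builder.build(source);

XPathCompiler compiler = proc.newXPathCompiler();
XdmValue result = compiler.evaluate("//span", doc);
System.out.println(result.size());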


Source: https://stackoverflow.com/questions/6783225/tagsoup-and-xpath
