TagSoup and XPath

匆匆过客 提交于 2019-12-04 11:51:18

Java XML API is such a pain

Indeed it is. Consider moving to XSLT 2.0 / XPath 2.0 and using Saxon's s9api interface instead. It would look roughly like this:

Processor proc = new Processor();

InputStream input = new FileInputStream("/tmp/g.html");
XMLReader reader = new Parser();
reader.setFeature(Parser.namespacesFeature, false);
Source source = new SAXSource(parser, input);

DocumentBuilder builder = proc.newDocumentBuilder();
XdmNode input = builder.build(source);

XPathCompiler compiler = proc.newXPathCompiler();
XdmValue result = compiler.evaluate("//span", input);
System.out.println(result.size());
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!