How to convert a Jsoup Document to a W3C Document?

橙三吉。 submitted on 2019-11-28 00:29:43

Question


I have built a Jsoup Document by parsing an in-house HTML page:

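import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public Document newDocument(String path) throws IOException {
    // Fetch and parse the page; a timeout of 0 means no timeout
    Document doc = Jsoup.connect(path).timeout(0).get();
    return doc;
}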

I want to convert the Jsoup document to an org.w3c.dom.Document. I used an available library, DOMBuilder, for this, but when parsing I get the org.w3c.dom.Document as null. I am unable to understand the problem; I tried searching but couldn't find any answer.

Code to generate the W3C DOM Document:

// factory is the object that defines newDocument(...) above
Document jsoupDoc = factory.newDocument("http://localhost/testcases/test_2.html");
org.w3c.dom.Document docu = DOMBuilder.jsoup2DOM(jsoupDoc);

Can anyone please help me with this?


Answer 1:


To retrieve a jsoup document via HTTP, make a call to Jsoup.connect(...).get(). To load a jsoup document locally, make a call to Jsoup.parse(new File("..."), "UTF-8").

The call to DOMBuilder is correct.

When you say,

I used an available library DOMBuilder for this but when parsing I get org.w3c.dom.Document as null.

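I think you mean, "I used an available library, DOMBuilder, for this, but when printing the result I get [#document: null]." At least, that was the result I saw when I tried printing the w3cDoc object, but that doesn't mean the object is null. With the JDK's default DOM implementation, printing a node shows its node name and its node value, and the DOM specification defines the node value of a Document as null, so "[#document: null]" describes a perfectly valid document. I was able to traverse the document by making calls to getDocumentElement and getChildNodes.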

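import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
// DOMBuilder is the third-party class from the question; import it from its own package.

public static void main(String[] args) throws IOException {
    // Let a failed download surface as an exception instead of
    // passing a null jsoupDoc on to the converter
    Document jsoupDoc = Jsoup.connect("http://stackoverflow.com/questions/17802445").get();

    org.w3c.dom.Document w3cDoc = DOMBuilder.jsoup2DOM(jsoupDoc);

    // Walk the converted tree: root element, then its child nodes
    Element root = w3cDoc.getDocumentElement();
    NodeList childNodes = root.getChildNodes();
    Node n = childNodes.item(2); // null if the root has fewer than three children
    System.out.println(n.getNodeName());
}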
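The "[#document: null]" output is not specific to DOMBuilder or jsoup. As a minimal sketch using only the JDK (no jsoup; the class name DocumentToStringDemo and the sample markup are made up for illustration), any org.w3c.dom.Document prints this way while still being a non-null, fully traversable tree:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DocumentToStringDemo {
    // Parse a well-formed markup string into a W3C DOM using only the JDK (no jsoup)
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><head><title>t</title></head><body><p>hi</p></body></html>");

        // toString() reports the node name and node value; with the JDK's default
        // DOM implementation this typically prints "[#document: null]"
        System.out.println(doc);
        // The reference itself is not null; the "null" above is only the node value,
        // which the DOM specification defines as null for Document nodes
        System.out.println(doc != null);
        System.out.println(doc.getNodeValue());

        // The tree can be traversed normally
        Element root = doc.getDocumentElement();
        System.out.println(root.getNodeName());
        System.out.println(root.getChildNodes().getLength());
    }
}
```

In other words, check getDocumentElement() rather than the printed form when deciding whether a conversion worked.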



Answer 2:


Alternatively, Jsoup provides the W3CDom class with the method fromJsoup. This method transforms a Jsoup Document into a W3C document.

import org.jsoup.helper.W3CDom;
import org.jsoup.nodes.Document;

Document jsoupDoc = ...
W3CDom w3cDom = new W3CDom();
org.w3c.dom.Document w3cDoc = w3cDom.fromJsoup(jsoupDoc);
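Converters like this essentially walk the source tree and create a matching node in a fresh org.w3c.dom.Document for every element and text node. The following is only a rough sketch of that idea, not jsoup's actual code: SimpleNode is a hypothetical stand-in for jsoup's node classes, and attributes, comments and namespaces are deliberately left out. The root passed to convert must be an element, since a Document cannot have a text node as its root.

```java
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TreeToDomSketch {
    // Hypothetical stand-in for a jsoup node: an element (tag + children) or a text node
    static final class SimpleNode {
        final String tag;                // null for text nodes
        final String text;               // null for elements
        final List<SimpleNode> children; // empty for text nodes

        private SimpleNode(String tag, String text, List<SimpleNode> children) {
            this.tag = tag;
            this.text = text;
            this.children = children;
        }

        static SimpleNode element(String tag, SimpleNode... children) {
            return new SimpleNode(tag, null, List.of(children));
        }

        static SimpleNode text(String text) {
            return new SimpleNode(null, text, List.of());
        }
    }

    // Create an empty W3C Document and copy the source tree into it
    static Document convert(SimpleNode root) {
        try {
            Document out = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            out.appendChild(copy(root, out));
            return out;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recursively create the matching W3C node for each source node
    private static Node copy(SimpleNode src, Document out) {
        if (src.tag == null) {
            return out.createTextNode(src.text);
        }
        Element el = out.createElement(src.tag);
        for (SimpleNode child : src.children) {
            el.appendChild(copy(child, out));
        }
        return el;
    }

    public static void main(String[] args) {
        SimpleNode page = SimpleNode.element("html",
                SimpleNode.element("body",
                        SimpleNode.element("p", SimpleNode.text("hi"))));
        Document doc = convert(page);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```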

UPDATE:

  • Up to jsoup 1.10.2, the W3CDom class was still experimental.
  • Since 1.10.3, W3CDom is no longer experimental.
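Whichever converter you use, printing the resulting org.w3c.dom.Document only shows its node name and value. To check what the conversion actually produced, one option is to serialize it with the JDK's identity Transformer. This is a minimal sketch under that assumption: the class DomSerializer and the hand-built sample document are hypothetical, chosen so the example runs without jsoup; in practice you would pass the w3cDoc returned by the converter.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomSerializer {
    // Serialize any org.w3c.dom.Document to a markup string via the JDK's identity transform
    static String toXmlString(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    // Hypothetical sample document built by hand, so the example runs without jsoup
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("page");
            Element p = doc.createElement("p");
            p.setTextContent("hi");
            root.appendChild(p);
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = sampleDocument(); // in practice: the w3cDoc returned by the converter
        System.out.println(doc);               // only the node name and node value
        System.out.println(toXmlString(doc));  // the serialized markup of the whole tree
    }
}
```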


Source: https://stackoverflow.com/questions/17802445/how-to-convert-a-jsoup-document-to-a-w3c-document
