How to get HTML from an org.w3c.dom.Node in Java?

风格不统一 submitted on 2019-12-07 20:11:28

Question


I've built a method that extracts data from an HTML document using the XPath components of Saxon-HE. I'm using the W3C DOM object model for this.

I've already created a method that returns the text value, similar to jsoup's text method (jsoupElement.text()):

    // Returns the value of the node's first direct text child, or "" if it has none
    protected String getNodeValue(Node node) {
        NodeList childNodes = node.getChildNodes();
        for (int x = 0; x < childNodes.getLength(); x++) {
            Node data = childNodes.item(x);
            if (data.getNodeType() == Node.TEXT_NODE) {
                return data.getNodeValue();
            }
        }
        return "";
    }
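Note that getNodeValue returns only the first direct text child, so text inside nested elements (and any text after them) is skipped, whereas jsoup's text() returns the combined text of the element and all its descendants. The closest standard DOM equivalent is Node.getTextContent() (DOM Level 3), which concatenates all descendant text but, unlike jsoup, does not normalize whitespace. The sketch below compares the two; it assumes well-formed XHTML-style input parsed with the JDK's DocumentBuilder, and the class name TextValueDemo and the sample markup are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextValueDemo {
    // Same logic as getNodeValue: value of the first direct text child, or ""
    public static String firstTextChild(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                return child.getNodeValue();
            }
        }
        return "";
    }

    // Parses well-formed (XHTML-style) markup into a W3C DOM Document;
    // checked parser exceptions are rethrown unchecked to keep the sketch short
    public static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Node p = parse("<p>Hello <b>world</b>!</p>").getDocumentElement();
        System.out.println("[" + firstTextChild(p) + "]");  // [Hello ]
        System.out.println("[" + p.getTextContent() + "]"); // [Hello world!]
    }
}
```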

This works fine, but now I need the underlying HTML of a selected node (with jsoup it would be jsoupElement.html()). Using the W3C DOM object model, I have an org.w3c.dom.Node. How can I get the HTML of an org.w3c.dom.Node as a String? I couldn't find anything about this in the documentation.

Just to clarify: I need the inner HTML (with or without the node's own element/tag) as a String, similar to http://api.jquery.com/html/ or http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#html--


Answer 1:


To serialize a W3C DOM Node's child nodes to HTML with Saxon, you can use a default (identity) Transformer and set its output method to html:

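    import java.io.StringWriter;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Result;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public static String getInnerHTML(Node node) throws TransformerException
    {
        StringWriter sw = new StringWriter();
        Result result = new StreamResult(sw);
        // Saxon's identity transformer (no stylesheet): serializes its input as-is
        TransformerFactory factory = new net.sf.saxon.TransformerFactoryImpl();
        Transformer proc = factory.newTransformer();
        proc.setOutputProperty(OutputKeys.METHOD, "html");
        // Serialize each child in turn, so the node's own start/end tags are left out
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++)
        {
            proc.transform(new DOMSource(children.item(i)), result);
        }
        return sw.toString();
    }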
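If Saxon is not on the classpath, or you also want the outer HTML (the node together with its own tags), the same identity-transform idea can be sketched with plain JAXP. This is a swapped-in variant, not the answer's code: it calls TransformerFactory.newInstance(), which returns the JDK's built-in implementation unless another one (such as Saxon) is registered, so details like whitespace and escaping in the output may differ between implementations. The names HtmlSerializeDemo, innerHtml, outerHtml and parse are hypothetical, and parsing assumes well-formed XHTML-style input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HtmlSerializeDemo {
    // Identity transformer (no stylesheet) from the default JAXP implementation,
    // configured to use the HTML output method
    private static Transformer htmlTransformer() throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "html");
        return t;
    }

    // Inner HTML: serialize each child in turn, leaving out the node's own tags
    public static String innerHtml(Node node) {
        try {
            Transformer t = htmlTransformer();
            StringWriter sw = new StringWriter();
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                t.transform(new DOMSource(children.item(i)), new StreamResult(sw));
            }
            return sw.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Outer HTML: serialize the node itself, including its start and end tags
    public static String outerHtml(Node node) {
        try {
            StringWriter sw = new StringWriter();
            htmlTransformer().transform(new DOMSource(node), new StreamResult(sw));
            return sw.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses well-formed (XHTML-style) markup into a W3C DOM Document
    public static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<div><p>Hello <b>world</b></p></div>");
        Node p = doc.getElementsByTagName("p").item(0);
        System.out.println(innerHtml(p));
        System.out.println(outerHtml(p));
    }
}
```

For the p element above, innerHtml should contain Hello <b>world</b> without the surrounding p tags, while outerHtml also includes <p> and </p>; the serializer may add line breaks, so compare trimmed or whitespace-insensitive strings. Each child gets its own StreamResult, but all of them write into the same StringWriter.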

But, as said, this is only a serialization of the tree: the original XML or HTML markup is not stored in a DOM tree or in Saxon's tree model, so there is no way to access it.
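A small round trip makes this visible: once markup is parsed, details such as the quote character around an attribute value or a character reference like &#65; are no longer in the DOM, so serializing produces equivalent but not identical markup. A minimal sketch, assuming well-formed input and the default JAXP identity transformer with the html output method; RoundTripDemo and its helper methods are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RoundTripDemo {
    // Parses well-formed markup; quote style and character references are not kept in the DOM
    public static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Serializes a node (including its own tags) with the default identity transformer
    public static String serialize(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "html");
            StringWriter sw = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(sw));
            return sw.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "<p class='note'>&#65;BC</p>";
        String serialized = serialize(parse(original).getDocumentElement());
        System.out.println(original);
        System.out.println(serialized.trim());
    }
}
```

With the JDK's built-in serializer the second line should read <p class="note">ABC</p>: the same tree as the input, but different text.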



Source: https://stackoverflow.com/questions/33635398/how-to-get-html-from-a-org-w3c-dom-node-in-java
