Java - Handle indentation in “getTextContent()” of DOM parsed XML

Submitted by 拟墨画扇 on 2020-01-06 02:34:17

Question


I've written some Java code that parses an XML file with DOM to load data into a program of mine. After formatting the XML with Eclipse's "Format" function, I ran into a problem: getTextContent() on a document element, which used to work, now returns a string that contains the whitespace (spaces, line breaks, or whatever else) added by Eclipse's formatting. I'm looking for a solution such that, given:

<myElement> some text

of mine

</myElement>

when I select the element <myElement> from the document in code, getTextContent() behaves like this:

myElement.getTextContent().equals("some text of mine");

Currently this comparison fails.

If this is not specific enough, let me know. Thanks.


Answer 1:


Use a helper function to pack the XML text content, that is, to trim it and collapse every run of whitespace into a single space.

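// Returns the element's text with surrounding whitespace removed and
// inner whitespace runs collapsed; returns "" for a null element.
public String getPackedContent(Element element) {
    if (element != null) {
        String text = element.getTextContent();
        if (text != null) {
            // trim() strips the ends; replaceAll() collapses what is left inside
            return text.trim().replaceAll("\\s+", " ");
        }
    }
    return "";
}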

System.out.print(getPackedContent(myElement)); // "some text of mine"
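To try the helper end to end, here is a minimal sketch that parses the XML from the question out of a string and prints the raw and the packed text. The class name PackedContentDemo and the parseRoot method are hypothetical names chosen for illustration, and the exact indentation inside the XML string is an assumption about what Eclipse's formatter produced.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class PackedContentDemo {

    // Same normalization as the helper above: trim, then collapse whitespace runs.
    public static String getPackedContent(Element element) {
        if (element != null) {
            String text = element.getTextContent();
            if (text != null) {
                return text.trim().replaceAll("\\s+", " ");
            }
        }
        return "";
    }

    // Hypothetical convenience method: parse an XML string and return its root element.
    public static Element parseRoot(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // Assumed shape of the formatted XML: line breaks and indentation inside the element.
        String xml = "<myElement> some text\n\n    of mine\n\n</myElement>";
        Element myElement = parseRoot(xml);

        // Raw content keeps the formatter's whitespace.
        System.out.println("[" + myElement.getTextContent() + "]");
        // Packed content is what the question asks for.
        System.out.println("[" + getPackedContent(myElement) + "]"); // [some text of mine]
    }
}
```

The helper is declared static here only so that main can call it directly; inside an existing class it can stay an instance method as in the answer.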

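String#replaceAll() takes a regular expression as its first argument and replaces every match in the string with the substitution string passed as the second argument. The regex \s+ means one or more (+) whitespace characters (\s), which includes spaces, tabs, and newlines. In Java source it is written "\\s+" because a backslash inside a string literal must itself be escaped: the first \ escapes the second, so the regex engine receives \s+. The trim() call beforehand removes the leading and trailing whitespace, so no stray space is left at either end.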
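The regex behavior described here can be checked on plain strings, without any XML involved. This is only an illustrative sketch; WhitespaceRegexDemo and collapse are hypothetical names, not part of the original answer.

```java
public class WhitespaceRegexDemo {

    // Hypothetical helper: trims the ends, then replaces each whitespace run with one space.
    public static String collapse(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Spaces, tabs and newlines all count as \s and are collapsed together.
        System.out.println(collapse("a \t\n  b"));      // a b
        System.out.println(collapse("\n  x\r\n y  "));  // x y

        // The Java literal "\\s+" holds three characters: backslash, s, plus.
        System.out.println("\\s+".length());            // 3
    }
}
```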



Source: https://stackoverflow.com/questions/18289235/java-handle-indentation-in-gettextcontent-of-dom-parsed-xml
