How to remove extra empty lines from XML file?

谁说胖子不能爱 submitted on 2019-11-28 19:36:30

First, an explanation of why this happens. It might be a bit off, since you didn't include the code you use to load the XML file into a DOM object.

When you read an XML document from a file, the whitespace between tags actually constitutes valid DOM nodes, according to the DOM specification. The XML parser therefore treats each such run of whitespace as a DOM node (of type TEXT).
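To see these whitespace TEXT nodes for yourself, here is a minimal sketch that parses a small indented document with a default (non-validating) DocumentBuilderFactory and lists the children of the root element. The class name TextNodeDemo, the helper describeChildren, and the sample <paths> document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextNodeDemo {
    // Parses an XML string with a default, non-validating DocumentBuilder.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Lists the child nodes of a node; text nodes are shown with \n escaped.
    static String describeChildren(Node parent) {
        StringBuilder sb = new StringBuilder();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (sb.length() > 0) {
                sb.append(", ");
            }
            if (child.getNodeType() == Node.TEXT_NODE) {
                // A whitespace-only text node shows up as e.g. #text[\n  ]
                sb.append("#text[")
                  .append(child.getNodeValue().replace("\n", "\\n"))
                  .append(']');
            } else {
                sb.append(child.getNodeName());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<paths>\n  <path/>\n  <path/>\n</paths>");
        System.out.println(describeChildren(doc.getDocumentElement()));
    }
}
```

For this sample, the root element has five children: three whitespace-only TEXT nodes and the two <path> elements.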

To get rid of it, there are three approaches I can think of:

  • Associate the XML with a schema (for setValidating(true), that means a DTD), and then use setValidating(true) along with setIgnoringElementContentWhitespace(true) on the DocumentBuilderFactory.

    (Note: setIgnoringElementContentWhitespace only works if the parser is in validating mode, which is why you must use setValidating(true).)

  • Write an XSLT stylesheet that processes all nodes and filters out whitespace-only TEXT nodes.
  • Use Java code to do this: use XPath to find all whitespace-only TEXT nodes, iterate through them and remove each one from its parent (using getParentNode().removeChild()). Something like this would do (doc would be your DOM document object):

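    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    // Select every text node that is empty after whitespace normalization
    XPath xp = XPathFactory.newInstance().newXPath();
    NodeList nl = (NodeList) xp.evaluate("//text()[normalize-space(.)='']", doc, XPathConstants.NODESET);

    for (int i = 0; i < nl.getLength(); ++i) {
        Node node = nl.item(i);
        node.getParentNode().removeChild(node);
    }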
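A minimal sketch of the first approach, assuming the grammar is an inline DTD (the class name ValidatingParseDemo, the DTD, and the sample document are hypothetical). Because the DTD declares <paths> as element-only content, the whitespace between its <path> children counts as element content whitespace, which a validating parser can drop:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidatingParseDemo {
    // Inline DTD: <paths> may only contain <path> elements.
    static final String XML =
            "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE paths [\n"
            + "  <!ELEMENT paths (path*)>\n"
            + "  <!ELEMENT path (#PCDATA)>\n"
            + "]>\n"
            + "<paths>\n  <path>a</path>\n  <path>b</path>\n</paths>";

    static Document parse(String xml, boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true); // required for the next setting to take effect
            dbf.setIgnoringElementContentWhitespace(ignoreWhitespace);
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Fail loudly on validation errors instead of only printing them.
            db.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) { }
                @Override
                public void error(SAXParseException e) throws SAXException { throw e; }
                @Override
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(XML, false).getDocumentElement().getChildNodes().getLength());
        System.out.println(parse(XML, true).getDocumentElement().getChildNodes().getLength());
    }
}
```

Without the DTD there is no content model, so the parser cannot tell that the whitespace is ignorable and setIgnoringElementContentWhitespace(true) has no effect.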
    
Brad

I was able to fix this by using this code after removing all the old "path" nodes:

// Remove every remaining child node, including the whitespace-only text nodes
while (pathsElement.hasChildNodes()) {
    pathsElement.removeChild(pathsElement.getFirstChild());
}

This removes all the leftover whitespace (the empty lines) from the XML file.

Special thanks to MadProgrammer for commenting with the helpful link mentioned above.

You could look at something like this if you only need to "clean" your XML quickly. Then you could have a method like:

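import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

// XmlUtil is the helper class from the utility linked above; it is not part of the JDK
public static String cleanUp(String xml) {
    final StringReader reader = new StringReader(xml.trim());
    final StringWriter writer = new StringWriter();
    try {
        XmlUtil.prettyFormat(reader, writer);
        return writer.toString();
    } catch (IOException e) {
        e.printStackTrace();
    }
    // Fall back to the trimmed input if formatting failed
    return xml.trim();
}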
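If you would rather not depend on an extra utility, a JDK-only sketch of the same cleanUp idea could strip the whitespace-only text nodes and then let an identity Transformer re-indent the document. The class name XmlCleanUp and the helper stripWhitespaceText are hypothetical, and this is a swapped-in technique, not what XmlUtil.prettyFormat does internally:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlCleanUp {
    // Recursively removes text nodes that contain only whitespace.
    static void stripWhitespaceText(Node node) {
        NodeList children = node.getChildNodes();
        // Walk backwards so removals do not shift the nodes still to be visited.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceText(child);
            }
        }
    }

    public static String cleanUp(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml.trim())));
            stripWhitespaceText(doc);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter writer = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            // Same fallback as the snippet above: return the trimmed input.
            e.printStackTrace();
            return xml.trim();
        }
    }

    public static void main(String[] args) {
        System.out.println(cleanUp("<paths>\n\n   <path>a</path>\n\n\n  <path>b</path>\n</paths>"));
    }
}
```

Stripping first matters: on recent JDKs, serializing with INDENT set to "yes" while whitespace text nodes are still present can produce exactly these extra blank lines.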

Also, if you need to compare XML documents and check their differences, have a look at XMLUnit.

I faced the same problem and was stuck for a long time, but after reading Brad's question and his own answer to it, I figured out where the trouble is.

I have to add my own answer, because Brad's isn't quite perfect; as Isaac said:

I wouldn't be a huge fan of blindly removing child nodes without knowing what they are

So a better "solution" (in quotes because it is really more of a workaround) is:

pathsElement.setTextContent("");

This completely removes useless blank lines. It is definitely better than removing all the child nodes. Brad, this should work for you too.
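Per the DOM Level 3 definition of Node.setTextContent, setting an empty string removes every child of the element (old <path> elements and whitespace TEXT nodes alike) without adding a new text node, so call it only once the old children are no longer needed. A minimal sketch of rebuilding the paths this way; the class ReplacePathsDemo, the helpers parseRoot and replacePaths, and the sample values are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ReplacePathsDemo {
    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Replaces all children of pathsElement with fresh <path> elements.
    static void replacePaths(Element pathsElement, String... values) {
        // Removes every child node, including the whitespace-only TEXT nodes.
        pathsElement.setTextContent("");
        Document doc = pathsElement.getOwnerDocument();
        for (String value : values) {
            Element path = doc.createElement("path");
            path.setTextContent(value);
            pathsElement.appendChild(path);
        }
    }

    public static void main(String[] args) {
        Element paths = parseRoot("<paths>\n  <path>old1</path>\n  <path>old2</path>\n</paths>");
        replacePaths(paths, "new1", "new2");
        System.out.println(paths.getChildNodes().getLength());
    }
}
```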

But this only treats the effect, not the cause: we now know how to remove the effect, but not the cause.

The cause: when we call removeChild(), it removes the child itself but leaves behind the removed child's indentation and line break. That indentation and line break is a separate node, which is treated as text content.

So, to remove the cause, we need to figure out how to remove a child together with its indentation. You are welcome to look at my question about this.
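A minimal sketch of removing a child together with its indentation, assuming the indentation is the whitespace-only TEXT node immediately before the child. The class RemoveWithIndent and its helpers are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RemoveWithIndent {
    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Removes child plus the whitespace-only TEXT node directly before it,
    // i.e. the line break and indentation that precede the child's start tag.
    static void removeWithIndent(Node child) {
        Node parent = child.getParentNode();
        Node prev = child.getPreviousSibling();
        if (prev != null && prev.getNodeType() == Node.TEXT_NODE
                && prev.getNodeValue().trim().isEmpty()) {
            parent.removeChild(prev);
        }
        parent.removeChild(child);
    }

    public static void main(String[] args) {
        Element root = parseRoot("<paths>\n  <path>a</path>\n  <path>b</path>\n</paths>");
        // getElementsByTagName returns a live list, so copy it before removing nodes.
        NodeList live = root.getElementsByTagName("path");
        List<Node> paths = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            paths.add(live.item(i));
        }
        for (Node path : paths) {
            removeWithIndent(path);
        }
        System.out.println(root.getChildNodes().getLength());
    }
}
```

The whitespace node after the last child (the line break before the closing tag) is kept, so the parent's closing tag stays on its own line.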

I am using the code below:

System.out.println("Start remove textnode");
int i = 0;
while (parentNode.getChildNodes().item(i) != null) {
    Node child = parentNode.getChildNodes().item(i);
    System.out.println(child.getNodeName());
    // Remove only whitespace-only text nodes, so real text content is kept
    if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
        parentNode.removeChild(child);
        System.out.println("text node removed");
        // The next sibling has shifted into position i, so do not advance i
    } else {
        i = i + 1;
    }
}

A couple of remarks:
1) When you are manipulating XML (removing elements / adding new ones), I strongly advise you to use XSLT rather than DOM.
2) When you transform an XML document with XSLT (as you do in your save method), set OutputKeys.INDENT to "no".
3) For simple post-processing of your XML (removing whitespace, comments, etc.), you can use a simple SAX2 filter.
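Remarks 2 and 3 can be combined in a sketch: an XMLFilterImpl (a SAX2 filter) that drops whitespace-only character data, feeding an identity Transformer with OutputKeys.INDENT set to "no". The class name WhitespaceFilter and the helper strip are hypothetical, and the sketch assumes documents without comments or mixed content (in mixed content, a single space between two inline elements can be meaningful). Character data is buffered because a SAX parser may split one text run across several characters() calls:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

public class WhitespaceFilter extends XMLFilterImpl {
    // Collects character data until the next element boundary.
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Forwards the buffered text unless it is whitespace only.
    private void flush() throws SAXException {
        if (!text.toString().trim().isEmpty()) {
            char[] chars = text.toString().toCharArray();
            super.characters(chars, 0, chars.length);
        }
        text.setLength(0);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        flush();
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        flush();
        super.endElement(uri, localName, qName);
    }

    // Parses xml through the filter and serializes the result without indentation.
    static String strip(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            WhitespaceFilter filter = new WhitespaceFilter();
            filter.setParent(spf.newSAXParser().getXMLReader());
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            t.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not filter XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(strip("<paths>\n  <path>a</path>\n  <path> b </path>\n</paths>"));
    }
}
```

Note that the text " b " survives intact: only runs consisting entirely of whitespace are dropped.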

Tai Le
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setIgnoringElementContentWhitespace(true);

There is a very simple way to get rid of the empty lines if you are using a DOM handling API (for example DOM4J):

  • place the text you want to keep in a variable (i.e. text)
  • set the node text to "" using node.setText("")
  • set the node text to text using node.setText(text)

Et voilà! There are no more empty lines. The other answers explain very well how the extra empty lines in the XML output are actually extra nodes of type text.

This technique works with any DOM parsing system, as long as you swap in your API's own text-setting function, which is why I described it somewhat abstractly.

Hope this helps:)
