Java Remove empty XML tags

死守一世寂寞 2020-12-11 16:57

I'm looking for a simple Java snippet that removes empty tags from any XML structure.


    bla
    <         


        
9 Answers
  • 2020-12-11 17:35

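    I needed to add strip-space and output (indent) elements to Chris R's answer. Otherwise, enclosing elements that become empty only once their children are removed are left in place, because the whitespace between the child tags still counts as content: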

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:strip-space elements="*"/>
      <xsl:output indent="yes" />
      <xsl:template match="@*|node()">
        <xsl:if test=". != '' or ./@* != ''">
          <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
          </xsl:copy>
        </xsl:if>
      </xsl:template>
    </xsl:stylesheet>
    
  • 2020-12-11 17:38

    As a side note: some applications give the different states of a tag distinct meanings, even though a standard XML parser reports <a></a> and <a/> identically:

    • Open-Closed Tag: The element exists and its value is an empty string
    • Single-Tag: The element exists, but the value is null or nil
    • Missing Tag: The element does not exist

    So, by removing empty Open-Closed tags and Single-Tags, you merge them with the group of missing tags and thus lose information.
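
    A small sketch of that distinction using the JDK's built-in DOM parser (the class and method names here are hypothetical). It assumes the element of interest is looked up by tag name. Note that this parser reports the open-closed and single-tag forms the same way, so without an extra convention only "missing" versus "empty" can be told apart:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper; the names are illustrative only. */
final class TagStates {

    /** Returns "missing", "empty", or "has content" for the first element named tag. */
    static String stateOf(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName(tag);
            if (matches.getLength() == 0) {
                return "missing";
            }
            // <a></a> and <a/> both parse to an element without child nodes.
            return matches.item(0).hasChildNodes() ? "has content" : "empty";
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

    Once the empty elements have been stripped, stateOf would report "missing" for them, which is the loss of information described above.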

  • 2020-12-11 17:40

    I tested Jonik's and Marco's sample code, but neither did exactly what I wanted, so I modified it. The code below works well for me and I already use it in my project. Please test it if you like.

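    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;

    import nu.xom.Builder;
    import nu.xom.Document;
    import nu.xom.Node;
    import nu.xom.ParentNode;

    public String removeEmptyNode(String xml) {
        String cleanedXml = null;
        try {
            // Assumes the input string does not already start with an XML declaration.
            xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n" + xml;
            InputStream input = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            Document document = new Builder().build(input);
            removeEmptyNode(document.getRootElement());
            cleanedXml = document.toXML();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return cleanedXml;
    }

    private static void removeEmptyNode(Node node) {
        // Visit children last-to-first so a removal never shifts an unvisited index.
        for (int i = node.getChildCount() - 1; i >= 0; i--) {
            removeEmptyNode(node.getChild(i));
        }
        doCheck(node);
    }

    private static void doCheck(Node node) {
        // A node with no children and only whitespace as its value counts as empty.
        if (node.getChildCount() == 0 && "".equals(node.getValue().trim())) {
            ParentNode parent = node.getParent();
            // The root element cannot be detached from its Document, so it is kept.
            if (parent != null && !(parent instanceof Document)) {
                parent.removeChild(node);
            }
        }
    }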
    
  • 2020-12-11 17:46

    I was wondering whether it would be easy to do this with the XOM library and gave it a try.

    It turned out to be quite easy:

    import nu.xom.*;
    
    import java.io.File;
    import java.io.IOException;
    
    public class RemoveEmptyTags {
    
        public static void main(String[] args) throws IOException, ParsingException {
            Document document = new Builder().build(new File("original.xml"));
            handleNode(document.getRootElement());
            System.out.println(document.toXML()); // empty elements now removed
        }
    
        private static void handleNode(Node node) {
            if (node.getChildCount() == 0 && "".equals(node.getValue())) {
                node.getParent().removeChild(node);
                return;
            }
            // recurse the children
            for (int i = 0; i < node.getChildCount(); i++) { 
                handleNode(node.getChild(i));
            }
        }
    }
    

    This probably won't handle all corner cases properly, like a completely empty document. And what to do about elements that are otherwise empty but have attributes?

    If you want to keep otherwise-empty elements that have attributes, add the following check to the condition in handleNode:

    ... && (!(node instanceof Element) || ((Element) node).getAttributeCount() == 0)
    

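    Also, if the XML has two or more empty tags directly after one another, this recursive method doesn't remove all of them: removing a child shifts the remaining siblings down by one index, so the forward loop skips the element that follows each removal.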

    (This answer is part of my evaluation of XOM as a potential replacement to dom4j.)
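
    One way around the skipped-sibling problem is to walk the children from last to first and to test each element only after its children have been processed. The sketch below uses the JDK's built-in org.w3c.dom API instead of XOM so that it runs without extra libraries. The class and method names are hypothetical, and it assumes that attributes count as content, that whitespace-only text does not, and that the root element is always kept:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper; the names are illustrative, not taken from the answer. */
final class DomEmptyTagRemover {

    static String removeEmptyTags(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            prune(doc.getDocumentElement());
            // An identity transformer serializes the DOM back to a string.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not clean XML", e);
        }
    }

    // Post-order walk: children are pruned before their parent is tested, so an
    // element that only contained empty elements is removed as well.
    private static void prune(Node node) {
        NodeList children = node.getChildNodes();
        // Last-to-first, so removing a child never shifts an unvisited sibling.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            prune(children.item(i));
        }
        boolean isRoot = node.getParentNode() instanceof Document;
        if (node.getNodeType() == Node.ELEMENT_NODE && !isRoot && isEmpty(node)) {
            node.getParentNode().removeChild(node);
        }
    }

    // Empty here means: no attributes and nothing but whitespace-only text inside.
    private static boolean isEmpty(Node element) {
        if (element.hasAttributes()) {
            return false;
        }
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            boolean blankText = c.getNodeType() == Node.TEXT_NODE
                    && c.getNodeValue().trim().isEmpty();
            if (!blankText) {
                return false;
            }
        }
        return true;
    }
}
```

    Calling DomEmptyTagRemover.removeEmptyTags with an XML string returns the cleaned document as a string. Dropping the hasAttributes check would also remove empty elements that carry attributes.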

  • 2020-12-11 17:47

    This XSLT stylesheet should do what you're looking for:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="@*|node()">
        <xsl:if test=". != '' or ./@* != ''">
          <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
          </xsl:copy>
        </xsl:if>
      </xsl:template>
    </xsl:stylesheet>
    

    It also preserves elements that are empty but have at least one non-empty attribute. If you don't want this behaviour, change:

    <xsl:if test=". != '' or ./@* != ''">

    to:

    <xsl:if test=". != ''">

    If you want to know how to apply XSLT in Java, there should be plenty of tutorials out there on the Interwebs. Good luck!
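
    For reference, a minimal sketch of applying such a stylesheet with the JDK's built-in javax.xml.transform API. The class, constant and method names are hypothetical, and the stylesheet is inlined as a string only to keep the example self-contained; in a real project it would more likely be loaded from a file:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper; the names are illustrative only. */
final class XsltEmptyTagRemover {

    // The stylesheet from this answer, inlined as a string.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:template match=\"@*|node()\">"
          + "<xsl:if test=\". != '' or ./@* != ''\">"
          + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
          + "</xsl:if>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    static String removeEmptyTags(String xml) {
        try {
            // Compile the stylesheet, then run the input document through it.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

    With this stylesheet, whitespace between tags counts as content, so on indented input an element that contains only empty children is kept unless whitespace is stripped first.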

  • 2020-12-11 17:52

    With XSLT you could transform your XML so that the empty tags are dropped, and then write the document back out.
