I'm looking for a simple Java snippet to remove empty tags from any XML structure.
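If the XML is fed in as a String, a regex could be used to filter out empty elements:
<(\\w+)></\\1>|<\\w+/>
This matches an empty element written either as an open/close pair (<tag></tag>) or as a self-closing tag (<tag/>), with no attributes and no whitespace inside.
data.replaceAll(re, "")
Here data is a variable holding your XML string and re is a String holding the pattern above.
I'm not saying this is the best solution, but it is possible...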
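For reference, here is a minimal runnable sketch of that single replaceAll pass. The class name EmptyElementRegex and the method strip are made up for illustration; only the pattern comes from the answer above. It also shows a limitation: a parent that becomes empty only after its children are removed survives a single pass, and elements with attributes are not matched at all.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class EmptyElementRegex {
    // <tag></tag> with the same name and no content, or a bare self-closing <tag/>.
    private static final Pattern EMPTY =
            Pattern.compile("<(\\w+)></\\1>|<\\w+/>");

    // One pass only: removes the elements that are empty right now.
    static String strip(String xml) {
        return EMPTY.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // prints <xml><a>bla</a></xml>
        System.out.println(strip("<xml><a>bla</a><b></b><c/></xml>"));
        // prints <d></d> because <d> is only empty after <e/> has been removed
        System.out.println(strip("<d><e/></d>"));
    }
}
```

Calling strip repeatedly until the string stops changing would handle the nested case as well.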
To remove all empty tags, even if they are one after another, one possible solution (the code appears to use the XOM library) is:
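import java.util.ArrayList;
import java.util.List;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Node;

private void removeEmptyTags(Document document) {
    List<Node> listNode = new ArrayList<Node>();
    findListEmptyTags(document.getRootElement(), listNode);
    if (listNode.isEmpty())
        return;
    for (Node node : listNode) {
        node.getParent().removeChild(node);
    }
    // removing children may have left their parents empty, so search again
    removeEmptyTags(document);
}

private void findListEmptyTags(Node node, List<Node> listNode) {
    if (node == null)
        return;
    // only elements can be empty tags; text and comment nodes must not be cast to Element.
    // The root element (whose parent is the Document) is skipped, since a document cannot lose its root.
    if (node instanceof Element && node.getParent() instanceof Element
            && node.getChildCount() == 0 && "".equals(node.getValue())
            && ((Element) node).getAttributeCount() == 0) {
        listNode.add(node);
        return;
    }
    // recurse the children
    for (int i = 0; i < node.getChildCount(); i++) {
        findListEmptyTags(node.getChild(i), listNode);
    }
}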
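If adding a library is not an option, the same idea can be sketched with the DOM API that ships with the JDK (org.w3c.dom) instead of the library used above. This is a swapped-in variant, not the answer's own code: the class name DomEmptyTagPruner and its methods are made up for illustration, and "empty" is assumed to mean "no attributes and no child nodes", as in the snippet above. Because children are pruned before their parent is tested, one traversal is enough.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class and method names, used only for this sketch.
class DomEmptyTagPruner {

    // Post-order walk: children are pruned first, so a parent that ends up
    // empty is reported to its own caller during the same traversal.
    // Returns true when the element has no attributes and no child nodes left.
    static boolean prune(Element element) {
        Node child = element.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before a possible removal
            if (child instanceof Element && prune((Element) child)) {
                element.removeChild(child);
            }
            child = next;
        }
        return !element.hasAttributes() && !element.hasChildNodes();
    }

    // Parses the string, prunes empty elements, and serializes the result.
    static String removeEmptyTags(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            prune(doc.getDocumentElement()); // the root itself is always kept
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(removeEmptyTags(
                "<xml><field1>bla</field1><field2></field2><field3/></xml>"));
    }
}
```

Note that whitespace-only text counts as a child node here, so an element like <a> </a> is kept; trim or drop such text nodes first if they should be treated as empty.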
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    // self-closing tags, with or without attributes: <tag/>, <tag a="b"/>
    final String regex1 = "<([a-zA-Z0-9-\\_]*)[^>]*/>";
    // open/close pairs containing nothing but whitespace: <tag></tag>, <tag> </tag>
    final String regex2 = "<([a-zA-Z0-9-\\_]*)[^>]*>\\s*</\\1>";
    String xmlString = "<xml><field1>bla</field1><field2></field2><field3/><structure1><field4><field50><field50/></field50></field4><field5></field5></structure1></xml>";
    System.out.println(xmlString);
    final Pattern pattern1 = Pattern.compile(regex1);
    final Pattern pattern2 = Pattern.compile(regex2);
    Matcher matcher1;
    Matcher matcher2;
    do {
        // each pass can leave a parent empty, so repeat until neither pattern matches
        xmlString = xmlString.replaceAll(regex1, "").replaceAll(regex2, "");
        matcher1 = pattern1.matcher(xmlString);
        matcher2 = pattern2.matcher(xmlString);
    } while (matcher1.find() || matcher2.find());
    System.out.println(xmlString);
}
Console (reformatted here for readability):
<xml>
    <field1>bla</field1>
    <field2></field2>
    <field3/>
    <structure1>
        <field4>
            <field50>
                <field50/>
            </field50>
        </field4>
        <field5></field5>
    </structure1>
</xml>
<xml>
    <field1>bla</field1>
</xml>