How can I remove the whitespace and line breaks from an XML string in Python 2.6? I tried the following packages:
etree: my snippet keeps the original whitespace.
If the goal is to remove whitespace in "non-leaf" nodes (nodes that contain at least one child element), the following function does it for `xml.dom` nodes, recursively if `recurse=True`:
```python
from xml.dom import Node

def stripNode(node, recurse=False):
    nodesToRemove = []
    nodeToBeStripped = False

    for childNode in node.childNodes:
        # collect whitespace-only text nodes (candidates for removal)
        if (childNode.nodeType == Node.TEXT_NODE and
                childNode.nodeValue.strip() == ""):
            nodesToRemove.append(childNode)

        # only strip if this is not a leaf node (i.e. a child element exists)
        if childNode.nodeType == Node.ELEMENT_NODE:
            nodeToBeStripped = True

    # remove the flagged text nodes
    if nodeToBeStripped:
        for childNode in nodesToRemove:
            node.removeChild(childNode)

    # recurse into the remaining children if requested
    if recurse:
        for childNode in node.childNodes:
            stripNode(childNode, True)
```
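To go from an XML string to a stripped XML string, the function can be combined with `xml.dom.minidom`. The sketch below is self-contained, so it repeats the stripping rule in condensed form; `strip_xml_string` is a hypothetical helper name, and serializing only `documentElement` (to avoid the `<?xml ...?>` declaration) is an assumption about the output you want. Note that the whitespace-only leaf `<c>   </c>` is deliberately left alone.

```python
from xml.dom import Node, minidom


def stripNode(node, recurse=False):
    # Condensed restatement of stripNode: drop whitespace-only text nodes,
    # but only in nodes that also contain at least one child element.
    children = list(node.childNodes)  # copy, since we remove while looping
    if any(c.nodeType == Node.ELEMENT_NODE for c in children):
        for c in children:
            if c.nodeType == Node.TEXT_NODE and c.nodeValue.strip() == "":
                node.removeChild(c)
    if recurse:
        for c in node.childNodes:
            stripNode(c, True)


def strip_xml_string(xml_string):
    # Hypothetical helper: parse, strip recursively, and serialize the root
    # element back to a string (without an XML declaration).
    doc = minidom.parseString(xml_string)
    stripNode(doc.documentElement, recurse=True)
    return doc.documentElement.toxml()


xml_string = "<root>\n  <a>1</a>\n  <b> keep me </b>\n  <c>   </c>\n</root>"
result = strip_xml_string(xml_string)
print(result)  # <root><a>1</a><b> keep me </b><c>   </c></root>
```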
However, Thanatos is correct: whitespace can be meaningful data in XML (for example in mixed content), so use this with caution.
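Since the question also mentions etree, here is a rough port of the same rule to `xml.etree.ElementTree`; it is a sketch, not part of the original answer, and `strip_whitespace` is a hypothetical name. ElementTree has no separate text nodes: text before the first child lives in `elem.text` and text after each child lives in `child.tail`, so those are the attributes that get cleared. The same caution about significant whitespace applies.

```python
import xml.etree.ElementTree as ET


def strip_whitespace(elem):
    # Hypothetical helper: same rule as stripNode, expressed with ElementTree.
    children = list(elem)
    if children:  # only non-leaf elements are stripped
        # whitespace before the first child is stored in elem.text
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        # whitespace after each child is stored in that child's tail
        for child in children:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    for child in children:
        strip_whitespace(child)


xml_string = "<root>\n  <a>1</a>\n  <b> keep me </b>\n  <c>   </c>\n</root>"
root = ET.fromstring(xml_string)
strip_whitespace(root)
# tostring() returns a byte string (US-ASCII by default, with non-ASCII
# characters escaped), so decoding it as ASCII is safe
result = ET.tostring(root).decode("ascii")
print(result)  # <root><a>1</a><b> keep me </b><c>   </c></root>
```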