Building on another SO question, how can one check whether two well-formed XML snippets are semantically equal. All I need is \"equal\" or not, since I\'m using this for un
I had the same problem: two documents I wanted to compare that had the same attributes but in different orders.
It seems that XML Canonicalization (C14N) in lxml works well for this, but I'm definitely not an XML expert. I'm curious to know if somebody else can point out drawbacks to this approach.
parser = etree.XMLParser(remove_blank_text=True)
xml1 = etree.fromstring(xml_string1, parser)
xml2 = etree.fromstring(xml_string2, parser)
print "xml1 == xml2: " + str(xml1 == xml2)
ppxml1 = etree.tostring(xml1, pretty_print=True)
ppxml2 = etree.tostring(xml2, pretty_print=True)
print "pretty(xml1) == pretty(xml2): " + str(ppxml1 == ppxml2)
xml_string_io1 = StringIO()
xml1.getroottree().write_c14n(xml_string_io1)
cxml1 = xml_string_io1.getvalue()
xml_string_io2 = StringIO()
xml2.getroottree().write_c14n(xml_string_io2)
cxml2 = xml_string_io2.getvalue()
print "canonicalize(xml1) == canonicalize(xml2): " + str(cxml1 == cxml2)
Running this gives me:
$ python test.py
xml1 == xml2: false
pretty(xml1) == pretty(xml2): false
canonicalize(xml1) == canonicalize(xml2): true