Is there an open-source command-line tool (for Linux) to diff XML files that ignores element order?
Example input file a.xml:
You'd have to write your own preprocessor that sorts the elements before diffing. XSLT is one way to do it, since it can sort nodes with xsl:sort, though I'm not an XSLT expert.
Here is a quick-and-dirty Perl script that does what you want. Note that it's far, far, far wiser to use a real XML parser. I'm not familiar with any, so I'm exposing you to my terrible habit of writing them myself. Heed the comments; you have been warned.
#!/usr/bin/perl
use strict;
use warnings;
# NOTE: general wisdom - do not use simple homebrewed XML parsers like this one!
#
# This makes sweeping assumptions that are not production grade, including:
# 1. One XML tag per line
# 2. No XML tag contains a greater-than character (e.g. inside an attribute value)
# 3. The XML is well-formed (no unclosed or improperly nested tags)

# recursive function to parse each tag
sub parse_tag {
    my $tag_name = shift;
    my @level = ();    # LOCAL: each recursive call has its OWN distinct @level
    while (<>) {
        chomp;
        # new open tag (not self-closing, not <?...?> or <!...>):
        # capture the tag name and parse its children in a recursive call
        if (m"<\s*([^\s/>?!]+)[^>]*(?<!/)>") {
            push(@level, "$_\n" . parse_tag($1));
        # close tag, verified by name (quoted with \Q...\E), or else last line of input
        } elsif (m"<\s*/\s*\Q$tag_name\E[\s>]"i or eof()) {
            # return all children, sorted and concatenated, then the end tag
            return join("\n", sort @level) . "\n$_";
        } else {
            push(@level, $_);
        }
    }
    return join("\n", sort @level);
}

# start with an empty (impossible) tag name, so the top level reads
# until end of input even if there is no single root element
print parse_tag("");
Save that as xml_diff_prep.pl and then run this:
$ diff -sq <(perl xml_diff_prep.pl a.xml) <(perl xml_diff_prep.pl b.xml)
Files /proc/self/fd/11 and /proc/self/fd/12 are identical
(I used the -s (report identical files) and -q (report only whether the files differ) flags to be explicit; you can use gvimdiff or whatever other utility or flags you like. The files are identified by file descriptor because I used bash process substitution, <(...), to run the preprocessor on each input; they appear in the same order you specified. Because of the sorting this question asks for, lines in any diff output may appear in a different order than in the original files.)
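To check the sorting logic without files or process substitution, a minimal sketch like the following applies the same recursive sort to two in-memory documents whose elements appear in different orders. It assumes the same one-tag-per-line input as the script above; the canonical sub and the sample documents are hypothetical, and it consumes lines from an array reference instead of <>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory variant of parse_tag: same recursive sort, but it
# consumes lines from an array reference instead of <>, so two documents can
# be compared in one process. Assumes one XML tag per line.
sub canonical {
    my ($lines, $tag_name) = @_;   # copy args at once ($tag_name may alias $1)
    my @level;
    while (defined(my $line = shift @$lines)) {
        if ($line =~ m"<\s*([^\s/>?!]+)[^>]*(?<!/)>") {
            # open tag: canonicalize its children recursively
            push @level, "$line\n" . canonical($lines, $1);
        } elsif ($line =~ m"<\s*/\s*\Q$tag_name\E[\s>]") {
            # matching close tag: sorted children, then the close tag itself
            return join('', map { "$_\n" } sort @level) . $line;
        } else {
            # text, self-closing tags, comments, declarations
            push @level, $line;
        }
    }
    # end of input: sorted top-level lines
    return join('', map { "$_\n" } sort @level);
}

# Two hypothetical documents with the same elements in a different order.
my @doc_a = split /\n/, <<'XML';
<root>
  <b>
    2
  </b>
  <a>
    1
  </a>
</root>
XML

my @doc_b = split /\n/, <<'XML';
<root>
  <a>
    1
  </a>
  <b>
    2
  </b>
</root>
XML

my $ca = canonical([@doc_a], '');   # pass copies; canonical() consumes them
my $cb = canonical([@doc_b], '');
print $ca eq $cb ? "identical\n" : "different\n";
```

Running it prints identical, because both documents reduce to the same sorted string. Reading from an array also avoids a quirk of <>: once it has returned end-of-file, calling it again falls back to reading STDIN.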
To satisfy your "open source" "command-line tool" request, I hereby release this code as open source under the Beerware License (BSD 2-clause; if you think it's worthwhile, you are welcome to buy me a beer).