Open source command line tool for Linux to diff XML files ignoring element order

走了就别回头了 2021-02-04 09:57

Is there an open source command-line tool (for Linux) to diff XML files which ignores the element order?

Example input file a.xml:



        
6 answers
  •  萌比男神i
    2021-02-04 10:50

    You'd have to write your own interpreter to preprocess. XSLT is one way to do it ... maybe; I'm not an expert in XSLT and I'm not sure you can sort things with it.

    Here is a quick and dirty perl script which can do what you want. Note that it's far far far wiser to use a real XML parser. I'm not familiar with any, so I'm exposing you to my terrible practice of writing them myself. Note the comments; you have been warned.

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    # NOTE: general wisdom - do not use simple homebrewed XML parsers like this one!
    #
    # This makes sweeping assumptions that are not production grade.  Including:
    #   1. Assumption of one XML tag per line
    #   2. Assumption that no XML tag contains a greater-than character
    #      (for example, inside a quoted attribute value)
    #   3. Assumes the XML is well-formed (every open tag has a matching
    #      close tag)
    
    # recursive function to parse each tag.
    sub parse_tag {
      my $tag_name = shift;
      my @level = (); # LOCAL: each recursive call has its OWN distinct @level
      while(<>) {
        chomp;
    
        # new open tag:  match new tag name, parse in recursive call
        if (m"<\s*([^\s/>]+)[^/>]*>") {
          push (@level, "$_\n" . parse_tag($1) );
    
        # close tag, verified by name, or else last line of input
        } elsif (m"<\s*/\s*\Q$tag_name\E[\s>]"i or eof()) {
          # return all children, sorted and concatenated, then the end tag
          return join("\n", sort @level) . "\n$_";
    
        } else {
          push (@level, $_);
        }
      }
      return join("\n", sort @level);
    }
    
    # start with an impossible tag in case there is no root
    print parse_tag("");
    

    Save that as xml_diff_prep.pl and then run this:

    $ diff -sq <(perl xml_diff_prep.pl a.xml) <(perl xml_diff_prep.pl b.xml)
    Files /proc/self/fd/11 and /proc/self/fd/12 are identical
    

    (I used the -s and -q flags to be explicit; you can use gvimdiff or any other utility or flags you like. diff identifies the files by file descriptor because I used bash process substitution, <(...), to run the preprocessor on each input; the descriptors appear in the same order as the files you specified. Note that content may appear in unexpected locations in the output because of the sorting this question asks for.)

    To satisfy your "Open Source" "command line tool" request, I hereby release this code as Open Source under the Beerware License (BSD 2-clause, if you think it's worthwhile, you are welcome to buy me a beer).
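
For comparison, here is a minimal sketch of the "real XML parser" route that the answer recommends over the homebrewed script. It is not the answer author's method: it swaps in Python and the standard library's xml.etree.ElementTree (Python 3.9+ for ET.indent), and the names canonicalize, canonical_lines and xml_diff are hypothetical. It sorts attributes and child elements recursively, pretty-prints one tag per line, and then runs a line-based diff.

```python
import difflib
import xml.etree.ElementTree as ET


def sort_key(elem):
    """Serialized form of an already-canonicalized element, used to order siblings."""
    return ET.tostring(elem, encoding="unicode")


def canonicalize(elem):
    """Recursively sort attributes and child elements in place."""
    # Attribute order is not significant in XML, so store it sorted.
    items = sorted(elem.attrib.items())
    elem.attrib.clear()
    elem.attrib.update(items)
    # Drop indentation-only text so pretty-printing differences do not matter.
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        canonicalize(child)
        if child.tail is not None and not child.tail.strip():
            child.tail = None
    # Children are canonical by now, so their serialization is a stable key.
    elem[:] = sorted(elem, key=sort_key)
    return elem


def canonical_lines(xml_text):
    """Parse, canonicalize, and return the document as one tag per line."""
    root = canonicalize(ET.fromstring(xml_text))
    ET.indent(root)  # Python 3.9+
    return ET.tostring(root, encoding="unicode").splitlines()


def xml_diff(a_text, b_text):
    """Unified diff of two XML documents, ignoring element order."""
    return list(
        difflib.unified_diff(
            canonical_lines(a_text),
            canonical_lines(b_text),
            fromfile="a.xml",
            tofile="b.xml",
            lineterm="",
        )
    )


# Illustrative inputs (assumptions, not the question's original files).
A = "<root><x id='1'/><y>hello</y></root>"
B = """<root>
  <y>hello</y>
  <x id="1"/>
</root>"""
C = "<root><y>bye</y><x id='1'/></root>"

print(xml_diff(A, B))  # same content in a different order: prints []
print("\n".join(xml_diff(A, C)))  # a real content difference shows up as a diff
```

Limits of this sketch: the default parser drops comments and processing instructions, text that follows an element moves with that element when siblings are reordered, and namespace prefixes are rewritten on output. To compare files instead of strings, read each file and pass its contents to xml_diff.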
