Algorithm for identifying differences in XML documents

给你一囗甜甜゛ submitted on 2019-12-11 04:01:47

Question


I'm trying to write a Java program that loads two XML files (one is an updated version of the other) into main memory. It should then compare the files and count the number of differences between corresponding nodes (excluding whitespace). Later on the program will do more with the differences, but for now I'm unsure how to start comparing nodes from two separate files. Any suggestions would be much appreciated.


Answer 1:


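My first suggestion is to use XMLUnit, which can compare two XML documents for you:

// XMLUnit 1.x classes: org.custommonkey.xmlunit.Diff and org.custommonkey.xmlunit.XMLUnit
Reader expected = new FileReader(...);
Reader tested = new FileReader(...);
Diff diff = XMLUnit.compareXML(expected, tested);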



Answer 2:


For an algorithm that computes signatures (hashes) at each node to facilitate comparison, see Detecting Changes in XML Documents.
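To illustrate the general idea of per-node signatures, here is a minimal sketch using only the JDK. It is not the algorithm from the referenced work; the class name NodeSignatures and its methods are hypothetical, and the choice of SHA-256 over the element name, sorted attributes, trimmed text and child signatures is an assumption made for this example. Two subtrees with equal signatures can be skipped without descending into them.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: computes a bottom-up hash for each element subtree.
final class NodeSignatures {

    // Parses an XML string into a DOM tree; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns a hex SHA-256 signature covering the element and everything below it.
    // Whitespace-only text is ignored; child order is significant.
    static String signature(Node element) {
        StringBuilder sb = new StringBuilder(element.getNodeName()).append('|');
        NamedNodeMap attrs = element.getAttributes(); // null for non-element nodes
        if (attrs != null) {
            TreeMap<String, String> sorted = new TreeMap<>(); // attribute order is not significant
            for (int i = 0; i < attrs.getLength(); i++) {
                sorted.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
            }
            sb.append(sorted).append('|');
        }
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                sb.append(signature(c)).append(';');
            } else if (c.getNodeType() == Node.TEXT_NODE
                    || c.getNodeType() == Node.CDATA_SECTION_NODE) {
                String text = c.getNodeValue().trim();
                if (!text.isEmpty()) {
                    sb.append("text:").append(text).append(';');
                }
            }
        }
        return sha256(sb.toString());
    }

    // Hashes a string with SHA-256 and returns lowercase hex.
    static String sha256(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling NodeSignatures.signature(doc.getDocumentElement()) on both documents tells you whether anything differs at all; where the roots differ, comparing the signatures of corresponding children narrows down which subtrees changed.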

For change detection on XML documents where element ordering is insignificant, see X-Diff: An Effective Change Detection Algorithm for XML Documents. Java and C++ implementations of the X-Diff algorithm are available.
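To show what "element ordering is insignificant" means in practice, the sketch below builds a canonical string for each element in which attributes, text and child elements are sorted before being joined. This is not X-Diff, only a simple equality check under that assumption; the class UnorderedXml and its methods are hypothetical names, and sorting text together with child elements discards the order of mixed content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: compares two XML strings while ignoring sibling order.
final class UnorderedXml {

    // Parses an XML string and returns its root element; checked exceptions are wrapped.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Builds a canonical string in which the order of siblings does not matter.
    static String canonical(Node element) {
        List<String> parts = new ArrayList<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; attrs != null && i < attrs.getLength(); i++) {
            parts.add("@" + attrs.item(i).getNodeName() + "=" + attrs.item(i).getNodeValue());
        }
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                parts.add(canonical(c));
            } else if (c.getNodeType() == Node.TEXT_NODE
                    || c.getNodeType() == Node.CDATA_SECTION_NODE) {
                String text = c.getNodeValue().trim();
                if (!text.isEmpty()) {
                    parts.add("#" + text); // whitespace-only text is skipped
                }
            }
        }
        Collections.sort(parts); // sorting removes any dependence on sibling order
        return "<" + element.getNodeName() + parts + ">";
    }

    // True when both documents have the same content regardless of sibling order.
    static boolean sameIgnoringOrder(String xmlA, String xmlB) {
        return canonical(parseRoot(xmlA)).equals(canonical(parseRoot(xmlB)));
    }
}
```

This only answers "same or different"; producing an actual list of insertions, deletions and updates for unordered trees is the harder problem that a dedicated algorithm such as X-Diff addresses.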




Answer 3:


It depends on whether you are looking for differences between nodes or differences inside nodes.

This code extracts every element, together with its path and the value inside it.

Assuming you have two XML Document objects, run this on each one (here called document):

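import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// 'document' is an already parsed org.w3c.dom.Document.
// evaluate() throws XPathExpressionException, so call this from a method that declares or handles it.
XPath xPath = XPathFactory.newInstance().newXPath();
// Select every element in the document
String expression = "//*";
NodeList nodes = (NodeList) xPath.compile(expression).evaluate(document, XPathConstants.NODESET);

// Iterate over all of them
for (int i = 0; i < nodes.getLength(); i++) {
    Node the_node = nodes.item(i);

    if (the_node instanceof Element) {
        Element the_element = (Element) the_node;

        // PATH: walk up through the parents until the document node is reached
        String path = "";
        Node noderec = the_node;
        while (noderec != null) {
            if (path.equals("")) {
                path = noderec.getNodeName();
            } else {
                path = noderec.getNodeName() + '/' + path;
            }
            noderec = noderec.getParentNode();

            // Stop at the document node and mark the path as complete
            if (noderec == document) {
                path = "//" + path;
                noderec = null;
            }
        }
        System.out.println("PATH:" + path);
        System.out.println("CONTENT=" + the_element.getTextContent());
    }
}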

PATH: the path of the element

CONTENT: the text content of the element, including that of its descendants

With that, you get all the paths in your XML: you can compare them one by one, sort them, and use other algorithms to find out whether something was inserted, and so on.

Inside each node, you can then make further comparisons.
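One way to turn the path listing into a difference count is to collect a path-to-text map for each document and compare the two maps. The following is a minimal sketch under several assumptions that are not part of the answer: the class PathDiff and its methods are hypothetical, paths carry a sibling index (such as /r/b[1]) so that repeated element names stay distinct, only the element's own direct text is compared, and attributes are ignored.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: counts added, removed and changed elements by indexed path.
final class PathDiff {

    // Parses an XML string and returns its root element; checked exceptions are wrapped.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Fills 'out' with indexed path -> trimmed direct text, for e and all its descendants.
    static void collect(Element e, String path, Map<String, String> out) {
        StringBuilder text = new StringBuilder();
        Map<String, Integer> counts = new HashMap<>(); // per-name sibling counters
        out.put(path, ""); // reserve the slot so the map keeps document order
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                int n = counts.merge(c.getNodeName(), 1, Integer::sum);
                collect((Element) c, path + "/" + c.getNodeName() + "[" + n + "]", out);
            } else if (c.getNodeType() == Node.TEXT_NODE
                    || c.getNodeType() == Node.CDATA_SECTION_NODE) {
                text.append(c.getNodeValue());
            }
        }
        out.put(path, text.toString().trim()); // surrounding whitespace is ignored
    }

    // Returns the path -> text map for a whole document.
    static Map<String, String> paths(String xml) {
        Element root = parseRoot(xml);
        Map<String, String> out = new LinkedHashMap<>();
        collect(root, "/" + root.getNodeName(), out);
        return out;
    }

    // Counts paths that were removed, added, or whose direct text changed.
    static int countDifferences(String oldXml, String newXml) {
        Map<String, String> a = paths(oldXml);
        Map<String, String> b = paths(newXml);
        int diffs = 0;
        for (Map.Entry<String, String> entry : a.entrySet()) {
            String other = b.get(entry.getKey());
            if (other == null || !other.equals(entry.getValue())) {
                diffs++; // removed or changed
            }
        }
        for (String path : b.keySet()) {
            if (!a.containsKey(path)) {
                diffs++; // added
            }
        }
        return diffs;
    }
}
```

Because elements are matched by position, inserting one element in the middle of a list of same-named siblings shifts the indices after it and inflates the count; handling that case well is where a proper tree-diff algorithm becomes necessary.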

Hope it helps



Source: https://stackoverflow.com/questions/31439859/algorithm-for-identifying-differences-in-xml-documents
