Question
I'm trying to write a Java program that loads two XML files (one is an updated version of the other) into main memory. It should then compare the files and count the number of differences between each pair of corresponding nodes, ignoring whitespace. Later on the program will do more with the differences, but for now I'm unsure how to start comparing nodes from two separate files. Any suggestions would be much appreciated.
Answer 1:
My first suggestion is that you could use XMLUnit:
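XMLUnit.setIgnoreWhitespace(true); // XMLUnit 1.x: ignore insignificant whitespace
Reader expected = new FileReader(...);
Reader tested = new FileReader(...);
Diff diff = XMLUnit.compareXML(expected, tested);
// XMLUnit 1.x: DetailedDiff collects every difference, so the list size is the count
int count = new DetailedDiff(diff).getAllDifferences().size();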
Answer 2:
For an algorithm that computes signatures (hashes) at each node to facilitate comparison, see Detecting Changes in XML Documents.
For change detection on XML documents where element ordering is insignificant, see X-Diff: An Effective Change Detection Algorithm for XML Documents. Java and C++ implementations of the X-Diff algorithm are available.
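To make the signature idea concrete, here is a minimal sketch using only the JDK's DOM parser. It is not the algorithm from either paper, just an illustration of hashing each node bottom-up so that two subtrees can be compared by comparing two strings. The class name NodeSignatures and its methods are hypothetical. The sketch assumes that whitespace-only text and comments should be ignored, that attribute order does not matter, and that child order does matter (unlike X-Diff).

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not taken from either paper.
class NodeSignatures {

    /** Parses the XML text and returns the signature of its root element. */
    static String signatureOf(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return signature(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Hex SHA-256 signature covering a node and its entire subtree. */
    static String signature(Node node) {
        StringBuilder sb = new StringBuilder();
        sb.append(node.getNodeType()).append(':').append(node.getNodeName());
        if (isText(node)) {
            // Assumption: leading and trailing whitespace is not significant.
            sb.append('=').append(node.getNodeValue().trim());
        }
        NamedNodeMap attrs = node.getAttributes(); // null unless node is an element
        if (attrs != null) {
            // Sort attributes by name so that their order does not affect the hash.
            TreeMap<String, String> sorted = new TreeMap<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                sorted.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
            }
            sb.append(sorted);
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            boolean blankText = isText(c) && c.getNodeValue().trim().isEmpty();
            if (blankText || c.getNodeType() == Node.COMMENT_NODE) {
                continue; // skip whitespace-only text and comments
            }
            // A child's signature already summarizes its whole subtree.
            sb.append('|').append(signature(c));
        }
        return sha256Hex(sb.toString());
    }

    private static boolean isText(Node n) {
        return n.getNodeType() == Node.TEXT_NODE
                || n.getNodeType() == Node.CDATA_SECTION_NODE;
    }

    private static String sha256Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If two nodes have the same signature, their subtrees can be treated as identical (up to hash collisions), so a comparison only needs to descend into children whose signatures differ.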
Answer 3:
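It depends on whether you are looking for differences between nodes (added or removed nodes) or differences inside nodes (changed content).
The code below extracts every element together with its path and its text content.
Assuming you have parsed both XML files into Document objects, run it on each one (shown here for a single Document named document):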
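import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// "document" is an already parsed org.w3c.dom.Document
XPath xPath = XPathFactory.newInstance().newXPath();
// Select every element in the document
String expression = "//*";
NodeList nodes = (NodeList) xPath.compile(expression).evaluate(document, XPathConstants.NODESET);
// Iterate over all of them
for (int i = 0; i < nodes.getLength(); i++) {
    Node the_node = nodes.item(i);
    if (the_node instanceof Element) {
        Element the_element = (Element) the_node;
        // PATH: walk up through the parents until the document itself is reached
        String path = "";
        Node noderec = the_node;
        while (noderec != null) {
            if (path.equals("")) {
                path = noderec.getNodeName();
            } else {
                path = noderec.getNodeName() + '/' + path;
            }
            noderec = noderec.getParentNode();
            if (noderec == document) {
                path = "//" + path;
                noderec = null; // reached the top, stop
            }
        }
        System.out.println("PATH:" + path);
        // getTextContent() also includes the text of all descendant elements
        System.out.println("CONTENT=" + the_element.getTextContent());
    }
}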
PATH: the path from the document root down to the element.
CONTENT: the text content of the element, including the text of all its descendants.
With that, you have every path in your XML: you can compare them one by one, sort them, or apply other algorithms to find out whether something was inserted, deleted, and so on.
Inside each node, you can then make further comparisons.
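As a rough sketch of that comparison step, the following collects a path-to-text map for each document and counts the paths whose entries differ. The class XmlPathDiff and its methods are hypothetical names. The sketch makes a few assumptions that differ from the snippet above: it walks the DOM directly instead of using XPath, it adds a position index such as item[2] so that same-named siblings get distinct paths, it uses only each element's own direct text (trimmed) so that one changed leaf is not counted again at every ancestor, and it ignores attributes.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class XmlPathDiff {

    /** Maps every element path (with sibling index) to the element's own trimmed text. */
    static Map<String, String> pathsToText(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, String> out = new TreeMap<>();
            collect(root, "/" + root.getNodeName() + "[1]", out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void collect(Element el, String path, Map<String, String> out) {
        StringBuilder ownText = new StringBuilder();
        Map<String, Integer> seen = new HashMap<>(); // per-name sibling counters
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                int n = seen.merge(c.getNodeName(), 1, Integer::sum);
                collect((Element) c, path + "/" + c.getNodeName() + "[" + n + "]", out);
            } else if (c instanceof Text) { // also covers CDATA sections
                ownText.append(c.getNodeValue());
            }
        }
        // Trimming means whitespace-only text counts as empty.
        out.put(path, ownText.toString().trim());
    }

    /** Counts paths that were added, removed, or whose own text changed. */
    static int countDifferences(String oldXml, String newXml) {
        Map<String, String> before = pathsToText(oldXml);
        Map<String, String> after = pathsToText(newXml);
        Set<String> allPaths = new TreeSet<>(before.keySet());
        allPaths.addAll(after.keySet());
        int count = 0;
        for (String p : allPaths) {
            // A path missing on one side yields null there, so it counts as a difference.
            if (!Objects.equals(before.get(p), after.get(p))) {
                count++;
            }
        }
        return count;
    }
}
```

Because the paths are positional, inserting an element in the middle of a list of same-named siblings shifts the indices of the ones after it, so this simple approach can over-count in that case.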
Hope it helps
Source: https://stackoverflow.com/questions/31439859/algorithm-for-identifying-differences-in-xml-documents