I'm interested in advice/pseudocode/explanation rather than actual implementation.
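    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.w3c.dom.Attr;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NamedNodeMap;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.w3c.dom.Text;

    // Recursively adds one XPath test per attribute and per text node under parent.
    private static void buildEntryList( List<String> entries, String parentXPath, Element parent ) {
        NamedNodeMap attrs = parent.getAttributes();
        for( int i = 0; i < attrs.getLength(); i++ ) {
            Attr attr = (Attr)attrs.item( i );
            //TODO: escape attr value
            entries.add( parentXPath+"[@"+attr.getName()+"='"+attr.getValue()+"']" );
        }
        // Counts how often each child element name has been seen so far,
        // which gives the 1-based positional index used in the XPath step.
        Map<String, Integer> nameMap = new HashMap<>();
        NodeList children = parent.getChildNodes();
        for( int i = 0; i < children.getLength(); i++ ) {
            Node child = children.item( i );
            if( child instanceof Text ) {
                //TODO: escape child value
                entries.add( parentXPath+"='"+((Text)child).getData()+"'" );
            } else if( child instanceof Element ) {
                String childName = child.getNodeName();
                Integer nameCount = nameMap.get( childName );
                nameCount = nameCount == null ? 1 : nameCount + 1;
                nameMap.put( childName, nameCount );
                buildEntryList( entries, parentXPath+"/"+childName+"["+nameCount+"]", (Element)child );
            }
        }
    }

    // Entry point: starts the recursion at the document element.
    public static List<String> getEntryList( Document doc ) {
        List<String> entries = new ArrayList<>();
        Element root = doc.getDocumentElement();
        buildEntryList( entries, "/"+root.getNodeName()+"[1]", root );
        return entries;
    }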
This code works under two assumptions: you aren't using namespaces, and there are no mixed-content elements. The namespace limitation isn't a serious one: supporting namespaces is easy to implement, but it would make your XPath expressions much harder to read, as every element would become something like a *:-prefixed name. Mixed content, on the other hand, would make the use of XPath very tedious, as you'd have to address the second, third, and later text nodes within an element individually.
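If you do need mixed content, one option is to address each text node by position with text()[n]. The following is only a minimal sketch of that idea, not a drop-in replacement. The class MixedContentEntries and its methods quote, textEntries and parse are hypothetical names introduced here for illustration. It assumes the document was parsed with coalescing enabled, so that DOM text nodes line up with XPath text nodes (XPath treats adjacent text and CDATA as one node), and it assumes XPath 1.0, which has no escape character inside string literals, so values containing both quote kinds are built with concat(). Whitespace-only text nodes are skipped but still counted, so the indexes stay correct.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper class; none of these names come from the code above.
final class MixedContentEntries {

    // Quotes a value as an XPath 1.0 string literal. A value containing both
    // ' and " cannot be written as one literal, so it is split on ' and
    // reassembled with concat().
    static String quote(String value) {
        if (!value.contains("'")) return "'" + value + "'";
        if (!value.contains("\"")) return "\"" + value + "\"";
        StringBuilder sb = new StringBuilder("concat(");
        String[] parts = value.split("'", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(",\"'\",");
            sb.append("'").append(parts[i]).append("'");
        }
        return sb.append(")").toString();
    }

    // Emits one entry per non-blank text node child, addressed as text()[n].
    static List<String> textEntries(String parentXPath, Element parent) {
        List<String> entries = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        int textIndex = 0;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Text) {
                textIndex++; // count every text node, even blank ones
                String data = ((Text) child).getData();
                if (!data.trim().isEmpty()) {
                    entries.add(parentXPath + "/text()[" + textIndex + "]=" + quote(data));
                }
            }
        }
        return entries;
    }

    // Parses XML from a string with coalescing on (assumption: CDATA sections
    // should be merged into neighbouring text, as XPath sees them).
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

For an element such as `<p>Hello <b>big</b> world</p>`, calling textEntries("/p[1]", root) would yield one entry for text()[1] and one for text()[2]. The same quote helper could also fill in the two TODOs in buildEntryList.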