I'm doing some web scraping with Node.js. I'd like to use XPath, as I can generate it semi-automatically with various GUI tools. The problem is that I cannot find a way t
Libxmljs is currently the fastest implementation (according to an informal benchmark), since it is only a set of bindings to the LibXML C library, which supports XPath 1.0 queries:
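var libxmljs = require("libxmljs");
// Sample input for illustration; any well-formed XML string works
var xml = '<root><child><grandchild>text</grandchild></child></root>';
var xmlDoc = libxmljs.parseXml(xml);
// XPath query: get() returns the first matching node
var gchild = xmlDoc.get('//grandchild');
console.log(gchild.text());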
However, you need to sanitize your HTML first and convert it to proper XML. For that you could either use the HTMLTidy command-line utility (tidy -q -asxml input.html) or, if you want to keep it Node-only, something like xmlserializer should do the trick.
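As a rough sketch of the HTMLTidy route, the command above could be run from Node through the built-in child_process module. The helper names (tidyArgs, isUsableTidyExit, htmlFileToXml) and the buffer size are hypothetical choices for illustration, and the code assumes a tidy binary is available on the PATH. It also assumes tidy's usual exit codes (0 for clean input, 1 for warnings, 2 for errors), so an exit status of 1 is treated as usable output rather than a failure.

```javascript
var execFile = require("child_process").execFile;

// Build the argument list for: tidy -q -asxml <inputPath>
function tidyArgs(inputPath) {
  return ["-q", "-asxml", inputPath];
}

// Assumption: tidy exits with 0 (clean), 1 (warnings) or 2 (errors);
// the converted document is still written to stdout when it exits with 1.
function isUsableTidyExit(code) {
  return code === 0 || code === 1;
}

// Hypothetical helper: runs tidy on a file and calls callback(err, xhtml).
function htmlFileToXml(inputPath, callback) {
  var options = { maxBuffer: 10 * 1024 * 1024 }; // allow large pages
  execFile("tidy", tidyArgs(inputPath), options, function (err, stdout) {
    // err.code is the exit status if tidy ran, or a string such as
    // "ENOENT" if the binary could not be started at all.
    if (err && !isUsableTidyExit(err.code)) {
      return callback(err);
    }
    callback(null, stdout);
  });
}
```

The resulting string can then be passed to libxmljs.parseXml(). Note that tidy's XHTML output normally declares the http://www.w3.org/1999/xhtml namespace on the root element, so queries will probably need a prefix mapping, along the lines of xmlDoc.get('//x:title', { x: 'http://www.w3.org/1999/xhtml' }).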