I'm doing some web scraping with Node.js. I'd like to use XPath, since I can generate it semi-automatically with several kinds of GUI tools. The problem is that I cannot find a way t
I think Osmosis is what you're looking for.
- Uses native libxml C bindings
- Supports CSS 3.0 and XPath 1.0 selector hybrids
- Sizzle selectors, Slick selectors, and more
- No large dependencies like jQuery, cheerio, or jsdom
HTML parser features
- Fast parsing
- Very fast searching
- Small memory footprint
HTML DOM features
- Load and search ajax content
- DOM interaction and events
- Execute embedded and remote scripts
- Execute code in the DOM
Here's an example:
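    var osmosis = require('osmosis');
    var assert = require('assert');

    var url = 'http://example.com/'; // placeholder: the page you want to scrape
    var count = 0;

    osmosis.get(url)
        .find('//div[@class]/ul[2]/li') // XPath 1.0 selector
        .then(function () {
            count++; // runs once for every matching <li>
        })
        .done(function () {
            // the page this snippet was written for had exactly two matches
            assert.strictEqual(count, 2);
        });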
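The example above only counts the matches. To actually pull out the matched text, something along these lines may work. It is a minimal sketch, assuming Osmosis's chainable `set`, `data`, `error`, and `done` commands behave as described in the project's README. The helper name `collectText`, the `text` key, and the callback shape are hypothetical choices made for illustration, not part of Osmosis.

```javascript
// Sketch: gather the text of every node matched by `xpath` on `url`.
// `scraper` is assumed to be the module returned by require('osmosis');
// it is passed in so the helper itself has no hard dependency.
function collectText(scraper, url, xpath, callback) {
  var results = [];
  var errors = [];

  scraper
    .get(url)
    .find(xpath)
    .set('text') // store each match's text under the (hypothetical) key "text"
    .data(function (item) {
      results.push(item.text);
    })
    .error(function (message) {
      // collect error messages instead of aborting on the first one
      errors.push(message);
    })
    .done(function () {
      // Node-style callback: null when no errors were reported
      callback(errors.length ? errors : null, results);
    });
}
```

It could then be called as `collectText(require('osmosis'), url, '//div[@class]/ul[2]/li', function (err, texts) { console.log(err, texts); })`, where `url` is whatever page you are scraping.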