Performant parsing of HTML pages with Node.js and XPath

情书的邮戳 2020-12-07 21:10

I'm into some web scraping with Node.js. I'd like to use XPath as I can generate it semi-automatically with several sorts of GUI. The problem is that I cannot find a way t

6 answers

轮回少年 (original poster)

2020-12-07 21:39
I think Osmosis is what you're looking for.
- Uses native libxml C bindings
- Supports CSS 3.0 and XPath 1.0 selector hybrids
- Sizzle selectors, Slick selectors, and more
- No large dependencies like jQuery, cheerio, or jsdom
- HTML parser features
  - Fast parsing
  - Very fast searching
  - Small memory footprint
- HTML DOM features
  - Load and search ajax content
  - DOM interaction and events
  - Execute embedded and remote scripts
  - Execute code in the DOM
Here's an example:
```
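var osmosis = require('osmosis');

var count = 0;

// `url` and `assert` are assumed to be defined by the surrounding test code.
osmosis.get(url)
    .find('//div[@class]/ul[2]/li') // XPath selector
    .then(function () {
        count++; // called for each matched <li>
    })
    .done(function () {
        assert.ok(count == 2);
        assert.done();
    });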
```