Performant parsing of HTML pages with Node.js and XPath

情书的邮戳 2020-12-07 21:10

I'm into some web scraping with Node.js. I'd like to use XPath as I can generate it semi-automatically with several sorts of GUI. The problem is that I cannot find a way t

6 Answers
  •  轮回少年
    2020-12-07 21:39

    I think Osmosis is what you're looking for.

    • Uses native libxml C bindings
    • Supports CSS 3.0 and XPath 1.0 selector hybrids
    • Sizzle selectors, Slick selectors, and more
    • No large dependencies like jQuery, cheerio, or jsdom
    • HTML parser features

      • Fast parsing
      • Very fast searching
      • Small memory footprint
    • HTML DOM features

      • Load and search ajax content
      • DOM interaction and events
      • Execute embedded and remote scripts
      • Execute code in the DOM

    Here's an example:

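    var osmosis = require('osmosis');
    var count = 0; // number of matched <li> nodes

    // `url` is assumed to be defined elsewhere; `assert` is assumed to be
    // a nodeunit-style test object (it exposes done()).
    osmosis.get(url)
        .find('//div[@class]/ul[2]/li') // XPath selector
        .then(function () {
            count++; // runs once per matched node
        })
        .done(function () {
            assert.ok(count == 2);
            assert.done();
        });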
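
To make the XPath in the example above concrete, here is a minimal sketch of what `//div[@class]/ul[2]/li` selects, written as plain JavaScript over a hand-built tree. It does not use Osmosis or any parser. The helpers `el`, `descendantsOrSelf`, `childrenNamed` and `selectItems`, and the sample `page`, are hypothetical names made up for illustration. It also simplifies: results are grouped per matching `div` rather than strictly sorted in document order.

```javascript
// Tiny element constructor for a hand-built tree (no HTML parser involved).
function el(tag, attrs, children) {
  return { tag: tag, attrs: attrs || {}, children: children || [] };
}

// Every element in the subtree, starting with the node itself (the `//` step).
function descendantsOrSelf(node) {
  return node.children.reduce(function (acc, child) {
    return acc.concat(descendantsOrSelf(child));
  }, [node]);
}

// Child elements of `node` with the given tag name.
function childrenNamed(node, tag) {
  return node.children.filter(function (c) { return c.tag === tag; });
}

// Hand-written equivalent of //div[@class]/ul[2]/li
function selectItems(root) {
  var items = [];
  descendantsOrSelf(root)
    .filter(function (n) { return n.tag === 'div' && 'class' in n.attrs; }) // div[@class]
    .forEach(function (div) {
      var second = childrenNamed(div, 'ul')[1]; // ul[2]: XPath positions are 1-based
      if (second) {
        items = items.concat(childrenNamed(second, 'li')); // /li
      }
    });
  return items;
}

// Sample tree: only the second div has a class attribute.
var page = el('html', {}, [
  el('div', {}, [
    el('ul', {}, [el('li', { id: 'x' })]),
    el('ul', {}, [el('li', { id: 'y' })])
  ]),
  el('div', { class: 'menu' }, [
    el('ul', {}, [el('li', { id: 'a' })]),
    el('ul', {}, [el('li', { id: 'b' }), el('li', { id: 'c' })])
  ])
]);

var ids = selectItems(page).map(function (li) { return li.attrs.id; });
console.log(ids); // the two items of the second list: b and c
```

Two items match here, which is the kind of result the `count == 2` check in the example above expects from its test page.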
    
