How to scrape links with PhantomJS

囚心锁ツ 2020-12-24 08:58

Can PhantomJS be used as an alternative to BeautifulSoup?

I am trying to search on Etsy and visit all the links in turn. In Python, I know how to do this (with Beau

3 answers
  •  北荒 (original poster)
    2020-12-24 09:23

    PhantomJS's page.evaluate() cannot serialize and return complex objects such as HTMLElements or NodeLists, so you have to map them to serializable values (strings, for example) before returning them:

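    var page = require('webpage').create();
    var url = 'http://www.etsy.com/search?q=hello%20kitty';

    page.open(url, function(status) {
        // Bail out if the page could not be loaded
        if (status !== 'success') {
            console.log('Failed to load ' + url);
            phantom.exit(1);
            return;
        }
        // Collect the href of every listing link on the Hello Kitty search page
        var links = page.evaluate(function() {
            // Runs inside the page; only serializable values can be returned
            return [].map.call(document.querySelectorAll('a.listing-thumb'), function(link) {
                return link.getAttribute('href');
            });
        });
        console.log(links.join('\n'));
        phantom.exit();
    });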
    

    Note: here we use [].map.call() in order to treat a NodeList as a standard Array.
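
    To see the [].map.call() trick in isolation, here is a minimal sketch that runs without a browser. The fakeNodeList object and its href values are hypothetical stand-ins for what document.querySelectorAll() would return; like a NodeList, the object has indexed entries and a length but no map() method of its own. The JSON round trip at the end is only a rough approximation of the serialization that evaluate() performs.

```javascript
// Hypothetical stand-in for a NodeList: indexed entries plus a length,
// but no Array methods such as map().
var fakeNodeList = {
    0: { href: '/listing/1' },
    1: { href: '/listing/2' },
    length: 2
};

// Borrow Array.prototype.map and call it with the array-like object as `this`.
// The callback reduces each "element" to a plain string.
var hrefs = [].map.call(fakeNodeList, function(node) {
    return node.href;
});

// An array of strings survives a JSON round trip unchanged, which is a
// rough proxy for "safe to return from evaluate()".
var roundTripped = JSON.parse(JSON.stringify(hrefs));

console.log(hrefs.join('\n'));
```

    The result of map() is a real Array even though its input was not, which is why join() can be called on it directly.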
