Following links, Scrapy web crawler framework


情书的邮戳 2020-12-13 03:21

After several readings of the Scrapy docs, I'm still not catching the difference between using CrawlSpider rules and implementing my own link extraction mechanism on the callback...

2 Answers

鱼传尺愫 (OP)

2020-12-13 03:53

If you want selective crawling, such as following only the "Next" links for pagination, it's better to write your own spider and extract the links yourself in the callback. For general crawling, use CrawlSpider and filter out the links you don't need to follow with Rules and the process_links function.

Take a look at the CrawlSpider code in scrapy\contrib\spiders\crawl.py (scrapy\spiders\crawl.py in Scrapy 1.0 and later); it isn't too complicated.
