Difference between BaseSpider and CrawlSpider

Question

I have been trying to understand the difference between BaseSpider and CrawlSpider in web scraping. I have read the docs, but they do not mention BaseSpider. It would be really helpful if someone could explain the differences between BaseSpider and CrawlSpider.

Answer 1:

BaseSpider is the older name of the basic spider class and has been deprecated since Scrapy 0.22. Use scrapy.Spider instead:

import scrapy

class MySpider(scrapy.Spider):
    name = "my_spider"  # placeholder; every spider needs a unique name
    # ...

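scrapy.Spider is the simplest spider: it visits the URLs defined in start_urls (or the requests returned by start_requests()) and passes each response to a callback, parse() by default. It does not follow any links unless your callback yields new requests itself.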

Use CrawlSpider when you need "crawling" behavior, that is, extracting links from pages and following them. The Scrapy docs describe it like this:

This is the most commonly used spider for crawling regular websites, as it provides a convenient mechanism for following links by defining a set of rules. It may not be the best suited for your particular web sites or project, but it’s generic enough for several cases, so you can start from it and override it as needed for more custom functionality, or just implement your own spider.

Source: https://stackoverflow.com/questions/32632001/difference-between-basespider-and-crawlspider

Tags

python

python-2.7

web-scraping

scrapy

scrapy-spider
