How to filter duplicate requests based on URL in Scrapy

挽巷 2020-11-29 18:40

I am writing a crawler for a website using scrapy with CrawlSpider.

Scrapy provides a built-in duplicate-request filter which filters duplicate requests based on URLs.

5 Answers
  •  旧时难觅i
    2020-11-29 18:47

    In recent versions of Scrapy, you can use the default duplicate filter (RFPDupeFilter, enabled out of the box) or extend it with a custom one.

    To swap in your own filter, define the following setting in your project's settings:

    DUPEFILTER_CLASS = 'scrapy.dupefilters.BaseDupeFilter'

    Note that BaseDupeFilter itself performs no filtering (its request_seen() always lets requests through), so in practice you point DUPEFILTER_CLASS at your own subclass that implements the deduplication logic you want.
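    The idea above can be sketched as a minimal, standalone URL-based filter. This is an illustration of the request_seen() contract the dupefilter relies on, not Scrapy's actual implementation; the class name URLDupeFilter and its method signature are hypothetical, and in a real project you would subclass scrapy.dupefilters.RFPDupeFilter instead and enable it via DUPEFILTER_CLASS.

```python
class URLDupeFilter:
    """Minimal sketch: treat two requests as duplicates when their
    URLs match exactly. Hypothetical class, not part of Scrapy."""

    def __init__(self):
        # Set of every URL seen so far.
        self.seen_urls = set()

    def request_seen(self, url):
        # Return True (i.e. drop the request) if this URL was
        # already processed; otherwise record it and let it through.
        if url in self.seen_urls:
            return True
        self.seen_urls.add(url)
        return False


f = URLDupeFilter()
print(f.request_seen("https://example.com/page1"))  # False: first visit
print(f.request_seen("https://example.com/page1"))  # True: duplicate, dropped
print(f.request_seen("https://example.com/page2"))  # False: new URL
```

    A real Scrapy dupefilter works on request fingerprints rather than raw URL strings (so that, e.g., query-parameter order does not matter), which is why subclassing RFPDupeFilter is usually the better route than writing a filter from scratch.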
