Avoid Duplicate URL Crawling

灰色年华 2020-12-13 15:47

I wrote a simple crawler. In the settings.py file, following the Scrapy documentation, I used

    DUPEFILTER_CLASS = 'scrapy.dupefilter.RFPDupeFilter'
3 Answers
  •  醉话见心
    2020-12-13 16:34

    You can replace the default scheduler with a Redis-backed one such as scrapy-redis. Because the request fingerprints are stored in Redis rather than in memory, duplicate URLs are skipped even when you rerun your project.
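
    For context, here is a minimal sketch of the settings.py changes this suggests, assuming the scrapy-redis package is installed; the Redis address and the SCHEDULER_PERSIST choice are illustrative assumptions, not from the original answer:

        # settings.py -- assumes scrapy-redis is installed (pip install scrapy-redis)
        # and a Redis server is reachable at the URL below (an assumption).

        # Swap in the Redis-backed scheduler and duplicate filter.
        SCHEDULER = "scrapy_redis.scheduler.Scheduler"
        DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

        # Keep the request queue and the set of seen request fingerprints in Redis
        # so they survive between runs; on a rerun, URLs whose fingerprints are
        # already stored are skipped.
        SCHEDULER_PERSIST = True

        # Connection string for the Redis instance holding that state.
        REDIS_URL = "redis://localhost:6379/0"

    With this in place, rerunning `scrapy crawl <spider>` reuses the fingerprint set left in Redis from the previous run instead of starting deduplication from scratch.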
