Scrapy - how to identify already scraped URLs

南笙 2020-12-05 08:28

I'm using Scrapy to crawl a news website on a daily basis. How do I prevent Scrapy from scraping URLs it has already scraped? Also is there any clear documentation or examples on

5 answers
  •  既然无缘
    2020-12-05 08:56

    Scrapy can automatically filter out URLs that have already been scraped, can't it? However, different URLs that point to the same page will not be filtered, such as "www.xxx.com/home/" and "www.xxx.com/home/index.html".
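
    One way to handle the case above is to normalize each URL before checking whether it has been seen. The following is a minimal sketch using only the Python standard library, not Scrapy's own filter. The names `canonical_url`, `SeenUrls`, and `INDEX_NAMES` are hypothetical, and it assumes the site serves `index.html` or `index.htm` as the directory index.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these file names are what the target site serves for a directory URL.
INDEX_NAMES = {"index.html", "index.htm"}


def canonical_url(url: str) -> str:
    """Map equivalent spellings of a URL to one canonical form."""
    # Bare hosts such as "www.xxx.com/home/" have no scheme; assume http.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    path = parts.path or "/"
    # Split off the last path segment and drop it if it is an index file.
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_NAMES:
        path = head + "/"
    # Lowercase scheme and host, keep the query, discard the fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


class SeenUrls:
    """In-memory record of the canonical URLs already scraped."""

    def __init__(self) -> None:
        self._seen = set()

    def add_if_new(self, url: str) -> bool:
        """Record the URL; return True only the first time it is seen."""
        key = canonical_url(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


seen = SeenUrls()
first = seen.add_if_new("www.xxx.com/home/")
second = seen.add_if_new("www.xxx.com/home/index.html")
```

    Here `first` is `True` and `second` is `False`, because both URLs reduce to the same canonical form. For a daily crawl the set would also need to be saved between runs (for example to a file), which this sketch leaves out.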
