Im using scrapy to crawl a news website on a daily basis. How do i restrict scrapy from scraping already scraped URLs. Also is there any clear documentation or examples on
Scrapy can auto-filter urls which are scraped, isn't it? Some different urls point to the same page will not be filtered, such as "www.xxx.com/home/" and "www.xxx.com/home/index.html".