Scrapy - how to identify already scraped urls

南笙 2020-12-05 08:28

I'm using Scrapy to crawl a news website on a daily basis. How do I stop Scrapy from scraping URLs it has already scraped? Also, is there any clear documentation or examples on

5 answers

既然无缘 (original poster)

2020-12-05 08:56

Scrapy can automatically filter out URLs that have already been scraped, can't it? However, different URLs that point to the same page will not be filtered, such as "www.xxx.com/home/" and "www.xxx.com/home/index.html".
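
To illustrate the point about equivalent URLs, here is a minimal sketch that maps both forms to one key and remembers the keys between daily runs. It uses only the standard library, not Scrapy's API. The names `normalize_url`, `SeenUrls`, `INDEX_NAMES`, and the JSON file are hypothetical, and treating `index.html` as equivalent to its directory is an assumption about the target site. As far as I know, Scrapy's built-in duplicate filter only remembers requests within a single run unless a job directory (the `JOBDIR` setting) is configured, so something along these lines may be needed for a daily crawl.

```python
import json
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit

# Assumption: these file names serve the same page as their directory URL.
INDEX_NAMES = ("index.html", "index.htm")


def normalize_url(url):
    """Return a canonical key so that equivalent URLs compare equal."""
    # Add a scheme when missing so that urlsplit finds the host correctly.
    parts = urlsplit(url if "://" in url else "http://" + url)
    path = parts.path or "/"
    for name in INDEX_NAMES:
        if path.endswith("/" + name):
            # Keep the trailing slash, drop the index file name.
            path = path[: -len(name)]
            break
    # Lowercase the host and drop the fragment; keep the query string.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


class SeenUrls:
    """A set of normalized URLs persisted to a JSON file between runs."""

    def __init__(self, path):
        self.path = Path(path)
        self.keys = set()
        if self.path.exists():
            self.keys = set(json.loads(self.path.read_text(encoding="utf-8")))

    def add_if_new(self, url):
        """Record the URL; return True only if it was not seen before."""
        key = normalize_url(url)
        if key in self.keys:
            return False
        self.keys.add(key)
        return True

    def save(self):
        self.path.write_text(json.dumps(sorted(self.keys)), encoding="utf-8")
```

In a spider, one option would be to call `add_if_new(url)` before yielding each request and `save()` when the spider closes, so that the next day's run skips pages already fetched.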
