I am scraping 23770 webpages with a pretty simple web scraper using Scrapy. I am quite new to Scrapy and even Python, but managed to write a spider that does the job.
One workaround to speed up your Scrapy crawl is to configure your start_urls appropriately.
For example, if your target data lives at http://apps.webofknowledge.com/doc=1, where the doc number ranges from 1 to 1000, you can configure your start_urls as follows:
start_urls = [
"http://apps.webofknowledge.com/doc=250",
"http://apps.webofknowledge.com/doc=750",
]
In this way, the spider issues requests starting from doc 250 and doc 750 simultaneously, so the crawl proceeds from two points in the range at once and finishes roughly twice as fast as with start_urls = ["http://apps.webofknowledge.com/doc=1"].
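If the doc range is large, writing the start URLs by hand gets tedious. A minimal sketch of generating them programmatically (the helper name `shard_start_urls` and the shard boundaries are my own illustration, not part of the original answer):

```python
def shard_start_urls(base, first, last, shards):
    """Split the inclusive id range [first, last] into `shards`
    contiguous chunks and return the URL at the start of each chunk.
    Assigning these to a Scrapy spider's start_urls lets the scheduler
    begin crawling all chunks concurrently."""
    step = (last - first + 1) // shards
    return [f"{base}{first + i * step}" for i in range(shards)]

# Four shards over docs 1..1000 start at docs 1, 251, 501, and 751.
urls = shard_start_urls("http://apps.webofknowledge.com/doc=", 1, 1000, 4)
```

You would then set `start_urls = shard_start_urls(...)` in your spider class; each shard's parse callback should stop following "next" links once it reaches the next shard's boundary, so the shards do not re-crawl each other's pages.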