I am crawling a site which may contain a lot of start_urls, like:
start_urls = [
    'http://www.a.com/list_1_2_3.htm',
    # ... many more URLs of this form
]
The best way to generate URLs dynamically is to override the start_requests method of the spider:
from scrapy.http.request import Request

def start_requests(self):
    # Read one URL per line from a file and schedule a request for each
    with open('urls.txt') as urls:
        for url in urls:
            yield Request(url.strip(), callback=self.parse)
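For context, here is a minimal sketch of how that method fits into a full spider. The spider name, the urls.txt path, and the parse body are assumptions for illustration, not part of the original answer:

import scrapy
from scrapy.http import Request

class UrlFileSpider(scrapy.Spider):
    name = "url_file_spider"  # hypothetical name

    def start_requests(self):
        # urls.txt is assumed to hold one URL per line
        with open('urls.txt') as urls:
            for url in urls:
                yield Request(url.strip(), callback=self.parse)

    def parse(self, response):
        # placeholder callback; extract whatever your crawl actually needs
        self.logger.info("Visited %s", response.url)

Because start_requests replaces the default behaviour, you can drop the start_urls attribute entirely and keep the (possibly very long) URL list out of your code.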