How to generate the start_urls dynamically in crawling?

失恋的感觉 2020-12-07 16:08

I am crawling a site that may have a large number of start URLs, such as:

http://www.a.com/list_1_2_3.htm
         


        
2 answers
  •  轻奢々 (original poster)
    2020-12-07 16:37

    The best way to generate URLs dynamically is to override the start_requests method of the spider:

    from scrapy import Request

    def start_requests(self):
        # Read one URL per line; strip the trailing newline and skip blank lines.
        with open('urls.txt') as urls:
            for line in urls:
                url = line.strip()
                if url:
                    yield Request(url, callback=self.parse)
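
If the URLs follow a numeric pattern like the list_1_2_3.htm example in the question, they could also be generated in code instead of being read from a file. The sketch below is only an illustration: the question does not say what the three numbers mean, so the helper name build_list_urls, its parameters, and the ID ranges are all hypothetical.

```python
from itertools import product


def build_list_urls(first_ids, second_ids, third_ids,
                    template="http://www.a.com/list_{}_{}_{}.htm"):
    """Yield one list-page URL per combination of the three numeric parts.

    The meaning of the three numbers is an assumption; adjust the
    template and the ranges to match the real site.
    """
    for a, b, c in product(first_ids, second_ids, third_ids):
        yield template.format(a, b, c)


# Inside a spider, the generator could feed start_requests, for example:
#     def start_requests(self):
#         for url in build_list_urls(range(1, 3), range(1, 3), range(1, 4)):
#             yield Request(url, callback=self.parse)

urls = list(build_list_urls(range(1, 3), [2], [3]))
print(urls)
```

Because the helper is a generator, the URLs are produced one at a time as the spider asks for them, so even a very large set of combinations does not have to be held in memory at once.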
    
