Is it possible to crawl multiple start_urls lists simultaneously?

Submitted by 痴心易碎 on 2020-01-06 17:57:46

Question


I have 3 URL files, all with the same structure, so the same spider can be used for every list. A special requirement is that all three need to be crawled simultaneously.

Is it possible to crawl them simultaneously without creating multiple spiders?

I believe this answer

start_urls = ["http://example.com/category/top/page-%d/" % i for i in xrange(4)] + \
             ["http://example.com/superurl/top/page-%d/" % i for i in xrange(55)]

from Scrap multiple urls with scrapy only joins the two lists, but does not run them at the same time.

Thanks very much


Answer 1:


Use start_requests instead of start_urls; this will work for you:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        for page in range(1, 20):
            # make_requests_from_url is deprecated in recent Scrapy versions;
            # on newer releases yield scrapy.Request(url) directly instead.
            yield self.make_requests_from_url('https://www.example.com/page-%s' % page)

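Applied to the original question, a minimal sketch (with assumed file names list1.txt, list2.txt, list3.txt and a placeholder parse callback) could read all three URL files in start_requests and yield one request per URL; Scrapy's scheduler then interleaves those requests concurrently, so a single spider crawls all three lists at the same time:

import scrapy

class MultiListSpider(scrapy.Spider):
    name = 'multilist'

    # Hypothetical file names; each file is assumed to hold one URL per line.
    url_files = ['list1.txt', 'list2.txt', 'list3.txt']

    def start_requests(self):
        for path in self.url_files:
            with open(path) as f:
                for url in (line.strip() for line in f if line.strip()):
                    yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Placeholder: the real parsing logic for the shared page structure goes here.
        self.logger.info('Crawled %s', response.url)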

Source: https://stackoverflow.com/questions/32435776/is-it-possible-to-crawl-multiple-start-urls-list-simultaneously
