How to run several versions of a single spider at the same time with Scrapy?

清酒与你 2020-12-22 00:54

My problem is the following:

To save time, I would like to run several versions of one single spider. The process (the parsing definitions) is the same.

2 Answers
  •  情歌与酒
    2020-12-22 01:27

    So, I found a solution, inspired by scrapy crawl -a variable=value

    The spider concerned, in the "spiders" folder, was transformed as follows:

    import scrapy

    class MySpider(scrapy.Spider):
        name = "arg"
        allowed_domains = ['www.website.com']

        def __init__(self, lo_lim=None, up_lim=None, type_of_race=None, *args, **kwargs):
            # e.g. lo_lim=2017, up_lim=2019, type_of_race=pmu
            super(MySpider, self).__init__(*args, **kwargs)
            years  = range(int(lo_lim), int(up_lim))  # arguments arrive from the command line as strings, so convert to int
            months = range(1, 13)  # 12 months
            days   = range(1, 32)  # 31 days
            urls   = []
            for y in years:
                for m in months:
                    for d in days:
                        urls.append("https://www.website.com/details/{}-{}-{}/{}/meeting".format(y, m, d, type_of_race))

            self.start_urls = urls  # e.g. ["https://www.website.com/details/2017-1-1/pmu/meeting",
                                    #       "https://www.website.com/details/2017-1-2/pmu/meeting",
                                    #       ...
                                    #       "https://www.website.com/details/2017-12-31/pmu/meeting",
                                    #       "https://www.website.com/details/2018-1-1/pmu/meeting",
                                    #       ...
                                    #       "https://www.website.com/details/2018-12-31/pmu/meeting"]

        def parse(self, response):
            ...
    

    This answers my problem: I keep one single spider, and I can run several versions of it, with several commands, at one time without trouble.
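
    For example (the argument values below are only illustrative), two versions of the spider can be launched at the same time from separate terminals:

        scrapy crawl arg -a lo_lim=2015 -a up_lim=2017 -a type_of_race=pmu
        scrapy crawl arg -a lo_lim=2017 -a up_lim=2019 -a type_of_race=pmu

    Each command builds its own start_urls list from its own arguments, so the two runs do not interfere with each other.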

    Without a def __init__ it didn't work for me. I tried many approaches; this is the imperfect but working code I ended up with.
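
    As a side note, the same idea can also be written with Scrapy's standard start_requests() method, which yields the requests lazily instead of building the whole start_urls list in memory. This is a minimal sketch assuming the same URL pattern and arguments, not the author's original code:

        import scrapy

        class MySpider(scrapy.Spider):
            name = "arg"
            allowed_domains = ['www.website.com']

            def __init__(self, lo_lim=None, up_lim=None, type_of_race=None, *args, **kwargs):
                super(MySpider, self).__init__(*args, **kwargs)
                # Store the converted arguments; they arrive from the command line as strings.
                self.lo_lim = int(lo_lim)
                self.up_lim = int(up_lim)
                self.type_of_race = type_of_race

            def start_requests(self):
                # Yield one request at a time instead of materialising the full URL list.
                for y in range(self.lo_lim, self.up_lim):
                    for m in range(1, 13):
                        for d in range(1, 32):
                            url = "https://www.website.com/details/{}-{}-{}/{}/meeting".format(
                                y, m, d, self.type_of_race)
                            yield scrapy.Request(url, callback=self.parse)

            def parse(self, response):
                ...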

    Scrapy version: 1.5.0, Python version: 2.7.9, MongoDB version: 3.6.4, PyMongo version: 3.6.1
