How to give URL to scrapy for crawling?

终归单人心 2020-11-29 01:42

I want to use scrapy for crawling web pages. Is there a way to pass the start URL from the terminal itself?

The documentation says that either the name of the spider or the start URL can be given in the terminal.

6 Answers
  •  感情败类
    2020-11-29 02:15

    Sjaak Trekhaak has the right idea and here is how to allow multiples:

    import scrapy

    class MySpider(scrapy.Spider):
        """
        This spider will try to crawl whatever is passed in `start_urls`, which
        should be a comma-separated string of fully qualified URIs.

        Example: start_urls=http://localhost,http://example.com
        """
        def __init__(self, name=None, **kwargs):
            if 'start_urls' in kwargs:
                # Split the comma-separated string into a list of URLs
                self.start_urls = kwargs.pop('start_urls').split(',')
            # Note: super() must be called with the subclass (MySpider),
            # not with scrapy.Spider itself.
            super(MySpider, self).__init__(name, **kwargs)
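You would then run the spider with Scrapy's `-a` flag, e.g. `scrapy crawl myspider -a start_urls=http://localhost,http://example.com`, which passes `start_urls` as a keyword argument into `__init__`. A minimal sketch of that splitting logic, with Scrapy replaced by a hypothetical stand-in class so it runs on its own:

```python
# Hypothetical stand-in for scrapy.Spider, used only to demonstrate
# how the -a flag's keyword arguments reach __init__.
class FakeSpider:
    def __init__(self, name=None, **kwargs):
        self.name = name
        if 'start_urls' in kwargs:
            # Same splitting logic as in the answer above
            self.start_urls = kwargs.pop('start_urls').split(',')

# Scrapy would do roughly this when you run:
#   scrapy crawl myspider -a start_urls=http://localhost,http://example.com
spider = FakeSpider(name='myspider',
                    start_urls='http://localhost,http://example.com')
print(spider.start_urls)  # ['http://localhost', 'http://example.com']
```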
    
