I want to use scrapy for crawling web pages. Is there a way to pass the start URL from the terminal itself?
It is given in the documentation that either the name of
Sjaak Trekhaak has the right idea, and here is how to allow multiple URLs:
import scrapy

class MySpider(scrapy.Spider):
    """
    This spider will try to crawl whatever is passed in `start_urls`, which
    should be a comma-separated string of fully qualified URIs.
    Example: start_urls=http://localhost,http://example.com
    """
    def __init__(self, name=None, **kwargs):
        if 'start_urls' in kwargs:
            # Split the comma-separated string into a list of URLs.
            self.start_urls = kwargs.pop('start_urls').split(',')
        super(MySpider, self).__init__(name, **kwargs)
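You would then run it with Scrapy's `-a` option, which passes keyword arguments to the spider's `__init__`, e.g. `scrapy crawl myspider -a start_urls=http://localhost,http://example.com`. The splitting logic itself can be sketched in isolation (the `parse_start_urls` helper below is hypothetical, just to illustrate what the spider does with the argument):

```python
def parse_start_urls(kwargs):
    """Pop a comma-separated 'start_urls' string from kwargs
    and return it as a list of URLs (empty list if absent)."""
    if 'start_urls' in kwargs:
        return kwargs.pop('start_urls').split(',')
    return []

args = {'start_urls': 'http://localhost,http://example.com'}
urls = parse_start_urls(args)
print(urls)  # ['http://localhost', 'http://example.com']
```

Note that `.split(',')` does not strip whitespace, so pass the URLs without spaces after the commas (or add a `.strip()` per item if you want to be lenient).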