I want to use Scrapy for crawling web pages. Is there a way to pass the start URL from the terminal itself?
It is given in the documentation that either the name of
I'm not really sure about the command-line option. However, you could write your spider like this:
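    import scrapy

    class MySpider(scrapy.Spider):  # older Scrapy releases called this BaseSpider
        name = 'my_spider'

        def __init__(self, *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)
            # The value of -a start_url=... arrives here as a keyword argument
            self.start_urls = [kwargs.get('start_url')]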
Then start it with:
scrapy crawl my_spider -a start_url="http://some_url"
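To see what happens to that `-a` argument without installing Scrapy, here is a stdlib-only sketch. `parse_spider_args` and `StubSpider` are hypothetical stand-ins, not Scrapy code. They only imitate, roughly, what Scrapy does: each `-a NAME=VALUE` is split on the first `=` and the resulting dict is passed to the spider's constructor as keyword arguments. The sketch also guards against ending up with `start_urls == [None]` when no URL is given.

```python
# Stdlib-only sketch: StubSpider and parse_spider_args are hypothetical
# stand-ins that imitate how `scrapy crawl ... -a NAME=VALUE` reaches a spider.


def parse_spider_args(pairs):
    """Turn ['start_url=http://some_url'] into {'start_url': 'http://some_url'}."""
    args = {}
    for pair in pairs:
        # Split on the first '=' only, so URLs containing '=' stay intact.
        name, value = pair.split("=", 1)
        args[name] = value
    return args


class StubSpider:
    """Minimal imitation of a spider base class (an assumption, not Scrapy)."""

    name = None

    def __init__(self, *args, **kwargs):
        # Keep every -a argument as an attribute on the spider instance.
        self.__dict__.update(kwargs)


class MySpider(StubSpider):
    name = "my_spider"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        start_url = kwargs.get("start_url")
        # Avoid start_urls == [None] when no -a start_url=... was given.
        self.start_urls = [start_url] if start_url else []


spider = MySpider(**parse_spider_args(["start_url=http://some_url?a=1"]))
print(spider.start_urls)  # ['http://some_url?a=1']
```

Splitting on the first `=` matters because URLs often carry query strings such as `?a=1`; a plain `split("=")` would break them apart.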