Pass Scrapy Spider a list of URLs to crawl via .txt file

无人及你 2020-12-24 11:16

I'm a little new to Python and very new to Scrapy.

I've set up a spider to crawl and extract all the information I need. However, I need to pass it a .txt file of URLs to crawl.

4 Answers
  •  心在旅途
    2020-12-24 12:06

    import scrapy

    class MySpider(scrapy.Spider):
        name = 'nameofspider'

        def __init__(self, filename=None, *args, **kwargs):
            super().__init__(*args, **kwargs)
            if filename:
                # Read one URL per line from the file passed via -a filename=...
                with open(filename) as f:
                    self.start_urls = [url.strip() for url in f.readlines()]
    

    This will be your code. It will pick up the URLs from the .txt file as long as there is one URL per line, e.g. url1 on the first line, url2 on the second, and so on.

    After this, run the spider with the command -->

    scrapy crawl nameofspider -a filename=filename.txt
    

    Let's say your filename is 'file.txt'; then run the command -->

    scrapy crawl nameofspider -a filename=file.txt
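
    The file-reading step inside `__init__` can also be factored into a plain helper so it is easy to test without running Scrapy at all. This is only a sketch: `load_start_urls` is a hypothetical name, not part of Scrapy's API. It strips surrounding whitespace and skips blank lines, which the one-liner above does not:

    ```python
    def load_start_urls(filename):
        """Read a .txt file with one URL per line into a list.

        Blank lines are skipped and surrounding whitespace is stripped,
        so trailing newlines or stray spaces in the file are harmless.
        """
        with open(filename) as f:
            return [line.strip() for line in f if line.strip()]
    ```

    Inside the spider you would then write `self.start_urls = load_start_urls(filename)`.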
    
