How to pass a parameter to a Scrapy pipeline object

遇见更好的自我 2021-01-03 05:30

After scraping some data with a Scrapy spider:

class Test_Spider(Spider):

    name = "test"
    def start_requests(self):
        for i in range(90         


        
3 Answers
  • 2021-01-03 05:57

    A simpler way to do this is to pass the argument on crawl:

    scrapy crawl test -a table=table1
    

    Then get the value with spider.table:

    class TestScrapyPipeline(object):
        def process_item(self, item, spider):
            # Arguments passed with -a are set as attributes on the spider
            table = spider.table
            return item
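
If the spider might be started without `-a table=...`, reading `spider.table` directly raises an AttributeError. The following is a minimal sketch, not the answerer's code, that reads the attribute once in `open_spider` and falls back to a default. The class name `TableAwarePipeline`, the `DEFAULT_TABLE` value, and the stand-in spider object are all assumptions made for illustration.

```python
from types import SimpleNamespace


class TableAwarePipeline:
    """Pipeline that reads the target table name from a spider attribute."""

    DEFAULT_TABLE = "items"  # hypothetical fallback, not from the original answer

    def open_spider(self, spider):
        # Arguments passed with -a become spider attributes; fall back if absent.
        self.table = getattr(spider, "table", self.DEFAULT_TABLE)

    def process_item(self, item, spider):
        # Tag the item with its destination table and pass it along.
        item["table"] = self.table
        return item


# Stand-in for a spider started with: scrapy crawl test -a table=table1
fake_spider = SimpleNamespace(name="test", table="table1")
pipeline = TableAwarePipeline()
pipeline.open_spider(fake_spider)
result = pipeline.process_item({"detail_url": "https://example.com"}, fake_spider)
```

The stand-in object only keeps the sketch runnable outside a crawl; in a real project Scrapy would call `open_spider` and `process_item` with the actual spider.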
    
  • 2021-01-03 05:58

    Assuming you pass this parameter through the command line as a setting (e.g. -s table="table1"), define a from_crawler class method on the pipeline and read the value from crawler.settings.

    # Requires: from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Text
    # Both methods belong inside your pipeline class.
    @classmethod
    def from_crawler(cls, crawler):
        # Here, you get whatever value was passed through the "table" parameter
        settings = crawler.settings
        table = settings.get('table')

        # Instantiate the pipeline with your table
        return cls(table)

    def __init__(self, table):
        _engine = create_engine("sqlite:///data.db")
        _connection = _engine.connect()
        _metadata = MetaData()
        _stack_items = Table(table, _metadata,
                             Column("id", Integer, primary_key=True),
                             Column("detail_url", Text))
        # Create the table if it does not exist yet
        _metadata.create_all(_engine)
        self.connection = _connection
        self.stack_items = _stack_items
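
The settings-based approach can be tried without a running crawl. The sketch below is an illustration under stated assumptions, not the answer's implementation: it swaps SQLAlchemy for the standard-library `sqlite3` module, uses an in-memory database, and replaces the crawler with a stand-in object whose `settings` is a plain dict (which offers the same `.get(name, default)` call). The names `SettingsTablePipeline` and `default_table` are hypothetical.

```python
import sqlite3
from types import SimpleNamespace


class SettingsTablePipeline:
    """Pipeline whose table name comes from a Scrapy setting."""

    def __init__(self, table, db_path=":memory:"):
        # Table names cannot be bound as SQL parameters, so validate the name
        # before interpolating it into the statement.
        if not table.isidentifier():
            raise ValueError(f"unsafe table name: {table!r}")
        self.table = table
        self.connection = sqlite3.connect(db_path)
        self.connection.execute(
            f'CREATE TABLE IF NOT EXISTS "{table}" '
            "(id INTEGER PRIMARY KEY, detail_url TEXT)"
        )

    @classmethod
    def from_crawler(cls, crawler):
        # "default_table" is a hypothetical fallback for when -s table=... is omitted.
        return cls(crawler.settings.get("table", "default_table"))

    def process_item(self, item, spider):
        # Store the item's URL in the configured table, then pass the item on.
        self.connection.execute(
            f'INSERT INTO "{self.table}" (detail_url) VALUES (?)',
            (item["detail_url"],),
        )
        self.connection.commit()
        return item


# Stand-in crawler, roughly what `scrapy crawl test -s table=table1` would provide.
fake_crawler = SimpleNamespace(settings={"table": "table1"})
pipeline = SettingsTablePipeline.from_crawler(fake_crawler)
pipeline.process_item({"detail_url": "https://example.com/1"}, spider=None)
rows = pipeline.connection.execute('SELECT detail_url FROM "table1"').fetchall()
```

Because the table name ends up inside the SQL text, checking it before use seems prudent when the value comes from the command line.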
    
  • 2021-01-03 05:58
    from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Text

    class SQLlitePipeline(object):

        def __init__(self, table_name):
            _engine = create_engine("sqlite:///data.db")
            _connection = _engine.connect()
            _metadata = MetaData()
            _stack_items = Table(table_name, _metadata,
                                 Column("id", Integer, primary_key=True),
                                 Column("detail_url", Text))
            # Create the table if it does not exist yet
            _metadata.create_all(_engine)
            self.connection = _connection
            self.stack_items = _stack_items

        @classmethod
        def from_crawler(cls, crawler):
            # Read the table_name attribute from the spider (e.g. set with -a table_name=...)
            table_name = getattr(crawler.spider, 'table_name')
            return cls(table_name)
    

    With from_crawler you can create or instantiate a pipeline object with the parameters you specify.
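
To make the role of `from_crawler` concrete, here is a simplified imitation of how a framework might use such a factory hook. It is a sketch for illustration only and not Scrapy's actual code; `build_component`, `PlainPipeline`, `TablePipeline`, and the stand-in crawler are hypothetical names.

```python
from types import SimpleNamespace


class PlainPipeline:
    """Pipeline without from_crawler; it is built with no arguments."""


class TablePipeline:
    """Pipeline that needs a parameter and exposes a from_crawler factory."""

    def __init__(self, table_name):
        self.table_name = table_name

    @classmethod
    def from_crawler(cls, crawler):
        # Pull the constructor argument from the spider attached to the crawler.
        return cls(getattr(crawler.spider, "table_name"))


def build_component(cls, crawler):
    """Simplified imitation of the factory hook (not Scrapy's real code)."""
    if hasattr(cls, "from_crawler"):
        return cls.from_crawler(crawler)
    return cls()


# Stand-in for a crawler whose spider was given table_name="table1".
fake_crawler = SimpleNamespace(spider=SimpleNamespace(table_name="table1"))
plain = build_component(PlainPipeline, fake_crawler)
configured = build_component(TablePipeline, fake_crawler)
```

The point of the pattern is that the pipeline never has to be constructed by hand: the class method decides which values from the crawler become constructor arguments.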
