Scrapy: run a spider from a script



 4  907

野的像风 2020-12-09 03:53

I want to run my spider from a script rather than with the scrapy crawl command.

I found this page

http://doc.scrapy.org/en/latest/topics/practices.html

4 answers

星月不相逢 (OP)

2020-12-09 04:20

You can write the spider as a plain Python file and run it with Scrapy's runspider command, which lets you run a spider without creating a project.

For example, you can create a single file stackoverflow_spider.py with something like this:

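import scrapy
from scrapy.loader import ItemLoader  # scrapy.contrib.loader no longer exists


class QuestionItem(scrapy.Item):
    idx = scrapy.Field()
    title = scrapy.Field()


class StackoverflowSpider(scrapy.Spider):  # scrapy.spider.Spider is the old, removed path
    name = 'SO'
    start_urls = ['http://stackoverflow.com']

    def parse(self, response):
        # One element per question summary on the front page.
        # These selectors match the page markup at the time of writing and may need updating.
        questions = response.css('#question-mini-list .question-summary')
        for i, elem in enumerate(questions):
            # Fill the item from selectors relative to the current question element
            loader = ItemLoader(item=QuestionItem(), selector=elem)
            loader.add_value('idx', i)
            loader.add_xpath('title', './/h3/a/text()')
            yield loader.load_item()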

Then, provided Scrapy is properly installed, you can run it with:

scrapy runspider stackoverflow_spider.py -t json -o questions-items.json

