I want to run my spider from a script rather than via `scrapy crawl`.
I found this page
http://doc.scrapy.org/en/latest/topics/practices.html
You can just create a normal Python script and then use Scrapy's `runspider` command, which lets you run a spider without having to create a project.
For example, you can create a single file stackoverflow_spider.py with something like this:
import scrapy
from scrapy.loader import ItemLoader

class QuestionItem(scrapy.Item):
    idx = scrapy.Field()
    title = scrapy.Field()

class StackoverflowSpider(scrapy.Spider):
    name = 'SO'
    start_urls = ['http://stackoverflow.com']

    def parse(self, response):
        # The response supports .css()/.xpath() directly; no need
        # to wrap it in a Selector yourself.
        questions = response.css('#question-mini-list .question-summary')
        for i, elem in enumerate(questions):
            loader = ItemLoader(item=QuestionItem(), selector=elem)
            loader.add_value('idx', i)
            loader.add_xpath('title', './/h3/a/text()')
            yield loader.load_item()
Then, provided you have Scrapy properly installed, you can run it using:
scrapy runspider stackoverflow_spider.py -t json -o questions-items.json