Can I execute a Scrapy (Python) crawl outside the project dir?

Posted by 放肆的年华 on 2019-12-05 09:18:55

Question


The docs say I can only execute the crawl command inside the project dir:

scrapy crawl tutor -o items.json -t json

But I really need to execute it from my Python code (the Python file is not inside the current project dir).

Is there any approach that fits my requirement?

My project tree:

.
├── etao
│   ├── etao
│   │   ├── __init__.py
│   │   ├── items.py
│   │   ├── pipelines.py
│   │   ├── settings.py
│   │   └── spiders
│   │       ├── __init__.py
│   │       ├── etao_spider.py
│   ├── items.json
│   ├── scrapy.cfg
│   └── start.py
└── start.py    <-------------- I want to execute the script here.

And here's my code, which follows this link, but it doesn't work:

#!/usr/bin/env python
import os
#Must be at the top before other imports
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'project.settings')

from scrapy import project
from scrapy.conf import settings
from scrapy.crawler import CrawlerProcess

class CrawlerScript():

  def __init__(self):
    self.crawler = CrawlerProcess(settings)
    if not hasattr(project, 'crawler'):
      self.crawler.install()
    self.crawler.configure()

  def crawl(self, spider_name):
    spider = self.crawler.spiders.create(spider_name)  # <--- line 19
    if spider:
      self.crawler.queue.append_spider(spider)
    self.crawler.start()
    self.crawler.stop()


# main
if __name__ == '__main__':
  crawler = CrawlerScript()
  crawler.crawl('etao')

The error is:

line 19: KeyError: 'Spider not found: etao'
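
A possible cause, offered as a guess rather than a confirmed diagnosis: the script sets SCRAPY_SETTINGS_MODULE to 'project.settings', but the tree above has no package named project. The settings file is etao/etao/settings.py, i.e. etao.settings, and the outer start.py does not have the inner etao directory on sys.path. If the settings cannot be loaded, no spider modules get registered, which would explain why the lookup for 'etao' fails. The sketch below uses only the standard library to check whether a settings module can be located; is_importable is a hypothetical helper name, and the module names are assumed from the tree above.

```python
import importlib.util
import sys


def is_importable(module_name, extra_path=None):
    """Return True if module_name can be located.

    Hypothetical helper: optionally prepends extra_path to sys.path
    before looking the module up.
    """
    if extra_path is not None and extra_path not in sys.path:
        sys.path.insert(0, extra_path)
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "project") does not exist.
        return False
```

Run from the outer directory, is_importable("project.settings") and is_importable("etao.settings", extra_path="etao") would show which of the two names Python can actually find.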

Answer 1:


You can actually invoke CrawlerProcess yourself.

It's something like:

from scrapy.crawler import CrawlerProcess
from scrapy.conf import settings


settings.overrides.update({}) # your settings

crawlerProcess = CrawlerProcess(settings)
crawlerProcess.install()
crawlerProcess.configure()

crawlerProcess.crawl(spider) # your spider here

Credits to @warwaruk.
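
The snippet above targets an old Scrapy release; scrapy.conf, settings.overrides, and CrawlerProcess.install()/configure() are no longer part of current Scrapy. As a minimal sketch for newer versions, assuming Scrapy is installed and the layout from the question (settings in etao/etao/settings.py, spider named etao), the outer start.py could put the project directory on sys.path, point SCRAPY_SETTINGS_MODULE at etao.settings, and start the crawl by spider name. use_project and run_spider are hypothetical helper names, not part of Scrapy.

```python
import os
import sys


def use_project(project_dir, settings_module):
    """Make a Scrapy project importable from any working directory."""
    project_dir = os.path.abspath(project_dir)
    if project_dir not in sys.path:
        sys.path.insert(0, project_dir)
    # get_project_settings() reads this variable to find the settings.
    os.environ["SCRAPY_SETTINGS_MODULE"] = settings_module
    return project_dir


def run_spider(spider_name):
    """Run one spider by name; blocks until the crawl finishes."""
    # Imported here so the path and environment are set up first.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    process.crawl(spider_name)  # the name is resolved via SPIDER_MODULES
    process.start()
```

From the outer start.py this would be called as use_project("etao", "etao.settings") followed by run_spider("etao"). The relative path "etao" is resolved against the current working directory, so a path built from the script's own location may be safer.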



Source: https://stackoverflow.com/questions/9530046/can-i-execute-scrapypython-crawl-outside-the-project-dir
