Crawling local files with Scrapy without an active project?

Submitted by 大城市里の小女人 on 2019-12-11 09:27:59

Question


Is it possible to crawl local files with Scrapy 0.18.4 without having an active project? I've seen this answer and it looks promising, but to use the crawl command you need a project.

Alternatively, is there an easy/minimalist way to set up a project for an existing spider? I have my spider, pipelines, middleware, and items defined in one Python file. I've created a scrapy.cfg file with only the project name. This lets me use crawl, but since I don't have a spiders folder Scrapy can't find my spider. Can I point Scrapy to the right directory, or do I need to split my items, spider, etc. up into separate files?

[edit] I forgot to mention that I'm running the spider via Crawler.crawl(my_spider). Ideally I'd still like to run the spider that way, but I can run it in a subprocess from my script if that's not possible.

Turns out the suggestion in the answer I linked does work - http://localhost:8000 can be used as a start_url, so there's no need for a project.
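For reference, a minimal sketch of that setup (the directory path, spider name, and file names below are placeholders, not from the original answer): serve the folder of saved HTML files with Python 2's built-in SimpleHTTPServer, then point the spider's start_urls at the local server.

    # In the directory holding the saved HTML files (path is hypothetical):
    #     cd /path/to/saved/pages
    #     python -m SimpleHTTPServer 8000
    #
    # local_spider.py - a throwaway spider using the Scrapy 0.18-era BaseSpider
    from scrapy.spider import BaseSpider

    class LocalFilesSpider(BaseSpider):
        name = "localfiles"
        start_urls = ["http://localhost:8000/"]  # the locally served directory

        def parse(self, response):
            # Real parsing / item extraction would go here; this just logs the URL.
            self.log("Fetched %s" % response.url)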


Answer 1:


As an option, you can run Scrapy from a script; see the self-contained example script and the overview of the approach in the Scrapy documentation on running Scrapy from a script.
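For completeness, here is a minimal sketch of that recipe against the Scrapy 0.18-era API (MySpider and the myspider module are placeholders for your own spider; newer Scrapy versions replace this boilerplate with CrawlerProcess):

    # run.py - start a spider from a plain script, no project required
    from twisted.internet import reactor
    from scrapy import log, signals
    from scrapy.crawler import Crawler
    from scrapy.settings import Settings

    from myspider import MySpider  # hypothetical module containing your spider class

    spider = MySpider()
    crawler = Crawler(Settings())  # default settings; no scrapy.cfg needed
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)   # the Crawler.crawl(my_spider) call mentioned in the question
    crawler.start()
    log.start()
    reactor.run()           # blocks until the spider closes

Because the spider instance is handed to Crawler.crawl() directly, Scrapy never needs to locate a spiders folder, so the scrapy.cfg and project-layout concerns from the question don't apply.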

This doesn't mean you have to put everything in one file. You can still have spider.py, items.py, pipelines.py - just import them correctly in the script you start crawling from.



Source: https://stackoverflow.com/questions/27954677/crawling-local-files-with-scrapy-without-an-active-project
