scrapy crawl [spider-name] fault

Submitted by 百般思念 on 2019-12-01 11:46:24

Question


Hi guys, I am building a web scraping project using the Scrapy framework and Python. In the spiders folder of my project I have two spiders, named spider1 and spider2.

spider1.py

class spider(BaseSpider):
    name = "spider1"
    ........
    ........

spider2.py

class spider(BaseSpider):
    name = "spider2"
    ............
    ...........

settings.py

SPIDER_MODULES = ['project_name.spiders']
NEWSPIDER_MODULE = 'project_name.spiders'  # a string, not a list
ITEM_PIPELINES = ['project_name.pipelines.spider']

Now when I run the command scrapy crawl spider1 from my project's root folder, it calls spider2.py instead of spider1.py. When I delete spider2.py from the project, it calls spider1.py.

Until a day ago it had been working fine for a month, but suddenly this started happening and I can't figure out why. Please help me, guys.


Answer 1:


I tackled the same problem; removing all *.pyc files from everywhere in my project did the job.

In particular, I think settings.pyc is important to remove.

Hope that helps.
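To clean up in one pass, here is a minimal sketch using only the standard library (the function name `remove_pyc` is just an illustration; run it against your project root):

```python
import os

def remove_pyc(root_dir):
    """Delete every stale .pyc file under root_dir (including
    settings.pyc) so Python recompiles from the current .py sources."""
    removed = []
    for root, _dirs, files in os.walk(root_dir):
        for name in files:
            if name.endswith(".pyc"):
                path = os.path.join(root, name)
                os.remove(path)
                removed.append(path)
    return removed

# Example: clean the current Scrapy project
# remove_pyc(".")
```

The next `scrapy crawl` then recompiles everything from the .py files, so renamed or deleted spiders can no longer be shadowed by old bytecode.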




Answer 2:


Building on Nomad's answer, you can avoid the creation of all but one .pyc file during development by adding:

import sys
sys.dont_write_bytecode = True

to the project's "__init__.py" file.

This will prevent .pyc files from being created. It is especially useful if you rename a spider's file while working on a project: it stops the cached .pyc of the old spider from lingering, and avoids a few other gotchas.
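The effect of the flag can be checked with a small sketch (the module name `demo_mod` is hypothetical; note that on Python 3 bytecode normally lands in a `__pycache__` directory, but the flag suppresses it either way):

```python
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # must be set before the import happens

# Create a throwaway module and import it; no bytecode should appear.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "demo_mod.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.insert(0, tmp)
import demo_mod

print(demo_mod.VALUE)  # 42
print(os.listdir(tmp)) # only demo_mod.py, no .pyc or __pycache__
```

An equivalent, non-invasive alternative is setting the `PYTHONDONTWRITEBYTECODE` environment variable before running Scrapy, which avoids editing `__init__.py` at all.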



Source: https://stackoverflow.com/questions/17992051/scrapy-crawl-spider-name-fault
