scrapy spider not found

北战南征 submitted on 2021-02-06 14:05:02

Question


I'm trying to reproduce the code of this talk:

https://www.youtube.com/watch?v=eD8XVXLlUTE

When I try to run the spider:

scrapy crawl talkspider_basic

I get this error:

raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: talkspider_basic'

The code of the spider is:

from scrapy.spiders import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.linkextractors.sgml import  SgmlLinkExtractor
from scrapy.contrib.loader import XPathItemLoader
from pytexas.items import  PytexasItem

class TalkspiderBasicSpider(BaseSpider):
    name = "talkspider_basic"
    allowed_domains = ["www.pytexas.org"]
    start_urls = ['http://www.pytexas.org/2013/schedule']

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        dls = hxs.select('//dl')
        for dl in dls:
            times = dl.select('dt/text()').extract()
            titles = dl.select('dd/a/text()').extract()
            for time, title in zip(times,titles):
                title = title.strip()
                yield PytexasItem(title=title,time= time)

The code of the Items is:

from scrapy.item import Item, Field

class PytexasItem(Item):
    title = Field()
    time = Field()
    speaker = Field()
    description = Field()

The name of the project and of the spider's file are

pytexas

and

talk_spider_basic.py

respectively, so I don't think that there is any conflict because of the names.

Edit:

The project has the default structure:

pytexas/     
  scrapy.cfg    
  pytexas/    
    items.py   
    pipelines.py   
    settings.py   
    spiders/   
      __init__.py   
      talk_spider_basic.py    

Answer 1:


According to GitHub issue #2254, this happens because some modules, such as scrapy.contrib, are deprecated.

So you should update the imports.

From:

from scrapy.contrib.linkextractors.sgml import  SgmlLinkExtractor
from scrapy.contrib.loader import XPathItemLoader

To:

from scrapy.linkextractors import LinkExtractor
from scrapy.loader import XPathItemLoader
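
If outdated imports are the cause, as this answer suggests, importing the spider module directly should surface the underlying error instead of the generic "Spider not found" message. The following is a minimal sketch using only the standard library. The helper name check_import is hypothetical, and the module path pytexas.spiders.talk_spider_basic is an assumption derived from the project layout in the question.

```python
import importlib
import traceback


def check_import(module_name):
    """Try to import module_name.

    Returns None on success, or the formatted traceback text on failure.
    """
    try:
        importlib.import_module(module_name)
    except Exception:
        # Any exception raised while the module is being imported
        # (not only ImportError) prevents the spider class from loading.
        return traceback.format_exc()
    return None


if __name__ == "__main__":
    # Assumed module path, based on the layout shown in the question.
    # Run this from the directory that contains scrapy.cfg.
    error = check_import("pytexas.spiders.talk_spider_basic")
    print(error or "module imported cleanly")
```

If the printed traceback ends in an import error that mentions scrapy.contrib, the import changes above are the likely fix.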



Answer 2:


One solution, which works in some situations, is to downgrade Scrapy (if it is >= 1.3). To do this, run the following command:

pip install scrapy==1.2




Answer 3:


I know this post is old, but I found another problem that can produce the "Spider not found" error. I have my spiders organized in folders, e.g. <crawler-project>/spiders/full and <crawler-project>/spiders/clean. I created a new directory, <crawler-project>/spiders/aaa, and placed a new spider in it. Scrapy did not find this new spider until I created an __init__.py file in that directory. So if you want to organize spiders in folders, make each folder a valid Python package.
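
To spot this situation quickly, a small script can list the spider folders that contain Python files but no __init__.py. This is only a sketch under stated assumptions: the function name dirs_missing_init is hypothetical, and the check reflects the behavior reported in this answer rather than anything taken from Scrapy's documentation.

```python
from pathlib import Path


def dirs_missing_init(spiders_dir):
    """Return directories under spiders_dir (including spiders_dir itself)
    that contain .py files but no __init__.py."""
    root = Path(spiders_dir)
    candidates = [root] + [p for p in root.rglob("*") if p.is_dir()]
    missing = []
    for directory in candidates:
        has_python = any(
            f.is_file() and f.suffix == ".py" for f in directory.iterdir()
        )
        if has_python and not (directory / "__init__.py").exists():
            missing.append(directory)
    return sorted(missing)
```

For example, dirs_missing_init("pytexas/spiders") (a path assumed from the question's layout) returns an empty list when every spider folder is a proper package.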



Source: https://stackoverflow.com/questions/38627000/scrapy-spider-not-found
