scrapy unable to make Request() callback

Posted by 孤街醉人 on 2020-01-04 01:50:50

Question


I am trying to write a recursive parsing script with Scrapy, but Request() never invokes the callback function suppose_to_parse(), nor any other function I pass as the callback value. I have tried different variations, but none of them work. Where should I dig?

from scrapy.http import Request
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector



class joomler(BaseSpider):
    name = "scrapy"
    allowed_domains = ["scrapy.org"]
    start_urls = ["http://blog.scrapy.org/"]


    def parse(self, response):
        print "Working... "+response.url
        hxs = HtmlXPathSelector(response)
        for link in hxs.select('//a/@href').extract():
            if not link.startswith('http://') and not link.startswith('#'):
                url = (self.start_urls[0] + link).replace('//', '/')
                print url
                yield Request(url, callback=self.suppose_to_parse)


    def suppose_to_parse(self, response):
        print "asdasd"
        print response.url

Answer 1:


Move the yield outside of the if statement:

for link in hxs.select('//a/@href').extract():
    url = link
    if not link.startswith('http://') and not link.startswith('#'):
        url = (self.start_urls[0] + link).replace('//','/')

    print url
    yield Request(url, callback=self.suppose_to_parse)
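The effect of this change can be shown without Scrapy. In the original loop, the yield sits inside the if, so absolute links (those starting with http://) are never requested at all; in the corrected loop every link produces a request, and only relative ones are rewritten. The sketch below models just the URL logic with a plain generator; the links list is a hypothetical stand-in for hxs.select('//a/@href').extract():

```python
# Hypothetical sample of extracted hrefs: one absolute, one relative, one fragment.
links = ["http://scrapy.org/", "about.html", "#comments"]

def yielded_urls(links, base="http://blog.scrapy.org/"):
    """Original loop: yield inside the if, so absolute links are dropped."""
    for link in links:
        if not link.startswith('http://') and not link.startswith('#'):
            yield (base + link).replace('//', '/')

def yielded_urls_fixed(links, base="http://blog.scrapy.org/"):
    """Answer's loop: every link is yielded; only relative ones are rewritten."""
    for link in links:
        url = link
        if not link.startswith('http://') and not link.startswith('#'):
            url = (base + link).replace('//', '/')
        yield url

print(list(yielded_urls(links)))        # ['http:/blog.scrapy.org/about.html']
print(list(yielded_urls_fixed(links)))  # ['http://scrapy.org/', 'http:/blog.scrapy.org/about.html', '#comments']
```

Note that the rewritten relative URL is still mangled (http:/ with a single slash), because replace('//', '/') also eats the slashes in the scheme; that is the separate problem the second answer points at.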



Answer 2:


I'm not an expert, but I tried your code, and I think the problem is not with the Request: the generated URLs seem to be broken. If you add some URLs to a list, iterate through them, and yield the Request with the callback, it works fine.
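The URLs really are broken: self.start_urls[0] + link followed by replace('//', '/') also collapses the double slash in http://, producing an invalid scheme that Scrapy refuses to request, so the callback never fires. The standard fix is to resolve relative links with urljoin instead of string concatenation. A minimal sketch (urljoin lives in urllib.parse in Python 3; the question's code is Python 2, where it was urlparse.urljoin):

```python
from urllib.parse import urljoin  # urlparse.urljoin in Python 2

base = "http://blog.scrapy.org/"

# The question's approach: concatenate, then collapse double slashes.
# This also strips a slash from "http://", breaking the scheme.
naive = (base + "/some/page").replace('//', '/')
print(naive)  # http:/blog.scrapy.org/some/page  (invalid URL)

# urljoin resolves relative links against the base correctly.
print(urljoin(base, "/some/page"))  # http://blog.scrapy.org/some/page
print(urljoin(base, "some/page"))   # http://blog.scrapy.org/some/page
```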



Source: https://stackoverflow.com/questions/15578918/scrapy-unable-to-make-request-callback
