Scrapy request not passing to callback when 301?

Submitted by 百般思念 on 2019-12-21 22:24:47

Question


I'm trying to update a database full of links to external websites. For some reason, Scrapy skips the callback when the requested page has moved and the server responds with a 301 redirect.

def start_requests(self): 

    #... database stuff

    for x in xrange(0, numrows):
        row = cur.fetchone()

        item = exampleItem()

        item['real_id'] = row[0]
        item['product_id'] = row[1]
        url = "http://www.example.com/a/-" + item['real_id'] + ".htm"
        log.msg("item %d request URL is %s" % (item['product_id'], url), log.INFO)  # logs the expected URL
        request = scrapy.Request(url, callback=self.parse_url)
        request.meta['item'] = item
        yield request

def parse_url(self, response):
    item = response.meta['item']
    item['real_url'] = response.url
    log.msg("item %d new URL is %s" % (item['product_id'], item['real_url']), log.INFO)  # never logged for the items that were redirected

I'm using Scrapy 0.24. What can I do?

Interesting fact: this only happens with some of the broken links, even when they are on the same website and follow the exact same URL pattern.


Answer 1:


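I had to pass dont_filter=True to the scrapy.Request constructor (the parameter belongs to the Request, not to the Response or the callback). The likely cause is Scrapy's duplicate filter: when several broken links redirect to a URL that has already been requested, the redirected request is dropped as a duplicate and the callback never runs. That would also explain why only some of the broken links are affected.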



Source: https://stackoverflow.com/questions/31776048/scrapy-request-not-passing-to-callback-when-301
