How to change request url before making request in scrapy?

Submitted by 荒凉一梦 on 2020-05-14 19:56:07

Question


I need to modify a request's URL before the request is downloaded, but the change doesn't take effect. Even after calling request.replace(url=new_url) in process_request, process_response still prints the unmodified URL. Here's the middleware code:

def process_request(self, request, spider):
    original_url = request.url
    new_url = original_url + "hello%20world"
    print(request.url)            # This prints the original request url
    request = request.replace(url=new_url)
    print(request.url)            # This prints the modified url

def process_response(self, request, response, spider):
    print(request.url)            # This prints the original request url
    print(response.url)           # This prints the original request url
    return response

Can anyone tell me what I'm missing here?


Answer 1:


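request.replace() returns a new Request object rather than changing the existing one, so rebinding the local name inside process_request() has no effect on what Scrapy downloads. You need to return the new request: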

def process_request(self, request, spider):
    # avoid an infinite loop: skip URLs that already contain the desired part
    if "hello%20world" in request.url:
        return None

    new_url = request.url + "hello%20world"
    # replace() returns a new Request; returning it makes Scrapy reschedule it
    request = request.replace(url=new_url)
    return request
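
For reference, here is a minimal sketch of what the complete middleware might look like. The class name AppendSuffixMiddleware and the SUFFIX attribute are made up for illustration; only the process_request / process_response hooks and Request.replace() come from Scrapy. It assumes the suffix is always appended at the very end of the URL, so an endswith() check is enough to detect an already rewritten request.

```python
class AppendSuffixMiddleware:
    """Downloader middleware sketch: append a fixed suffix to every request URL.

    The class name and SUFFIX are illustrative, not part of Scrapy.
    """

    SUFFIX = "hello%20world"  # value taken from the question

    def process_request(self, request, spider):
        # Already rewritten: return None so Scrapy carries on downloading
        # this request instead of rescheduling it again (infinite loop).
        if request.url.endswith(self.SUFFIX):
            return None

        # replace() builds a new request and leaves the original untouched.
        # Returning the new request tells Scrapy to drop the current one
        # and schedule the replacement.
        return request.replace(url=request.url + self.SUFFIX)

    def process_response(self, request, response, spider):
        # The request seen here is the rescheduled one, so request.url
        # carries the suffix.
        return response
```

To try it out, the class would need to be registered in the project's DOWNLOADER_MIDDLEWARES setting, for example under a path such as "myproject.middlewares.AppendSuffixMiddleware" (the module path is hypothetical and depends on your project layout).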


Source: https://stackoverflow.com/questions/34437204/how-to-change-request-url-before-making-request-in-scrapy
