Avoiding redirection

Submitted by 瘦欲@ on 2019-12-20 03:02:20

Question


I'm trying to parse a site (written in ASP), and the crawler gets redirected to the main page. What I'd like to do is parse the given URL, not the one it redirects to. Is there a way to do this? I tried adding "REDIRECT=False" to the settings.py file, without success.

Here's some output from the crawler:

2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=500&id=500>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=1513&id=1513>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=476&id=476>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=472&id=472>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=457&id=457>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=1097&id=1097>

Answer 1:


http://www.cotodigital.com.ar/l.asp?cat=1097&id=1097 redirects to http://www.cotodigital.com.ar/default.asp because the HTTP response says so. This happens because the ASP code is checking some condition: a wrong page, cookies, the user agent, or the referrer. Check each of these conditions.

UPDATE: I just checked in my browser: it is also redirected to the main page, where I click 'Skip ads'. After that it works OK.

This means the site sets some cookies, and without them it redirects to the main page.

See also Scrapy - how to manage cookies/sessions




Answer 2:


The original URL has nothing to scrape. It returned a 302, which normally means the response has no meaningful body, and its Location header indicates where to redirect to. You need to figure out how to access the URL without being redirected, perhaps by authenticating.



Source: https://stackoverflow.com/questions/7542829/avoiding-redirection
