Python Requests: requests.exceptions.TooManyRedirects: Exceeded 30 redirects

孤城傲影 2020-12-03 14:32

I was trying to crawl this page using the python-requests library:

import requests
from lxml import etree,html

url = 'http://www.amazon.in/b/ref=sa_menu_mobile_

3 Answers
  • 2020-12-03 14:46

    You can raise the redirect limit by setting max_redirects on a session explicitly, as in the example below:

    import requests

    session = requests.Session()
    session.max_redirects = 60  # the default limit is 30
    session.get('http://www.amazon.com')
    
  • 2020-12-03 14:59

    You need to copy the cookie value from your browser into your request headers. It works on my end.
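One way to read this suggestion is sketched below. The answer does not say which cookie to copy or how, so everything here is an assumption: the helper names cookie_header_to_dict and get_with_browser_cookie are hypothetical, and the cookie string is a placeholder rather than a real Amazon cookie. The sketch parses a Cookie header value copied from the browser and attaches it to a requests session.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(raw_cookie):
    """Split a raw Cookie header value ("k1=v1; k2=v2") into a dict."""
    jar = SimpleCookie()
    jar.load(raw_cookie)
    return {name: morsel.value for name, morsel in jar.items()}


def get_with_browser_cookie(url, raw_cookie, user_agent):
    """Request url with the copied cookie and User-Agent (requires requests)."""
    import requests  # imported here so the parsing helper works without it

    with requests.Session() as session:
        session.headers["User-Agent"] = user_agent
        session.cookies.update(cookie_header_to_dict(raw_cookie))
        return session.get(url, timeout=10)


# Placeholder value; paste the real Cookie header from the browser's developer tools.
raw = "session-id=000-0000000-0000000; example-flag=1"
print(cookie_header_to_dict(raw))
```

Copied cookies eventually expire, so this approach may stop working once the browser session that produced them ends.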

  • 2020-12-03 15:00

    Amazon is redirecting your request to http://www.amazon.in/b?ie=UTF8&node=976419031, which in turn redirects to http://www.amazon.in/electronics/b?ie=UTF8&node=976419031, after which you have entered a loop:

    >>> loc = url
    >>> seen = set()
    >>> while True:
    ...     r = requests.get(loc, allow_redirects=False)
    ...     loc = r.headers['location']
    ...     if loc in seen: break
    ...     seen.add(loc)
    ...     print(loc)
    ... 
    http://www.amazon.in/b?ie=UTF8&node=976419031
    http://www.amazon.in/electronics/b?ie=UTF8&node=976419031
    >>> loc
    'http://www.amazon.in/b?ie=UTF8&node=976419031'
    

    So your original URL A redirects to a new URL B, which redirects to C, which redirects back to B, and so on.
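The interactive loop above can be wrapped in a small reusable function. This is a minimal sketch rather than the answerer's code: trace_redirects, requests_location, and the max_hops parameter are hypothetical names, and the example.test URLs are placeholders. Unlike the session above, it stops cleanly when a response carries no Location header and resolves relative redirect targets.

```python
from urllib.parse import urljoin


def trace_redirects(start_url, get_location, max_hops=30):
    """Follow redirects one hop at a time.

    get_location(url) must return the Location header value for url,
    or None when the response is not a redirect.
    Returns (chain, loop_url): the URLs visited in order, and the first
    URL that was reached a second time (None if no loop was found).
    """
    chain = [start_url]
    seen = {start_url}
    current = start_url
    for _ in range(max_hops):
        location = get_location(current)
        if location is None:
            return chain, None  # reached a final, non-redirect response
        # Location may be relative, so resolve it against the current URL.
        current = urljoin(current, location)
        if current in seen:
            return chain, current  # redirect loop detected
        seen.add(current)
        chain.append(current)
    return chain, None  # gave up after max_hops without finding a loop


def requests_location(url):
    """get_location implementation backed by requests (assumed installed)."""
    import requests  # imported here so trace_redirects has no third-party dependency

    response = requests.get(url, allow_redirects=False, timeout=10)
    return response.headers.get("location")


if __name__ == "__main__":
    # Offline demonstration with a fake redirect table: A -> B -> C -> B.
    fake = {
        "http://example.test/a": "/b",
        "http://example.test/b": "/c",
        "http://example.test/c": "/b",
    }
    chain, loop = trace_redirects("http://example.test/a", fake.get)
    print(chain)
    print(loop)
```

Against a live site, the call would be trace_redirects(url, requests_location). Because a loop like B to C to B never terminates, raising max_redirects alone cannot make such a request succeed.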

    Apparently Amazon does this based on the User-Agent header: when it sees a browser-like value, it sets a cookie that subsequent requests should send back. The following works:

    >>> s = requests.Session()
    >>> s.headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36'
    >>> r = s.get(url)
    >>> r
    <Response [200]>
    

    This creates a session (for ease of reuse and for cookie persistence) and sets a copy of the Chrome user agent string on it. The request then succeeds (returns a 200 response).
