Python Requests - Navigate a site by the server's IP

生来不讨喜 2021-01-03 02:20

I want to crawl a site, but Cloudflare was getting in the way. I was able to get the server's IP address, so Cloudflare won't bother me if I connect to it directly.

How can I utilize this in the re

4 answers
  •  没有蜡笔的小新
    2021-01-03 02:59

    You'd have to tell requests to fake the Host header, and replace the hostname in the URL with the IP address:

    requests.get('http://123.45.67.89/foo.php', headers={'Host': 'www.example.com'})
    

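    The URL 'patching' can be done with urlparse (urllib.parse in Python 3, the urlparse module in Python 2):

    import requests
    from urllib.parse import urlparse

    # url is the original URL; ipaddress is the server's IP as a string
    parsed = urlparse(url)
    hostname = parsed.hostname
    # swap the hostname for the IP address (netloc is host[:port])
    parsed = parsed._replace(netloc=ipaddress)
    ip_url = parsed.geturl()

    response = requests.get(ip_url, headers={'Host': hostname})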
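
    The patching step above can be wrapped in a small helper. This is only a sketch: patch_url_to_ip is a hypothetical name, not part of requests, and it assumes an IPv4 address and a URL without embedded credentials. Unlike the snippet above, it keeps an explicit port when one is present.

```python
from urllib.parse import urlparse


def patch_url_to_ip(url, ip_address):
    """Return (ip_url, headers) for fetching `url` through `ip_address`.

    Hypothetical helper for illustration; assumes an IPv4 address.
    """
    parsed = urlparse(url)
    if parsed.hostname is None:
        raise ValueError(f"URL has no hostname: {url!r}")
    port = parsed.port
    # Keep an explicit port (e.g. :8080) when swapping the host for the IP.
    netloc = ip_address if port is None else f"{ip_address}:{port}"
    ip_url = parsed._replace(netloc=netloc).geturl()
    # The Host header carries the original name so the server can pick
    # the right virtual host.
    host = parsed.hostname if port is None else f"{parsed.hostname}:{port}"
    return ip_url, {"Host": host}
```

    The result would be passed straight to requests, for example requests.get(ip_url, headers=headers) after ip_url, headers = patch_url_to_ip(url, '123.45.67.89'). The sketch targets plain http URLs; with https, certificate verification is done against the host in the URL, so an IP URL will generally fail verification unless that is handled separately.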
    

    Demo against Stack Overflow:

    >>> import socket
    >>> import requests
    >>> from urllib.parse import urlparse
    >>> url = 'http://stackoverflow.com/help/privileges'
    >>> parsed = urlparse(url)
    >>> hostname = parsed.hostname
    >>> hostname
    'stackoverflow.com'
    >>> ipaddress = socket.gethostbyname(hostname)
    >>> ipaddress
    '198.252.206.16'
    >>> parsed = parsed._replace(netloc=ipaddress)
    >>> ip_url = parsed.geturl()
    >>> ip_url
    'http://198.252.206.16/help/privileges'
    >>> response = requests.get(ip_url, headers={'Host': hostname})
    >>> response
    
    

    In this case I looked up the IP address dynamically.
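
    A dynamic lookup returns whatever DNS publishes for the name, which for a site behind a proxy such as Cloudflare is typically the proxy's address rather than the origin server. A minimal sketch of choosing between an IP you already know and a DNS lookup follows; pick_ip and its known_ip parameter are hypothetical names used only for illustration.

```python
import socket


def pick_ip(hostname, known_ip=None):
    """Return `known_ip` when supplied, otherwise resolve `hostname` via DNS."""
    if known_ip is not None:
        # Use the address found by other means and skip DNS entirely.
        return known_ip
    # Fall back to the system resolver (IPv4 only, like the demo above).
    return socket.gethostbyname(hostname)
```

    With the server's IP already in hand, calling pick_ip(hostname, known_ip='123.45.67.89') would replace the socket.gethostbyname call in the demo.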
