Python Requests - Navigate a site by the server's IP


生来不讨喜 2021-01-03 02:20

I want to crawl a site, but Cloudflare was getting in the way. I was able to get the server's IP, so Cloudflare won't bother me.

How can I utilize this in the re

4 answers

没有蜡笔的小新 (OP)

2021-01-03 02:59

You'd have to tell requests to fake the Host header, and replace the hostname in the URL with the IP address:

requests.get('http://123.45.67.89/foo.php', headers={'Host': 'www.example.com'})

The URL 'patching' can be done with urlparse (urllib.parse in Python 3, the urlparse module in Python 2):

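from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

import requests

# url is the original URL; ipaddress is the server's IP as a string
parsed = urlparse(url)
hostname = parsed.hostname
parsed = parsed._replace(netloc=ipaddress)  # swap the host for the IP
ip_url = parsed.geturl()

response = requests.get(ip_url, headers={'Host': hostname})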
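One thing the snippet above glosses over is that replacing netloc outright also drops an explicit port. A minimal sketch of a helper that keeps the port is below; the name pin_to_ip is hypothetical (not part of requests or the standard library), and it assumes an IPv4 address and a URL without user:password credentials.

```python
from urllib.parse import urlsplit


def pin_to_ip(url, ip_address):
    """Return (ip_url, host_header) for sending `url` straight to `ip_address`.

    Hypothetical helper: assumes an IPv4 address and no credentials in the URL.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no hostname in URL: {url!r}")
    # Keep an explicit port, if any, on both the new netloc and the Host header.
    port = f":{parts.port}" if parts.port else ""
    ip_url = parts._replace(netloc=f"{ip_address}{port}").geturl()
    return ip_url, f"{parts.hostname}{port}"


ip_url, host = pin_to_ip("http://www.example.com:8080/foo.php?x=1", "123.45.67.89")
print(ip_url)  # http://123.45.67.89:8080/foo.php?x=1
print(host)    # www.example.com:8080
```

The two return values would then be passed on as requests.get(ip_url, headers={'Host': host}).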

Demo against Stack Overflow:

>>> from urllib.parse import urlparse
>>> import socket
>>> import requests
>>> url = 'http://stackoverflow.com/help/privileges'
>>> parsed = urlparse(url)
>>> hostname = parsed.hostname
>>> hostname
'stackoverflow.com'
>>> ipaddress = socket.gethostbyname(hostname)
>>> ipaddress
'198.252.206.16'
>>> parsed = parsed._replace(netloc=ipaddress)
>>> ip_url = parsed.geturl()
>>> ip_url
'http://198.252.206.16/help/privileges'
>>> response = requests.get(ip_url, headers={'Host': hostname})
>>> response
<Response [200]>

In this case I looked up the IP address dynamically.
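When a CDN sits in front of the site, a dynamic lookup returns the CDN's address, so you would supply the known server IP instead. A rough sketch that allows both is below; resolve_url and its resolver parameter are hypothetical names, and the default resolver, socket.gethostbyname, only returns IPv4 addresses.

```python
import socket
from urllib.parse import urlsplit


def resolve_url(url, resolver=socket.gethostbyname):
    """Return (ip_url, host_header), resolving the hostname via `resolver`.

    Hypothetical helper: `resolver` is any callable mapping hostname -> IPv4 string.
    """
    parts = urlsplit(url)
    hostname = parts.hostname
    ip_address = resolver(hostname)
    # Preserve an explicit port on both the rewritten URL and the Host header.
    port = f":{parts.port}" if parts.port else ""
    ip_url = parts._replace(netloc=f"{ip_address}{port}").geturl()
    return ip_url, f"{hostname}{port}"


# Pin to an address you already know instead of asking DNS:
ip_url, host = resolve_url(
    "http://stackoverflow.com/help/privileges",
    resolver=lambda name: "198.252.206.16",
)
print(ip_url)  # http://198.252.206.16/help/privileges
print(host)    # stackoverflow.com
```

Leaving resolver at its default reproduces the dynamic lookup from the demo; the result is used as requests.get(ip_url, headers={'Host': host}).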
