Python Requests - Navigate a site by the server's IP

生来不讨喜 2021-01-03 02:20

I want to crawl a site, but Cloudflare was getting in the way. I was able to get the server's IP, so Cloudflare won't bother me.

How can I utilize this in the re

4 answers
  •  时光取名叫无心
    2021-01-03 03:15

    I think the best way to send HTTPS requests to a specific IP is to add a custom resolver that binds the domain name to the IP you want to hit. This way, both the SNI and the Host header are set correctly, and certificate verification succeeds just as it does in a web browser.

    Otherwise, you will run into issues such as InsecureRequestWarning and SSLCertVerificationError, and SNI is always missing from the Client Hello, even if you try different combinations of the headers and verify arguments.

    requests.get('https://1.2.3.4/foo.php', headers={"Host": "example.com"}, verify=True)

    In addition, I tried:

      •  requests_toolbelt
      •  pip install requests[security]
      •  forcediphttpsadapter
      •  all the solutions mentioned in "using requests with TLS doesn't give SNI support"

    None of them set SNI when hitting https://IP directly.
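The missing SNI can be observed without any network access. The following is a minimal sketch, assuming Python 3.7 or later; it exercises only the standard library's ssl module (which requests and urllib3 build on), not requests itself. The helper name client_hello is hypothetical. It starts a handshake against in-memory buffers and checks whether the name appears in the bytes the client would send.

```python
import ssl


def client_hello(server_hostname):
    """Return the raw bytes the ssl module would send first for this name."""
    context = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = context.wrap_bio(incoming, outgoing, server_hostname=server_hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: there is no server, so the handshake stops after the Client Hello.
        pass
    return outgoing.read()


hello_by_name = client_hello('example.com')
hello_by_ip = client_hello('1.2.3.4')

# The server_name extension carries the host name as plain ASCII.
sni_sent_for_name = b'example.com' in hello_by_name
sni_sent_for_ip = b'1.2.3.4' in hello_by_ip
```

With a domain name the host name shows up in the Client Hello; with a bare IP it does not, which is consistent with the behaviour described above.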

    import socket

    import requests

    # mock /etc/hosts: maps (domain_name, port) to a getaddrinfo-style result list
    # guard it with a lock in multithreaded code, or use multiprocessing,
    # if an endpoint is frequently rebound to different IPs
    etc_hosts = {}


    # decorate Python's built-in resolver
    def custom_resolver(builtin_resolver):
        def wrapper(*args, **kwargs):
            try:
                # assumes the caller passes host and port as the first two positional arguments
                return etc_hosts[args[:2]]
            except KeyError:
                # fall back to builtin_resolver for endpoints not in etc_hosts
                return builtin_resolver(*args, **kwargs)

        return wrapper


    # monkey patching: this affects every socket lookup in the process
    socket.getaddrinfo = custom_resolver(socket.getaddrinfo)


    def _bind_ip(domain_name, port, ip):
        '''
        resolve (domain_name, port) to a given ip
        '''
        key = (domain_name, port)
        # (family, type, proto, canonname, sockaddr)
        value = (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, '', (ip, port))
        etc_hosts[key] = [value]


    # the bound name must match the host in the URL exactly
    _bind_ip('example.com', 443, '1.2.3.4')
    # this sends the request to 1.2.3.4, with SNI and the Host header set to example.com
    response = requests.get('https://example.com/foo.php', verify=True)
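Because the patch above replaces socket.getaddrinfo for the whole process, a scoped variant may be easier to reason about. The following is a sketch under that assumption, not part of the original answer; the names pinned_ip and original_resolver are hypothetical. It pins one (domain, port) pair inside a with block and restores the previous resolver afterwards. The usage at the bottom only queries the resolver, so it makes no network request.

```python
import contextlib
import socket


@contextlib.contextmanager
def pinned_ip(domain_name, port, ip):
    """Temporarily resolve (domain_name, port) to ip, then restore the resolver."""
    previous = socket.getaddrinfo
    # (family, type, proto, canonname, sockaddr)
    pinned = [(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, '', (ip, port))]

    def resolver(host, port_, *args, **kwargs):
        if (host, port_) == (domain_name, port):
            return pinned
        # Anything else goes to the resolver that was active before.
        return previous(host, port_, *args, **kwargs)

    socket.getaddrinfo = resolver
    try:
        yield
    finally:
        socket.getaddrinfo = previous


# Kept only to show that the patch is undone after the block.
original_resolver = socket.getaddrinfo

with pinned_ip('example.com', 443, '1.2.3.4'):
    # A requests.get('https://example.com/...') call placed here would connect to 1.2.3.4.
    addresses = socket.getaddrinfo('example.com', 443)

resolver_restored = socket.getaddrinfo is original_resolver
```

The patch is still process-wide while the block is active, so other threads doing lookups for the same host and port during that window would also see the pinned address.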
    
