Force python mechanize/urllib2 to only use A requests?

长发绾君心 2020-12-14 04:29

Here is a related question, but I could not figure out how to apply its answer to mechanize/urllib2: "how to force python httplib library to use only A requests".

Basica

4 answers
  • 2020-12-14 05:10

    The most likely cause of this is a broken egress firewall. Juniper firewalls can cause this, for instance, though they have a workaround available.

    If you can't get your network admins to fix the firewall, you can try the host-based workaround. Add this line to your /etc/resolv.conf:

    options single-request-reopen
    

    The man page explains it well:

    The resolver uses the same socket for the A and AAAA requests. Some hardware mistakenly only sends back one reply. When that happens the client system will sit and wait for the second reply. Turning this option on changes this behavior so that if two requests from the same port are not handled correctly it will close the socket and open a new one before sending the second request.
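
To check whether a host already has the option set, a small script can scan the file. This is a minimal sketch in Python 3 (the rest of this thread uses Python 2); the function names are made up for illustration, and the comment handling is a simplification that treats anything after `#` or `;` as a comment.

```python
def resolver_options(resolv_conf_text):
    """Return the set of tokens found on 'options' lines."""
    options = set()
    for raw_line in resolv_conf_text.splitlines():
        # Simplification: drop everything after '#' or ';' as a comment.
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if fields and fields[0] == "options":
            options.update(fields[1:])
    return options


def has_single_request_reopen(path="/etc/resolv.conf"):
    """Report whether the workaround is present in the given file."""
    with open(path) as handle:
        return "single-request-reopen" in resolver_options(handle.read())
```

Note that some systems regenerate /etc/resolv.conf automatically, so a manual edit may not survive a network restart.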

  • 2020-12-14 05:12

    I ran into the same problem. Here is an ugly hack (use at your own risk) based on the information given by J.J.

    This forces the family parameter of socket.getaddrinfo(..) to socket.AF_INET instead of socket.AF_UNSPEC (zero, which is what socket.create_connection appears to use). It affects not only calls from urllib2 but every call to socket.getaddrinfo(..) in the process:

    #--------------------
    # do this once at program startup
    #--------------------
    import socket
    origGetAddrInfo = socket.getaddrinfo
    
    def getAddrInfoWrapper(host, port, family=0, socktype=0, proto=0, flags=0):
        return origGetAddrInfo(host, port, socket.AF_INET, socktype, proto, flags)
    
    # replace the original socket.getaddrinfo by our version
    socket.getaddrinfo = getAddrInfoWrapper
    
    #--------------------
    import urllib2
    
    print urllib2.urlopen("http://python.org/").read(100)
    

    This works for me at least in this simple case.
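
Because the hack replaces socket.getaddrinfo for the whole process, it may be worth limiting its scope. The following is a sketch of the same idea as a context manager, written for Python 3 (where urllib2 became urllib.request); the name `ipv4_only` is hypothetical, and the patch is still global while active, so it is not safe if other threads resolve names at the same time.

```python
import contextlib
import socket


@contextlib.contextmanager
def ipv4_only():
    """Temporarily force socket.getaddrinfo to request AF_INET only."""
    original = socket.getaddrinfo

    def wrapper(host, port, family=0, type=0, proto=0, flags=0):
        # Ignore the requested family and always ask for AF_INET.
        return original(host, port, socket.AF_INET, type, proto, flags)

    socket.getaddrinfo = wrapper
    try:
        yield
    finally:
        # Restore the original resolver even if the body raised.
        socket.getaddrinfo = original
```

Any request made inside a `with ipv4_only():` block goes through the wrapper, and the original function is back in place once the block exits.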

  • 2020-12-14 05:20

    No answer, but a few data points. The DNS resolution appears to originate from httplib.py in HTTPConnection.connect() (line 670 in my Python 2.5.4 stdlib).

    The code flow is roughly:

    for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        self.sock = socket.socket(af, socktype, proto)
        try:
            self.sock.connect(sa)
        except socket.error, msg: 
            continue
        break
    

    A few comments on what's going on:

    • the third argument to socket.getaddrinfo() limits the socket families -- i.e., IPv4 vs. IPv6. Passing zero returns all families. Zero is hardcoded into the stdlib.

    • passing a hostname into getaddrinfo() will cause name resolution -- on my OS X box with IPv6 enabled, both A and AAAA records go out, both answers come right back and both are returned.

    • the rest of the connect loop tries each returned address until one succeeds

    For example:

    >>> socket.getaddrinfo("python.org", 80, 0, socket.SOCK_STREAM)
    [
     (30, 1, 6, '', ('2001:888:2000:d::a2', 80, 0, 0)), 
     ( 2, 1, 6, '', ('82.94.164.162', 80))
    ]
    >>> help(socket.getaddrinfo)
    getaddrinfo(...)
        getaddrinfo(host, port [, family, socktype, proto, flags])
            -> list of (family, socktype, proto, canonname, sockaddr)
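
The effect of the family argument can be seen by calling getaddrinfo() directly. This is a small Python 3 sketch; the helper name `resolve` is made up for illustration, and whether AF_INET actually suppresses the AAAA query on the wire depends on the platform's resolver.

```python
import socket


def resolve(host, port, family=socket.AF_UNSPEC):
    """Return the distinct addresses getaddrinfo reports for one family."""
    results = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        # sockaddr[0] is the address string for both IPv4 and IPv6.
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

Calling `resolve("python.org", 80)` should list every family the system returns, while `resolve("python.org", 80, socket.AF_INET)` should list IPv4 addresses only.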
    

    Some guesses:

    • Since the socket family in getaddrinfo() is hardcoded to zero, you won't be able to choose between A and AAAA records through any supported API in urllib. Unless mechanize does its own name resolution for some other reason, it can't either. Judging from the structure of the connect loop, this is by design.

    • python's socket module is a thin wrapper around the POSIX socket APIs; I expect they're resolving every family available & configured on the system. Double-check Gentoo's IPv6 configuration.

  • 2020-12-14 05:22

    The DNS server 8.8.8.8 (Google DNS) replies immediately when asked for the AAAA record of python.org. The fact that this reply does not appear in the trace you posted therefore probably indicates that the packet never came back (which happens with UDP). If the loss is random, that is normal. If it is systematic, there is a problem in your network setup, perhaps a broken firewall that prevents the first AAAA reply from coming back.

    The 5-second delay comes from your stub resolver. If the loss is random, it is probably bad luck and unrelated to IPv6: the reply for the A record could have been lost just as well.
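
One way to tell random loss from systematic loss is to time the lookup repeatedly with and without the IPv4 restriction. This is a rough Python 3 sketch; `time_lookup` is a hypothetical helper, and local caching on the host can hide the delay on repeated calls.

```python
import socket
import time


def time_lookup(host, family=socket.AF_UNSPEC, port=80):
    """Return (seconds, error) for a single getaddrinfo call."""
    started = time.monotonic()
    error = None
    try:
        socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Keep the failure so the caller can still see how long it took.
        error = exc
    return time.monotonic() - started, error
```

If `time_lookup("python.org")` is consistently about five seconds slower than `time_lookup("python.org", socket.AF_INET)`, that suggests the AAAA reply is being lost systematically rather than at random.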

    Disabling IPv6 seems a very strange move, only two years before the last IPv4 address is distributed!

    % dig @8.8.8.8  AAAA python.org
    
    ; <<>> DiG 9.5.1-P3 <<>> @8.8.8.8 AAAA python.org
    ; (1 server found)
    ;; global options:  printcmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50323
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
    
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 512
    ;; QUESTION SECTION:
    ;python.org.                    IN      AAAA
    
    ;; ANSWER SECTION:
    python.org.             69917   IN      AAAA    2001:888:2000:d::a2
    
    ;; Query time: 36 msec
    ;; SERVER: 8.8.8.8#53(8.8.8.8)
    ;; WHEN: Sat Jan  9 21:51:14 2010
    ;; MSG SIZE  rcvd: 67
    