I'd like to tell urllib2.urlopen (or a custom opener) to use 127.0.0.1 (or ::1) to resolve addresses. I'd rather not change my /etc/resolv.conf, however.
One possible solution is to use a tool like dnspython to query addresses and httplib to build a custom URL opener. I'd prefer telling urlopen to use a custom nameserver, though. Any suggestions?
Looks like name resolution is ultimately handled by socket.create_connection:
urllib2.urlopen -> httplib.HTTPConnection -> socket.create_connection
However, once the "Host:" header has been set, you can resolve the host yourself and pass the resulting IP address on down the chain.
I'd suggest that you subclass httplib.HTTPConnection and wrap the connect method so that it resolves self.host before passing it to socket.create_connection.
Then subclass HTTPHandler (and HTTPSHandler) to replace the http_open method with one that passes your HTTPConnection, instead of httplib's own, to do_open.
Like this:
    import httplib
    import socket
    import ssl
    import urllib2

    from lxml import etree

    def MyResolver(host):
        # Map a hostname to the address to connect to; extend as needed.
        if host == 'news.bbc.co.uk':
            return '66.102.9.104'  # Google IP
        else:
            return host

    class MyHTTPConnection(httplib.HTTPConnection):
        def connect(self):
            # Connect to the resolved address; self.host itself is unchanged.
            self.sock = socket.create_connection(
                (MyResolver(self.host), self.port), self.timeout)

    class MyHTTPSConnection(httplib.HTTPSConnection):
        def connect(self):
            sock = socket.create_connection(
                (MyResolver(self.host), self.port), self.timeout)
            self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)

    class MyHTTPHandler(urllib2.HTTPHandler):
        def http_open(self, req):
            return self.do_open(MyHTTPConnection, req)

    class MyHTTPSHandler(urllib2.HTTPSHandler):
        def https_open(self, req):
            return self.do_open(MyHTTPSConnection, req)

    opener = urllib2.build_opener(MyHTTPHandler, MyHTTPSHandler)
    urllib2.install_opener(opener)

    f = urllib2.urlopen('http://news.bbc.co.uk')
    data = f.read()
    doc = etree.HTML(data)

    >>> print doc.xpath('//title/text()')
    ['Google']
Obviously there are certificate issues if you use HTTPS, and you'll need to fill out MyResolver.
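If you are on Python 3, the same idea can be sketched with the renamed modules (urllib2 became urllib.request, httplib became http.client). This is only a sketch under some assumptions of mine: the OVERRIDES table, the resolve helper, the service.invalid name and the small local echo server are all made up for the demonstration and are not part of the answer above. It is meant to show that the socket connects to the substituted address while the Host: header still carries the original name.

```python
import http.client
import http.server
import socket
import threading
import urllib.request

# Hypothetical override table: hostname -> address to connect to.
OVERRIDES = {}


def resolve(host):
    """Return the address to connect to; fall back to the name itself."""
    return OVERRIDES.get(host, host)


class ResolvingHTTPConnection(http.client.HTTPConnection):
    def connect(self):
        # Connect to the overridden address. self.host is left untouched,
        # so the Host: header still carries the original name.
        self.sock = socket.create_connection(
            (resolve(self.host), self.port), self.timeout)


class ResolvingHTTPHandler(urllib.request.HTTPHandler):
    def http_open(self, req):
        return self.do_open(ResolvingHTTPConnection, req)


class EchoHostHandler(http.server.BaseHTTPRequestHandler):
    """Tiny local test server that replies with the Host header it received."""

    def do_GET(self):
        body = self.headers.get("Host", "").encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_via_override(name="service.invalid"):
    """Request http://<name>:<port>/ while actually connecting to 127.0.0.1."""
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoHostHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        OVERRIDES[name] = "127.0.0.1"
        # ProxyHandler({}) disables any proxy taken from the environment.
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}), ResolvingHTTPHandler)
        with opener.open("http://%s:%d/" % (name, port), timeout=5) as resp:
            return resp.read().decode("ascii")
    finally:
        OVERRIDES.pop(name, None)
        server.shutdown()
        server.server_close()
```

Calling fetch_via_override() should hand back the Host value the local server saw, which is the original name plus the port, even though the socket went to 127.0.0.1. HTTPS is left out on purpose, since it would need the certificate handling mentioned above.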
Another (dirty) way is monkey-patching socket.getaddrinfo.
For example, this code adds an (unlimited) cache for DNS lookups:
    import socket

    prv_getaddrinfo = socket.getaddrinfo
    dns_cache = {}  # entries are never evicted, so this grows without bound

    def new_getaddrinfo(*args):
        # The cache key is the full tuple of positional arguments.
        try:
            return dns_cache[args]
        except KeyError:
            res = prv_getaddrinfo(*args)
            dns_cache[args] = res
            return res

    socket.getaddrinfo = new_getaddrinfo
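The same hook can substitute addresses instead of caching them. A minimal sketch, assuming a hypothetical HOST_OVERRIDES table that you would fill from your own resolver (the table, the function names and the service.invalid entry are mine, not from the code above):

```python
import socket

_original_getaddrinfo = socket.getaddrinfo

# Hypothetical table of names to answer locally.
HOST_OVERRIDES = {"service.invalid": "127.0.0.1"}


def patched_getaddrinfo(host, *args, **kwargs):
    # Swap the name for an address before the system resolver sees it;
    # names that are not in the table are passed through unchanged.
    return _original_getaddrinfo(HOST_OVERRIDES.get(host, host), *args, **kwargs)


def restore_getaddrinfo():
    # Undo the patch, for example at the end of a test.
    socket.getaddrinfo = _original_getaddrinfo


socket.getaddrinfo = patched_getaddrinfo
```

Because socket.create_connection looks up getaddrinfo in the socket module at call time, anything built on it (including urllib2 and httplib) should pick up the override. It is process-wide, though, which is why it counts as the dirty option.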
You will need to implement your own DNS lookup client (or use dnspython, as you said). The name lookup procedure in glibc is pretty complex, to ensure compatibility with other non-DNS name systems. For example, there is no way to specify a particular DNS server in the glibc library at all.
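To give an idea of what "your own DNS lookup client" could look like without dnspython, here is a minimal sketch using only the standard library (Python 3). The function names are my own, and the default nameserver 127.0.0.1 is taken from the question. It sends a single UDP query for A records and does no retries, no TCP fallback and no IPv6 (AAAA) lookups, so treat it as an illustration of the wire format from RFC 1035 rather than a replacement for a real resolver library.

```python
import random
import socket
import struct


def build_query(name, query_id):
    """Encode a single-question DNS query for an A record."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        struct.pack("!B", len(part)) + part
        for part in name.rstrip(".").encode("ascii").split(b".")
    )
    # QNAME ends with a zero-length label; QTYPE=1 (A), QCLASS=1 (IN).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def skip_name(packet, offset):
    """Return the offset just past the (possibly compressed) name at offset."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return offset + 2
        offset += 1 + length


def parse_a_records(packet, query_id):
    """Extract IPv4 addresses from the answer section of a DNS response."""
    ident, flags, qdcount, ancount = struct.unpack("!HHHH", packet[:8])
    if ident != query_id or flags & 0x000F:  # wrong id or non-zero RCODE
        return []
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = skip_name(packet, offset)
        rtype, rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", packet[offset:offset + 10])
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:
            addresses.append(socket.inet_ntoa(packet[offset:offset + 4]))
        offset += rdlength
    return addresses


def lookup(name, nameserver="127.0.0.1", port=53, timeout=2.0):
    """Ask one specific nameserver for the A records of name."""
    query_id = random.randint(0, 0xFFFF)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, query_id), (nameserver, port))
        packet, _ = sock.recvfrom(512)
    finally:
        sock.close()
    return parse_a_records(packet, query_id)
```

A function like lookup could then serve as the resolver behind a custom HTTPConnection or a patched getaddrinfo. In practice dnspython handles the many cases this sketch ignores.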