Python urllib over TOR? [duplicate]

爷,独闯天下 提交于 2019-11-27 00:42:48

The problem is that httplib.HTTPConnection uses the socket module's create_connection helper function which does the DNS request via the usual getaddrinfo method before connecting the socket.

The solution is to make your own create_connection function and monkey-patch it into the socket module before importing urllib2, just like we do with the socket class.

import socks
import socket
def create_connection(address, timeout=None, source_address=None):
    sock = socks.socksocket()
    sock.connect(address)
    return sock

socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)

# patch the socket module
socket.socket = socks.socksocket
socket.create_connection = create_connection

import urllib2

# Now you can go ahead and scrape those shady darknet .onion sites

The problem is that you are importing urllib2 before you set up the socks connection.

Try this instead:

import socks
import socket

socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS4, '127.0.0.1', 9050, True)
socket.socket = socks.socksocket

import urllib2
print urllib2.urlopen("http://almien.co.uk/m/tools/net/ip/").read()

Manual request example:

import socks                                                         
import urlparse                                                      

SOCKS_HOST = 'localhost'                                             
SOCKS_PORT = 9050                                                    
SOCKS_TYPE = socks.PROXY_TYPE_SOCKS5                                 

url = 'http://www.whatismyip.com/automation/n09230945.asp'           
parsed = urlparse.urlparse(url)                                      


socket = socks.socksocket()                                          
socket.setproxy(SOCKS_TYPE, SOCKS_HOST, SOCKS_PORT)                  
socket.connect((parsed.netloc, 80))                                  
socket.send('''GET %(uri)s HTTP/1.1                                  
host: %(host)s                                                       
connection: close                                                    

''' % dict(                                                          
    uri=parsed.path,                                                 
    host=parsed.netloc,                                              
))                                                                   

print socket.recv(1024)                                              
socket.close()

I've published an article with complete source code showing how to use urllib2 + SOCKS + Tor on http://blog.databigbang.com/distributed-scraping-with-multiple-tor-circuits/

Hope it solves your issues.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!