Multithreaded crawler while using a Tor proxy

[愿得一人] 2020-12-21 12:17

I am trying to build a multithreaded crawler that uses Tor proxies. I am using the following to establish the Tor connection:

from stem import Signal
from stem.contro         


        
1 answer
  • 2020-12-21 12:57

    This is a perfect example of why monkey patching socket.socket is bad.

    The patch replaces the socket class used by every socket connection in the process (which is almost everything) with the SOCKS socket.

    When you go to connect to the controller later, it attempts to use the SOCKS protocol to communicate instead of establishing a direct connection.
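To make the effect concrete, here is a minimal sketch using only the standard library. `TracingSocket` is a hypothetical stand-in for `socks.socksocket`: it does nothing except count how often it is created, which is enough to show that once `socket.socket` is reassigned, every later caller in the process gets the patched class, including code that never asked for a proxy.

```python
import socket


class TracingSocket(socket.socket):
    """Hypothetical stand-in for socks.socksocket; only counts instantiations."""

    created = 0

    def __init__(self, *args, **kwargs):
        TracingSocket.created += 1
        super().__init__(*args, **kwargs)


original_socket = socket.socket
socket.socket = TracingSocket  # the monkey patch: process-wide, not per-connection
try:
    # Unrelated code that simply wants a plain TCP socket
    # (for example, a library opening a control connection).
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.close()
    patched_type = type(s).__name__
finally:
    socket.socket = original_socket  # undo the patch

restored = socket.socket is original_socket
```

With a real SOCKS socket in place of `TracingSocket`, that "unrelated" connection would be tunnelled through the proxy as well, which is the failure described above.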

    Since you're already using requests, I'd suggest getting rid of SocksiPy and the socket.socket = socks.socksocket code, and using the SOCKS proxy functionality built into requests instead:

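    # socks5h:// (rather than socks5://) also resolves hostnames through the proxy
    proxies = {
        'http': 'socks5h://127.0.0.1:9050',
        'https': 'socks5h://127.0.0.1:9050'
    }

    # r is assumed to be the requests module (e.g. import requests as r);
    # SOCKS support requires the extra: pip install "requests[socks]"
    response = r.get(url, headers=request_headers, proxies=proxies)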
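For the multithreaded case, one possible arrangement is sketched below. It is not the asker's code: `get_session`, `fetch`, `crawl`, and `renew_identity` are hypothetical names, the control port 9051 is Tor's usual default rather than something stated in the question, and the per-thread `requests.Session` is just one reasonable design choice. The point it illustrates is that proxying is scoped to the HTTP sessions, so the stem controller can still open a direct connection.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Same proxy settings as above; 9050 is the SOCKS port from the answer.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

_local = threading.local()


def get_session():
    """Return this thread's requests.Session, creating it on first use."""
    if not hasattr(_local, "session"):
        import requests  # needs the SOCKS extra: pip install "requests[socks]"

        session = requests.Session()
        session.proxies.update(TOR_PROXIES)
        _local.session = session
    return _local.session


def fetch(url):
    """Fetch one URL through Tor and return (url, status_code)."""
    response = get_session().get(url, timeout=30)
    return url, response.status_code


def crawl(urls, fetch_fn=fetch, max_workers=4):
    """Run fetch_fn over urls on a thread pool, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_fn, urls))


def renew_identity(control_port=9051, password=None):
    """Request a new Tor circuit over a direct (non-SOCKS) control connection.

    control_port=9051 is an assumption (Tor's usual default ControlPort).
    """
    from stem import Signal
    from stem.control import Controller

    with Controller.from_port(port=control_port) as controller:
        controller.authenticate(password=password)
        controller.signal(Signal.NEWNYM)
```

Usage would look like `results = crawl(list_of_urls)`, with `renew_identity()` called between batches if a new exit node is wanted. Because nothing patches `socket.socket`, the controller connection and the proxied requests no longer interfere with each other.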
    