How can I get a new IP from Tor for every request in threads?

予麋鹿 2021-01-06 08:21

I am trying to use a Tor proxy for scraping. Everything works fine in one thread, but it is slow, so I tried something simple:



        
2 answers
  •  难免孤独
    2021-01-06 08:55

    You only have one proxy, listening on port 9050. All 3 processes send their requests in parallel through that proxy, so they share the same IP.

    What is happening is:

    1. All 3 processes ask the proxy to get a new IP
    2. The proxy either requests a new IP 3 times, receives 3 responses and applies the last one, or it recognizes that it is already waiting for a new IP, disregards 2 of the requests and answers all 3 together. Which of these happens depends on the proxy implementation.
    3. The processes send their requests through the proxy, which results in the same IP.
    4. The processes are completed and another 3 processes are initiated. Rinse and repeat.

    That is why the IPs are the same for every block of 3 requests.
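
    The steps above can be illustrated with a toy model that needs no Tor at all. This is only a sketch: `FakeProxy`, `run_batch` and the `10.0.0.x` addresses are made up for illustration, and the barrier simply forces the ordering described in the list (all renewals first, then all requests). It models the "apply the last one" variant of step 2.

```python
import threading


class FakeProxy:
    """Hypothetical stand-in for ONE Tor proxy: a single shared exit IP."""

    def __init__(self):
        self._lock = threading.Lock()
        self._circuit = 0

    def new_identity(self):
        # Every caller replaces the one circuit that all clients share.
        with self._lock:
            self._circuit += 1

    def exit_ip(self):
        # Made-up address derived from the current circuit number.
        with self._lock:
            return f"10.0.0.{self._circuit}"


def run_batch(proxy, workers=3):
    """Run one block of workers: each renews the IP, then sends a request."""
    barrier = threading.Barrier(workers)
    seen = []
    seen_lock = threading.Lock()

    def worker():
        proxy.new_identity()  # step 1: ask the proxy for a new IP
        barrier.wait()        # step 2: all renewals land before any request
        ip = proxy.exit_ip()  # step 3: the request goes through the shared proxy
        with seen_lock:
            seen.append(ip)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


proxy = FakeProxy()
batches = [run_batch(proxy) for _ in range(3)]  # step 4: rinse and repeat
for batch in batches:
    print(batch)
```

    Within each printed block all three entries are identical, while the value changes from one block to the next.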
    You'll need 3 independent proxies to have 3 different IPs at the same time.
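
    One way to get three independent proxies is to start three Tor instances, each with its own configuration file. The sketch below only generates the file contents. `SocksPort`, `ControlPort`, `DataDirectory` and `HashedControlPassword` are standard torrc options, but the helper name `build_torrc`, the data directory and the password placeholder are assumptions; the hash would have to come from `tor --hash-password`, and each instance would be started with `tor -f <file>`.

```python
from pathlib import PurePosixPath

# (SOCKS port, control port) for each Tor instance.
PORT_PAIRS = [(9050, 9051), (9060, 9061), (9070, 9071)]


def build_torrc(socks_port, control_port, data_root, hashed_password):
    """Return torrc text for one Tor instance (hypothetical helper)."""
    # Each instance needs its own data directory, or they will clash.
    data_dir = PurePosixPath(data_root) / f"tor{socks_port}"
    lines = [
        f"SocksPort {socks_port}",
        f"ControlPort {control_port}",
        f"DataDirectory {data_dir}",
        f"HashedControlPassword {hashed_password}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values: replace the root path and the hash with your own.
configs = {
    socks: build_torrc(socks, control, "/tmp/tor-instances", "16:REPLACE_ME")
    for socks, control in PORT_PAIRS
}

for socks, text in configs.items():
    print(f"# torrc for the instance on SOCKS port {socks}")
    print(text)
```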


    EDIT:

    Possible solution using locks, assuming 3 proxies are running in the background:

    import contextlib, threading, time
    from multiprocessing.pool import ThreadPool

    import requests  # socks5h:// proxies also need PySocks (pip install requests[socks])
    from stem import Signal
    from stem.control import Controller

    _controller_ports = [
        # (controller lock, SOCKS port, control port)
        (threading.Lock(), 9050, 9051),
        (threading.Lock(), 9060, 9061),
        (threading.Lock(), 9070, 9071),
    ]

    def get_new_ip_for(port):
        # Ask the Tor instance behind this control port for a new identity.
        with Controller.from_port(port=port) as controller:
            controller.authenticate(password="password")
            controller.signal(Signal.NEWNYM)
            time.sleep(controller.get_newnym_wait())

    @contextlib.contextmanager
    def get_port_with_new_ip():
        while True:
            for lock, con_port, manage_port in _controller_ports:
                if lock.acquire(blocking=False):
                    try:
                        get_new_ip_for(manage_port)
                        yield con_port
                    finally:
                        lock.release()
                    # return, not break: a context manager may only yield once
                    return
            time.sleep(1)  # every proxy is busy, wait and retry

    def check_ip():
        with get_port_with_new_ip() as port:
            session = requests.session()
            session.proxies = {'http': f'socks5h://localhost:{port}', 'https': f'socks5h://localhost:{port}'}
            r = session.get('http://httpbin.org/ip')
            print(r.text)

    # threading.Lock is only shared between threads, so use a thread pool here
    with ThreadPool(processes=3) as pool:
        for _ in range(9):
            pool.apply_async(check_ip)
        pool.close()
        pool.join()
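
    As a possible variation on the lock-polling loop above, the free proxies could be kept in a `queue.Queue`, so that a worker blocks until a port is handed back instead of sleeping and retrying. This is a sketch, not a drop-in replacement: `make_port_pool`, `lease_port`, `demo` and `fake_renew` are hypothetical names, and `fake_renew` stands in for a real renewal function such as `get_new_ip_for` so that the example runs without Tor.

```python
import contextlib
import queue
from multiprocessing.pool import ThreadPool

# (SOCKS port, control port) for each running Tor instance.
PORT_PAIRS = [(9050, 9051), (9060, 9061), (9070, 9071)]


def make_port_pool(port_pairs):
    """Put every free proxy into a thread-safe queue."""
    pool = queue.Queue()
    for pair in port_pairs:
        pool.put(pair)
    return pool


@contextlib.contextmanager
def lease_port(pool, renew):
    """Borrow one proxy, renew its identity, and always hand it back."""
    socks_port, control_port = pool.get()  # blocks until a proxy is free
    try:
        renew(control_port)
        yield socks_port
    finally:
        pool.put((socks_port, control_port))


def demo():
    """Run 9 tasks on 3 threads with a fake renewal function."""
    pool = make_port_pool(PORT_PAIRS)
    renewed = []

    def fake_renew(control_port):
        # Stand-in for the real NEWNYM call; it only records the port.
        renewed.append(control_port)

    def task(_):
        with lease_port(pool, fake_renew) as port:
            return port

    with ThreadPool(processes=3) as workers:
        used = workers.map(task, range(9))
    return used, renewed, pool.qsize()


if __name__ == "__main__":
    used, renewed, free = demo()
    print(used, free)
```

    With a real renewal function, `task` would build the `requests` session on the leased port exactly as `check_ip` does.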
    
