Synchronise muti-threads in Python

血红的双手。 提交于 2019-12-11 20:49:40

问题


The class BrokenLinkTest in the code below does the following.

  1. takes a web page url
  2. finds all the links in the web page
  3. get the headers of the links concurrently (this is done to check if the link is broken or not)
  4. print 'completed' when all the headers are received.

from bs4 import BeautifulSoup
import requests

class BrokenLinkTest(object):

    def __init__(self, url):
        self.url = url
        self.thread_count = 0
        self.lock = threading.Lock()

    def execute(self):
        soup = BeautifulSoup(requests.get(self.url).text)
        self.lock.acquire()
        for link in soup.find_all('a'):
            url = link.get('href')
            threading.Thread(target=self._check_url(url))
        self.lock.acquire()

    def _on_complete(self):
        self.thread_count -= 1
        if self.thread_count == 0: #check if all the threads are completed
            self.lock.release()
            print "completed"

    def _check_url(self, url):
        self.thread_count += 1
        print url
        result = requests.head(url)
        print result
        self._on_complete()


BrokenLinkTest("http://www.example.com").execute()

Can the concurrency/synchronization part be done in a better way. I did it using threading.Lock. This is my first experiment with python threading.


回答1:


def execute(self):
    soup = BeautifulSoup(requests.get(self.url).text)
    threads = []
    for link in soup.find_all('a'):
        url = link.get('href')
        t = threading.Thread(target=self._check_url, args=(url,))
        t.start()
        threads.append(t)
    for thread in threads:
        thread.join()

You could use the join method to wait for all the threads to finish.

Note I also added a start call, and passed the bound method object to the target param. In your original example you were calling _check_url in the main thread and passing the return value to the target param.




回答2:


All threads in Python run on the same core, so you won't be gaining any performance by doing it this way. Also - it's very unclear what is actually happening?

  1. You are never actually starting a threads, you are just initializing it
  2. The threads themselves do absolutely nothing other than decrementing the thread count

You may only gain performance in a thread-based scenario if your program is delivering work to the IO (sending requests, writing to file and so on), where other threads can work in the meanwhile.



来源:https://stackoverflow.com/questions/26234301/synchronise-muti-threads-in-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!