I\'m writing a simple crawler in Python using the threading and Queue modules. I fetch a page, check links and put them into a queue, when a certain thread has finished proc
The put method also needs to be overwritten, if not a join call will block forever https://github.com/python/cpython/blob/master/Lib/queue.py#L147
class UniqueQueue(Queue):
def put(self, item, block=True, timeout=None):
if item not in self.queue: # fix join bug
Queue.put(self, item, block, timeout)
def _init(self, maxsize):
self.queue = set()
def _put(self, item):
self.queue.add(item)
def _get(self):
return self.queue.pop()