I'm writing a simple crawler in Python using the threading and Queue modules. I fetch a page, check links and put them into a queue, when a certain thread has finished proc
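Sadly, I don't have enough reputation to comment on Lukáš Lalinský’s excellent answer.
Queue.put() increments unfinished_tasks after every call to _put(), even when the second variant of Lukáš Lalinský’s SetQueue skips a duplicate, so join() would wait forever for an item that was never queued. To make SetQueue.task_done() and SetQueue.join() work with that variant, add an else branch to the if: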
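    def _put(self, item):
        if item not in self.all_items:
            Queue._put(self, item)
            self.all_items.add(item)
        else:
            # Duplicate: nothing was queued, so cancel the increment
            # that put() applies right after _put() returns.
            self.unfinished_tasks -= 1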
Tested and works with Python 3.4.
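
For anyone who wants to try it, here is a minimal self-contained sketch of the whole class with a single worker thread. It assumes Python 3, where the module is named queue, and it relies on the CPython detail that put() increments unfinished_tasks after calling _put(). The worker function, the seen list and the sample items are only illustrative names I made up, not part of the original answer.

```python
import threading
from queue import Queue


class SetQueue(Queue):
    """FIFO queue that silently drops items it has already seen."""

    def _init(self, maxsize):
        super()._init(maxsize)
        self.all_items = set()

    def _put(self, item):
        if item not in self.all_items:
            super()._put(item)
            self.all_items.add(item)
        else:
            # put() adds 1 to unfinished_tasks right after _put() returns,
            # so cancel that increment for an item that was not queued.
            self.unfinished_tasks -= 1


def worker(q, seen):
    # Hypothetical consumer: record each item, then mark it as done.
    while True:
        item = q.get()
        seen.append(item)
        q.task_done()


q = SetQueue()
seen = []
threading.Thread(target=worker, args=(q, seen), daemon=True).start()

for item in ["a", "b", "a", "c", "b"]:  # "a" and "b" are repeated
    q.put(item)

q.join()  # returns once the three unique items are processed
print(seen)
```

Without the else branch, the two repeated items would leave unfinished_tasks at 2 and q.join() would block forever.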