I'm writing a simple crawler in Python using the threading and Queue modules. I fetch a page, check links and put them into a queue; when a certain thread has finished processing a page, it grabs the next one from the queue.
If you don't care about the order in which items are processed, I'd try a subclass of Queue that uses a set internally:
from Queue import Queue   # the stdlib Queue module; named "queue" in Python 3

class SetQueue(Queue):
    def _init(self, maxsize):
        # Queue.__init__ calls _init() to create the underlying container;
        # use a set instead of the default deque so duplicates collapse
        self.maxsize = maxsize
        self.queue = set()
    def _put(self, item):
        self.queue.add(item)
    def _get(self):
        # sets are unordered, so this returns an arbitrary queued item
        return self.queue.pop()
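A quick single-threaded check of the behaviour (assuming the class above; the example URLs are made up):

    q = SetQueue()
    q.put("http://example.com/a")
    q.put("http://example.com/a")   # still queued, so the set absorbs the duplicate
    q.put("http://example.com/b")
    print(q.qsize())                # 2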
As Paul McGuire pointed out, this would allow adding a duplicate item after it's been removed from the "to-be-processed" set and not yet added to the "processed" set. To solve this, you can store both sets in the Queue instance, but since you are using the larger set for checking whether the item has been processed, you can just as well go back to a regular queue, which will order requests properly.
class SetQueue(Queue):
    def _init(self, maxsize):
        Queue._init(self, maxsize)   # keep the default FIFO deque for ordering
        self.all_items = set()       # every item that has ever been queued
    def _put(self, item):
        # only enqueue items that have never been seen before
        if item not in self.all_items:
            Queue._put(self, item)
            self.all_items.add(item)
The advantage of this, as opposed to using a set separately, is that the Queue's methods are thread-safe, so you don't need additional locking when checking the other set.
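For example, here is a minimal sketch of two threads pushing the same batch of links concurrently (the URLs are made up); put() performs the membership check under the queue's own lock, so no extra synchronization is needed:

    import threading

    links = ["http://example.com/%d" % i for i in range(100)]

    def producer(q):
        for url in links:
            q.put(url)   # duplicates are silently dropped inside put()

    q = SetQueue()
    threads = [threading.Thread(target=producer, args=(q,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(q.qsize())   # 100, not 200 -- each link was queued exactly once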