Right now, I have a queue with multiple producers and a single consumer.
The consumer thread's operation is slow. Also, the consumer takes elements from the queue through a peek ope
I'd agree with Jon Skeet (+1) in that you need two stores for recording waiting and in-progress items. I would use a LinkedBlockingQueue and have each of your consumers call take() on it. When an element arrives on the queue it will be taken by one of the consumers.
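Recording what is in progress and what is completed would be a separate operation. I would maintain a HashSet of all the items that have not yet completed, and my producer would first (atomically) add the item to the HashSet of non-completed items and then put the item on the queue. Once a consumer has finished its work, it removes the item from the HashSet. Because several threads touch the set, it has to be thread-safe: either guard a plain HashSet with synchronization (e.g. Collections.synchronizedSet) or use a concurrent set such as the one returned by ConcurrentHashMap.newKeySet().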
Your producer can scan the HashSet to determine what is outstanding.
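To make that concrete, here is a minimal sketch of the idea, not a definitive implementation. The class name TrackedQueue and its method names are my own invention for illustration; I've also swapped the plain HashSet for a concurrent set from ConcurrentHashMap.newKeySet() so no extra locking is needed, and I'm assuming items are distinct under equals()/hashCode() (otherwise two equal items would collapse into one entry in the set).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical helper: a work queue plus a record of what has not completed yet.
class TrackedQueue<T> {
    // Items waiting to be picked up by a consumer.
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    // Items submitted but not yet completed (waiting OR in progress).
    // A concurrent set is used here instead of a plain HashSet (assumption).
    private final Set<T> outstanding = ConcurrentHashMap.newKeySet();

    // Producer side: record the item first, then enqueue it, so it is
    // never on the queue without also being in the outstanding set.
    public void submit(T item) throws InterruptedException {
        outstanding.add(item);
        queue.put(item);
    }

    // Consumer side: blocks until an item is available.
    public T take() throws InterruptedException {
        return queue.take();
    }

    // Consumer side: call once the work for this item has finished.
    public void complete(T item) {
        outstanding.remove(item);
    }

    // Producer side: snapshot copy of everything not yet completed.
    public Set<T> outstanding() {
        return new HashSet<>(outstanding);
    }
}
```

A consumer would loop on take(), do its slow work, and then call complete(item); a producer can call outstanding() at any time to see what is still waiting or in progress. Note that the item stays in the set while the consumer is working on it, which is what replaces the peek-then-remove pattern.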