Java BlockingQueue with batching?

Submitted by ♀尐吖头ヾ on 2020-06-22 09:36:28

Question


I am interested in a data structure identical to the Java BlockingQueue, with the exception that it must be able to batch objects in the queue. In other words, I would like the producer to be able to put objects into the queue, but have the consumer block on take() until the queue reaches a certain size (the batch size).

Then, once the queue has reached the batch size, the producer must block on put() until the consumer has consumed all of the elements in the queue (at which point the producer will start producing again and the consumer will block until a full batch is available again).

Does a similar data structure exist? Or should I write it myself (which I don't mind)? I just don't want to waste my time if something is already out there.


UPDATE

Maybe to clarify things a bit:

The situation will always be as follows. There can be multiple producers adding items to the queue, but there will never be more than one consumer taking items from the queue.

Now, the problem is that there are multiple of these setups running in parallel and in series. In other words, producers produce items for multiple queues, while consumers can, in their own right, also be producers. This can be more easily thought of as a directed graph of producers, consumer-producers, and finally consumers.

The reason that producers should block until the queues are empty (@Peter Lawrey) is that each of these will be running in its own thread. If you leave them to simply produce as space becomes available, you will end up with too many threads trying to process too many things at once.

Maybe coupling this with an execution service could solve the problem?


Answer 1:


I would suggest you use BlockingQueue.drainTo(Collection, int). You can use it with take() to ensure you get a minimum number of elements.

The advantage of this approach is that your batch size grows dynamically with the workload, and the producer doesn't have to block when the consumer is busy, i.e. it self-optimises for latency and throughput.
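
A minimal consumer-loop sketch of that combination (the DrainingConsumer class name, the maxBatch parameter and the process method are illustrative, not part of the answer):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

public class DrainingConsumer<T> implements Runnable {

    private final BlockingQueue<T> queue;
    private final int maxBatch;

    public DrainingConsumer(BlockingQueue<T> queue, int maxBatch) {
        this.queue = queue;
        this.maxBatch = maxBatch;
    }

    @Override
    public void run() {
        List<T> batch = new ArrayList<>(maxBatch);
        try {
            while (!Thread.currentThread().isInterrupted()) {
                batch.clear();
                batch.add(queue.take());              // block until at least one element arrives
                queue.drainTo(batch, maxBatch - 1);   // then grab whatever else is already queued
                process(batch);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();       // restore the flag and exit cleanly
        }
    }

    private void process(List<T> batch) {
        System.out.println("Processing batch of " + batch.size());   // placeholder for real work
    }
}

With this loop the consumer never waits for a "full" batch: it processes whatever has accumulated since its last pass, which is the self-optimising behaviour described above.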


To implement exactly as asked (which I think is a bad idea) you can use a SynchronousQueue with a busy consuming thread.

i.e. the consuming thread does a

 list.clear();
 while(list.size() < required) list.add(queue.take());
 // process list.

The producer will block whenever the consumer is busy.
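
A fuller, runnable sketch of that idea (the batch size of 5 and all names here are illustrative):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;

public class SynchronousBatchingExample {

    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<Integer> queue = new SynchronousQueue<>();
        int required = 5;   // illustrative batch size

        // Consumer: keeps taking until a full batch is assembled, then processes it.
        Thread consumer = new Thread(() -> {
            List<Integer> list = new ArrayList<>();
            try {
                while (true) {
                    list.clear();
                    while (list.size() < required) {
                        list.add(queue.take());   // each take() is a hand-off from a producer
                    }
                    System.out.println("Processing batch: " + list);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Producer: put() blocks until the consumer is ready to take the next element,
        // so the producer is paced by the consumer exactly as described above.
        for (int i = 0; i < 20; i++) {
            queue.put(i);
        }
        consumer.interrupt();   // stop the consumer once everything has been handed off
    }
}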




Answer 2:


Here is a quick (= simple, but not fully tested) implementation that I think may be suitable for your requirements - you should be able to extend it to support the full queue interface if you need to.

To increase performance you can switch to a ReentrantLock instead of using the synchronized keyword.

import java.util.ArrayList;
import java.util.concurrent.Semaphore;

public class BatchBlockingQueue<T> {

    private final ArrayList<T> queue;
    private final Semaphore readerLock;
    private final Semaphore writerLock;
    private final int batchSize;

    public BatchBlockingQueue(int batchSize) {
        this.queue = new ArrayList<>(batchSize);
        this.readerLock = new Semaphore(0);            // consumer starts with no permits
        this.writerLock = new Semaphore(batchSize);    // producers may fill exactly one batch
        this.batchSize = batchSize;
    }

    public void put(T e) throws InterruptedException {
        // Acquire the permit outside the monitor so a producer blocked here
        // never holds the lock while the consumer is draining.
        writerLock.acquire();
        synchronized (this) {
            queue.add(e);
            if (queue.size() == batchSize) {
                readerLock.release(batchSize);         // a complete batch is ready for the consumer
            }
        }
    }

    public T poll() throws InterruptedException {
        readerLock.acquire();                          // blocks until a complete batch is available
        synchronized (this) {
            T ret = queue.remove(0);
            if (queue.isEmpty()) {
                writerLock.release(batchSize);         // batch drained: let producers refill
            }
            return ret;
        }
    }

}
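
A possible usage sketch (the producer/consumer wiring and the batch size of 4 are my own illustration, not part of the answer):

BatchBlockingQueue<Integer> queue = new BatchBlockingQueue<>(4);

// Producer: put() blocks once a complete batch is waiting to be drained.
new Thread(() -> {
    try {
        for (int i = 0; i < 12; i++) {
            queue.put(i);
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}).start();

// Single consumer: poll() blocks until a full batch of 4 is available.
new Thread(() -> {
    try {
        while (true) {
            System.out.println("Got " + queue.poll());
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}).start();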

Hope you find it useful.




Answer 3:


Not that I am aware of. If I understand correctly, you want either the producer to work (while the consumer is blocked) until it fills the queue, or the consumer to work (while the producer blocks) until it clears the queue. If that's the case, may I suggest that you don't need a data structure but a mechanism to block one party while the other is working, in a mutex fashion. You can lock on an object for that, and internally keep the logic of whether the queue is full or empty to release the lock and hand it to the other party. So in short, you should write it yourself :)
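
A minimal sketch of that idea, assuming a single consumer and using a ReentrantLock with two conditions to toggle between a filling phase and a draining phase (all names here are illustrative):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class PhaseToggleQueue<T> {

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition batchReady = lock.newCondition();
    private final Condition queueEmpty = lock.newCondition();
    private final Deque<T> items = new ArrayDeque<>();
    private final int batchSize;
    private boolean draining = false;   // false = producers may add, true = consumer may remove

    public PhaseToggleQueue(int batchSize) {
        this.batchSize = batchSize;
    }

    public void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (draining) {
                queueEmpty.await();            // wait until the consumer has emptied the queue
            }
            items.addLast(item);
            if (items.size() == batchSize) {
                draining = true;               // batch complete: hand control to the consumer
                batchReady.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (!draining) {
                batchReady.await();            // wait until a full batch is ready
            }
            T item = items.removeFirst();
            if (items.isEmpty()) {
                draining = false;              // queue emptied: hand control back to producers
                queueEmpty.signalAll();
            }
            return item;
        } finally {
            lock.unlock();
        }
    }
}

Producers only run while the queue is filling and the consumer only runs while it is draining, which is exactly the mutex-style hand-over described above.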




Answer 4:


This sounds like how the RingBuffer works in the LMAX Disruptor pattern. See http://code.google.com/p/disruptor/ for more.

A very rough explanation: your main data structure is the RingBuffer. Producers put data into the ring buffer in sequence, and consumers can pull off as much data as the producers have put into the buffer (so, essentially, batching). If the buffer is full, the producer blocks until the consumer has finished and freed up slots in the buffer.
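
For a feel of the API, here is a rough sketch against the Disruptor 3.x DSL; constructor and factory details vary between Disruptor versions, so treat the exact signatures as assumptions:

import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class DisruptorSketch {

    // Simple mutable event carried in the ring buffer slots.
    static class ValueEvent {
        String value;
    }

    public static void main(String[] args) throws InterruptedException {
        Disruptor<ValueEvent> disruptor =
                new Disruptor<>(ValueEvent::new, 1024, DaemonThreadFactory.INSTANCE);

        // endOfBatch is true on the last event available in the consumer's current pass,
        // which is the natural place for batch-oriented work such as a flush.
        disruptor.handleEventsWith((event, sequence, endOfBatch) ->
                System.out.println(event.value + (endOfBatch ? "  <- end of batch" : "")));
        disruptor.start();

        RingBuffer<ValueEvent> ringBuffer = disruptor.getRingBuffer();
        for (int i = 0; i < 10; i++) {
            final int n = i;
            ringBuffer.publishEvent((event, seq) -> event.value = "item-" + n);
        }

        Thread.sleep(500);   // give the daemon consumer thread a moment before the JVM exits
    }
}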




Answer 5:


I recently developed this utility, which batches BlockingQueue elements using a flushing timeout if the queue doesn't reach the batch size. It also supports a fan-out pattern, using multiple instances to process the same set of data:

// Instantiate the registry
FQueueRegistry registry = new FQueueRegistry();

// Build FQueue consumer
registry.buildFQueue(String.class)
                .batch()
                .withChunkSize(5)
                .withFlushTimeout(1)
                .withFlushTimeUnit(TimeUnit.SECONDS)
                .done()
                .consume(() -> (broadcaster, elms) -> System.out.println("elms batched are: "+elms.size()));

// Push data into queue
for(int i = 0; i < 10; i++){
        registry.sendBroadcast("Sample"+i);
}

More info here!

https://github.com/fulmicotone/io.fulmicotone.fqueue



Source: https://stackoverflow.com/questions/9466595/java-blockingqueue-with-batching
