Processing tasks in parallel and sequentially Java

↘锁芯ラ posted on 2021-02-10 05:35:21

Question


In my program, the user can trigger different tasks via an interface, and these take some time to process. Therefore they are executed by threads. So far I have implemented this with an executor that has one thread and executes all tasks one after the other. But now I would like to parallelize things a little.

That is, I would like to run tasks in parallel, except when they have the same path, in which case I want to run them sequentially. For example, I have 10 threads in my pool, and when a task comes in, it should be assigned to the worker that is currently processing a task with the same path. If no worker is currently processing a task with the same path, the task should be processed by a currently free worker.

Additional info: A task is any operation that is executed on a file in the local file system, for example renaming a file. Therefore, each task has the attribute path. And I don't want to execute two tasks on the same file at the same time, so such tasks with the same paths should be performed sequentially.

Here is my sample code, but there is still work to do:

One of my problems is that I need a safe way to check whether a worker is currently running and to get the path of the task it is running. By safe I mean that no simultaneous-access or other threading problems occur.

    public class TasksOrderingExecutor {
    
        public interface Task extends Runnable {
            //Task code here
            String getPath();
        }
    
        private static class Worker implements Runnable {
    
            private final LinkedBlockingQueue<Task> tasks = new LinkedBlockingQueue<>();

            //some variable or mechanic to give the actual path of the running tasks??
    
            private volatile boolean stopped;
    
            void schedule(Task task) {
                tasks.add(task);
            }
    
            void stop() {
                stopped = true;
            }
    
            @Override
            public void run() {
                while (!stopped) {
                    try {
                        Task task = tasks.take();
                        task.run();
                    } catch (InterruptedException ie) {
                        // perhaps, handle somehow
                    }
                }
            }
        }
    
        private final Worker[] workers;
        private final ExecutorService executorService;
    
        /**
         * @param queuesNr nr of concurrent task queues
         */
        public TasksOrderingExecutor(int queuesNr) {
            Preconditions.checkArgument(queuesNr >= 1, "queuesNr >= 1");
            executorService = new ThreadPoolExecutor(queuesNr, queuesNr, 0, TimeUnit.SECONDS, new SynchronousQueue<>());
            workers = new Worker[queuesNr];
            for (int i = 0; i < queuesNr; i++) {
                Worker worker = new Worker();
                executorService.submit(worker);
                workers[i] = worker;
            }
        }
    
        public void submit(Task task) {
            Worker worker = getWorker(task);
            worker.schedule(task);
        }
    
        public void stop() {
            for (Worker w : workers) w.stop();
            executorService.shutdown();
        }
    
        private Worker getWorker(Task task) {
            //check here if a running worker with a specific path exists? If yes return it, else return a free worker. How do I check if a worker is currently running?
            return workers[task.getPath() //HERE I NEED HELP//];
        }
    }

Answer 1:


Seems like you have a pair of problems:

  • You want to check the status of tasks submitted to an executor service
  • You want to run tasks in parallel, and possibly prioritize them

Future

For the first problem, capture the Future object returned when you submit a task to an executor service. You can check the Future object for its completion status.

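Future< Task > future = myExecutorService.submit( someTask , someTask ) ;  // Two-argument `submit( Runnable , T result )`: the second argument is what `Future::get` returns.
…
boolean isCancelled = future.isCancelled() ;  // Returns true if this task was cancelled before it completed normally.
boolean isDone = future.isDone();  // Returns true if this task completed.

The Future is of a type, and that type can be your Task class itself, provided you pass the task as the result argument as shown above. (With the one-argument submit( Runnable ), Future::get returns null.) Calling Future::get blocks until the task finishes and then yields the Task object. You can then interrogate that Task object for its contained file path.

Task task = future.get() ;  // Blocks until done; throws InterruptedException or ExecutionException.
String path = task.getPath() ;  // Access field via getter from your `Task` object.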
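
As a minimal sketch of the Future approach, the following uses the two-argument `submit(Runnable, T result)` so that `get()` hands back the task object (with the one-argument `submit(Runnable)`, `get()` returns null). The names `FutureResultDemo`, `PathTask`, and `runAndReadPath` are hypothetical and only stand in for the types in the Question. Note that `get()` blocks, so this reports the path only after the task has finished.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureResultDemo {

    // Hypothetical stand-in for the Task type from the Question.
    static class PathTask implements Runnable {
        private final String path;

        PathTask(String path) {
            this.path = path;
        }

        String getPath() {
            return path;
        }

        @Override
        public void run() {
            // The file operation would go here.
        }
    }

    // Runs one task and returns the path read back through its Future.
    static String runAndReadPath(String path) {
        ExecutorService es = Executors.newSingleThreadExecutor();
        try {
            PathTask task = new PathTask(path);
            // submit(Runnable, T): get() will return `task` once run() has finished.
            Future<PathTask> future = es.submit(task, task);
            PathTask finished = future.get(); // blocks until the task is done
            return finished.getPath();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            es.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndReadPath("/tmp/a.txt"));
    }
}
```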

Executors

Rather than instantiating new ThreadPoolExecutor, use the Executors utility class to instantiate an executor service on your behalf. Instantiating ThreadPoolExecutor directly is not needed for most common scenarios, as mentioned in the first line of its Javadoc.

ExecutorService es = Executors.newFixedThreadPool( 3 ) ;  // Instantiate an executor service backed by a pool of three threads.

For the second problem, use an executor service backed by a thread pool rather than a single thread. The executor service automatically assigns the submitted task to an available thread.

As for grouping or prioritizing, use multiple executor services. You can instantiate more than one. You can have as many executor services as you want, provided you do not overload the demand on your deployment machine for CPU cores and memory (think about your maximum simultaneous usage).

ExecutorService esSingleThread = Executors.newSingleThreadExecutor() ;
ExecutorService esMultiThread = Executors.newCachedThreadPool() ;

One executor service might be backed by a single thread to limit the demands on the deployment computer, while others might be backed by a thread pool to get more work done. You can use these multiple executor services as your multiple queues. No need for you to be managing queues and workers as seen in the code of your Question. Executors were invented to further simplify working with multiple threads.
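
One possible way to apply the "multiple executor services as multiple queues" idea to the same-path requirement is sketched below. It assumes a fixed array of single-thread executors and picks one by the hash of the path, so tasks with equal paths always land on the same thread and run in submission order. The class `StripedExecutors` and its method names are hypothetical. The trade-off is that two different paths may hash to the same executor and then also run one after the other, which costs some parallelism but never runs two tasks for the same path at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class StripedExecutors {

    private final ExecutorService[] stripes;

    StripedExecutors(int stripeCount) {
        stripes = new ExecutorService[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Equal paths give equal hash codes, hence the same index.
    int stripeFor(String path) {
        return Math.floorMod(path.hashCode(), stripes.length);
    }

    // A single-thread executor runs its tasks one at a time, in FIFO order.
    void submit(String path, Runnable task) {
        stripes[stripeFor(path)].execute(task);
    }

    // Stops accepting tasks and waits up to `seconds` per executor.
    // Returns true if every executor finished its queued work in time.
    boolean shutdownAndWait(long seconds) {
        for (ExecutorService es : stripes) {
            es.shutdown();
        }
        try {
            for (ExecutorService es : stripes) {
                if (!es.awaitTermination(seconds, TimeUnit.SECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Usage would look like `executors.submit(task.getPath(), task)`, followed by `executors.shutdownAndWait(10)` when the application exits.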

Concurrency

You said:

And I don't want to execute two tasks on the same file at the same time, so such tasks with the same paths should be performed sequentially.

You should have a better way to handle the concurrency conflict than just scheduling tasks on threads.

Java has ways to manage concurrent access to files. Search to learn more, as this has been covered on Stack Overflow already.
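
One such mechanism is `java.nio.channels.FileLock`, sketched below under the assumption that the file already exists and is writable. The helper name `withExclusiveLock` is hypothetical. An important caveat from the `FileLock` documentation: file locks are held on behalf of the whole Java virtual machine, so they guard against other processes but are not suitable for coordinating threads inside the same JVM (an overlapping attempt from the same JVM throws `OverlappingFileLockException`). For the scenario in the Question, in-process ordering is therefore still needed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class FileLockDemo {

    // Tries to take an exclusive lock on the file, runs the action while
    // holding it, then releases the lock. Returns false if another
    // process currently holds a conflicting lock.
    static boolean withExclusiveLock(Path file, Runnable action) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE);
             FileLock lock = channel.tryLock()) {
            if (lock == null) {
                return false; // lock is held elsewhere; nothing was run
            }
            action.run();
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```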


Perhaps I have not understood fully your needs, so do comment if I am off-base.




Answer 2:


It seems that you need some sort of "Task Dispatcher" that executes or holds some tasks depending on some identifier (here the Path of the file the task is applied to).

You could use something like this:

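import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingDeque;

public class Dispatcher<I> implements Runnable {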

/**
 * The executor used to execute the submitted task
 */
private final Executor executor;

/**
 * Map of the pending tasks
 */
private final Map<I, Deque<Runnable>> pendingTasksById = new HashMap<>();

/**
 * set containing the id that are currently executed
 */
private final Set<I> runningIds = new HashSet<>();

/**
 * Action to be executed by the dispatcher
 */
private final BlockingDeque<Runnable> actionQueue = new LinkedBlockingDeque<>();

public Dispatcher(Executor executor) {
    this.executor = executor;
}

/**
 * Task in the same group will be executed sequentially (but not necessarily in the same thread)
 * @param id the id of the group the task belong
 * @param task the task to execute
 */
public void submitTask(I id, Runnable task) {
    actionQueue.addLast(() -> {
        if (canBeLaunchedDirectly(id)) {
            executeTask(id, task);
        } else {
            addTaskToPendingTasks(id, task);
            ifPossibleLaunchPendingTaskForId(id);
        }
    });
}


@Override
public void run() {
    while (!Thread.currentThread().isInterrupted()) {
        try {
            actionQueue.takeFirst().run();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag before exiting
            break;
        }
    }
}


private void addTaskToPendingTasks(I id, Runnable task) {
    this.pendingTasksById.computeIfAbsent(id, i -> new LinkedList<>()).add(task);
}


/**
 * @param id an id of a group
 * @return true if a task of the group with the provided id is currently executed
 */
private boolean isRunning(I id) {
    return runningIds.contains(id);
}

/**
 * @param id an id of a group
 * @return an optional containing the first pending task of the group,
 * an empty optional if no such task is available
 */
private Optional<Runnable> getFirstPendingTask(I id) {
    final Deque<Runnable> pendingTasks = pendingTasksById.get(id);
    if (pendingTasks == null) {
        return Optional.empty();
    }
    assert !pendingTasks.isEmpty();
    final Runnable result = pendingTasks.removeFirst();
    if (pendingTasks.isEmpty()) {
        pendingTasksById.remove(id);
    }
    return Optional.of(result);
}

private boolean canBeLaunchedDirectly(I id) {
    return !isRunning(id) && pendingTasksById.get(id) == null;
}

private void executeTask(I id, Runnable task) {
    this.runningIds.add(id);
    executor.execute(() -> {
        try {
            task.run();
        } finally {
            actionQueue.addLast(() -> {
                runningIds.remove(id);
                ifPossibleLaunchPendingTaskForId(id);
            });
        }
    });
}

private void ifPossibleLaunchPendingTaskForId(I id) {
    if (isRunning(id)) {
        return;
    }
    getFirstPendingTask(id).ifPresent(r -> executeTask(id, r));
}

}

To use it, you need to launch it in a separate thread (or you can adapt it for a cleaner solution), like this:

    final Dispatcher<Path> dispatcher = new Dispatcher<>(Executors.newCachedThreadPool());
    new Thread(dispatcher).start();
    dispatcher.submitTask(path, task1);
    dispatcher.submitTask(path, task2);

This is a basic example; you might need to keep a reference to the thread and, even better, wrap all of that in a class.
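
For comparison, here is a more compact sketch of the same behaviour (sequential per id, parallel across ids) that needs no dispatcher thread. It is a different technique from the Dispatcher above: each id keeps the `CompletableFuture` of its most recently submitted task, and a new task is chained behind it. The class name `KeyedSerialExecutor` is hypothetical, and the sketch assumes the supplied executor runs tasks on its own threads rather than in the calling thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

class KeyedSerialExecutor<K> {

    private final Executor executor;

    // For each key, the future of the most recently submitted task.
    private final ConcurrentHashMap<K, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    KeyedSerialExecutor(Executor executor) {
        this.executor = executor;
    }

    // Schedules the task to start only after the previous task with the
    // same key has finished. Tasks with different keys are independent.
    CompletableFuture<Void> submit(K key, Runnable task) {
        CompletableFuture<Void> next = tails.compute(key, (k, tail) -> {
            CompletableFuture<Void> previous =
                    (tail == null) ? CompletableFuture.<Void>completedFuture(null) : tail;
            // handle(...) absorbs a failure of the previous task so that
            // one failing task does not cancel the rest of the chain.
            return previous.handle((result, error) -> null).thenRunAsync(task, executor);
        });
        // Remove the entry once this task is done, unless a newer task
        // for the same key has already replaced it.
        next.whenComplete((result, error) -> tails.remove(key, next));
        return next;
    }
}
```

Usage would be `serial.submit(path, task1); serial.submit(path, task2);` with, for example, `new KeyedSerialExecutor<Path>(Executors.newCachedThreadPool())`. The returned future also lets the caller wait for, or react to, the completion of an individual task.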




Answer 3:


All you need is a hash map of actors, with the file path as the key. Different actors run in parallel, and each actor handles its own tasks sequentially. Your solution is wrong because the Worker class uses the blocking operation take but is executed in a limited thread pool, which may lead to thread starvation (a kind of deadlock). Actors do not block while waiting for the next message.

import org.df4j.core.dataflow.ClassicActor;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.*;

public class TasksOrderingExecutor {

public static class Task implements Runnable {
    private final String path;
    private final String task;

    public Task(String path, String task) {
        this.path = path;
        this.task = task;
    }

    //Task code here
    String getPath() {
        return path;
    }

    @Override
    public void run() {
        System.out.println(path+"/"+task+" started");
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
        System.out.println(path+"/"+task+" stopped");
    }
}

static class Worker extends ClassicActor<Task> {

    @Override
    protected void runAction(Task task) throws Throwable {
        task.run();
    }
}

private final ExecutorService executorService;

private final Map<String,Worker> workers = new HashMap<String,Worker>(){
    @Override
    public Worker get(Object key) {
        return super.computeIfAbsent((String) key, (k) -> {
            Worker res = new Worker();
            res.setExecutor(executorService);
            res.start();
            return res;
        });
    }
};

/**
 * @param queuesNr nr of concurrent task queues
 */
public TasksOrderingExecutor(int queuesNr) {
    executorService = ForkJoinPool.commonPool();
}

public void submit(Task task) {
    Worker worker = getWorker(task);
    worker.onNext(task);
}

public void stop() throws InterruptedException {
    for (Worker w : workers.values()) {
        w.onComplete();
    }
    executorService.shutdown();
    executorService.awaitTermination(10, TimeUnit.SECONDS);
}

private Worker getWorker(Task task) {
    // Returns the actor for this path; the map above creates and starts one on first use.
    return workers.get(task.getPath());
}

public static void main(String[] args) throws InterruptedException {
    TasksOrderingExecutor orderingExecutor = new TasksOrderingExecutor(20);
    orderingExecutor.submit(new Task("path1", "task1"));
    orderingExecutor.submit(new Task("path1", "task2"));
    orderingExecutor.submit(new Task("path2", "task1"));
    orderingExecutor.submit(new Task("path3", "task1"));
    orderingExecutor.submit(new Task("path2", "task2"));
    orderingExecutor.stop();
}
}

The execution log shows that tasks with the same key are executed sequentially and tasks with different keys are executed in parallel:

path3/task1 started
path2/task1 started
path1/task1 started
path3/task1 stopped
path2/task1 stopped
path1/task1 stopped
path2/task2 started
path1/task2 started
path2/task2 stopped
path1/task2 stopped

I used my own actor library DF4J, but any other actor library can be used.
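
If no actor library is at hand, the idea can be approximated with plain `java.util.concurrent`, roughly along the lines of the `SerialExecutor` example in the `Executor` Javadoc. The sketch below is not DF4J's API; `MiniActor` and `ActorPerPath` are hypothetical names. Each actor has a mailbox and occupies a pool thread only while it has a message to process, so it never blocks a thread waiting for the next message.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

// Processes its messages one at a time on a shared executor.
class MiniActor {

    private final Queue<Runnable> mailbox = new ArrayDeque<>(); // guarded by this
    private final Executor executor;
    private boolean running; // guarded by this; true while a drain step is scheduled

    MiniActor(Executor executor) {
        this.executor = executor;
    }

    synchronized void post(Runnable message) {
        mailbox.add(message);
        if (!running) {
            running = true;
            executor.execute(this::processOne);
        }
    }

    private void processOne() {
        Runnable message;
        synchronized (this) {
            message = mailbox.poll(); // non-null: scheduled only when mail is waiting
        }
        try {
            message.run();
        } finally {
            synchronized (this) {
                if (mailbox.isEmpty()) {
                    running = false; // go idle without holding a thread
                } else {
                    executor.execute(this::processOne);
                }
            }
        }
    }
}

// One actor per path: same path runs sequentially, different paths in parallel.
class ActorPerPath {

    private final ConcurrentHashMap<String, MiniActor> actors = new ConcurrentHashMap<>();
    private final Executor executor;

    ActorPerPath(Executor executor) {
        this.executor = executor;
    }

    void submit(String path, Runnable task) {
        actors.computeIfAbsent(path, p -> new MiniActor(executor)).post(task);
    }
}
```

As written, the map keeps one small actor object per path it has ever seen; a long-running application with many distinct paths might want to evict idle actors.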



Source: https://stackoverflow.com/questions/61945308/processing-tasks-in-parallel-and-sequentially-java
