I have been reading about the thread-pool pattern and I can't seem to find the usual solution for the following problem.
I sometimes want tasks to be executed serially.
I think you're mixing concepts. A thread pool is fine when you want to distribute work among threads, but once you start introducing dependencies between tasks it is no longer such a good idea.
My advice: simply don't use the thread pool for those tasks. Create a dedicated thread and keep a simple queue of sequential items that must be processed by that thread alone. Then keep pushing tasks to the thread pool when there is no sequential requirement, and use the dedicated thread when there is.
A clarification: common sense says a queue of serial tasks should be executed by a single thread processing each task one after another :)
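A minimal Java sketch of this advice, assuming a fixed pool of four threads is enough for the independent work. Executors.newSingleThreadExecutor() already provides the "dedicated thread plus simple queue": its single worker takes tasks from an unbounded FIFO queue one at a time. The Dispatcher class and its method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical dispatcher: independent work goes to a shared pool, while
// order-sensitive work goes to one dedicated thread that drains its own queue.
class Dispatcher {
    // Shared pool for tasks with no ordering requirement (size is an assumption).
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    // Single worker thread with an unbounded FIFO queue: tasks run one after another.
    private final ExecutorService serial = Executors.newSingleThreadExecutor();

    void submitParallel(Runnable task) {
        pool.execute(task);
    }

    void submitSerial(Runnable task) {
        serial.execute(task);
    }

    // Stops accepting new work and waits for already-queued tasks to finish.
    void shutdown() throws InterruptedException {
        serial.shutdown();
        pool.shutdown();
        serial.awaitTermination(1, TimeUnit.MINUTES);
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Anything passed to submitSerial runs in submission order; anything passed to submitParallel may run concurrently with the serial work and with other parallel tasks.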
Since you have sequential jobs, you can gather up those jobs in a chain and let the jobs themselves resubmit to the thread pool once they are done. Suppose we have a list of jobs:
[Task1, ..., Task6]
like in your example. We have a sequential dependency, such that [Task3, Task4, Task6] is a dependency chain. We now make a job (Erlang pseudo-code):
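Task6Job = fun() ->
    Task6()             % Last link in the chain: nothing to push afterwards
end.
Task4Job = fun() ->
    Task4(),            % Execute the Task4 job
    push_job(Task6Job)  % then push the next link to the pool
end.
Task3Job = fun() ->
    Task3(),            % Execute the Task3 job
    push_job(Task4Job)
end.
push_job(Task3Job).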
That is, we alter the Task3 job by wrapping it in a job which, as a continuation, pushes the next job in the chain to the thread pool. There are strong similarities to the general continuation-passing style also seen in systems like Node.js or Python's Twisted framework.
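The same idea in Java, as a sketch rather than a drop-in implementation: a hypothetical ChainJob holds the remaining tasks of one dependency chain, runs the first, and as its continuation submits a job for the rest to the same ExecutorService. The pool thread is released between links, so unrelated jobs can run in the gaps.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;

// Hypothetical continuation-style job: run one link, then resubmit the rest.
final class ChainJob implements Runnable {
    private final List<Runnable> tasks; // remaining tasks, in dependency order
    private final ExecutorService pool;

    ChainJob(List<Runnable> tasks, ExecutorService pool) {
        this.tasks = tasks;
        this.pool = pool;
    }

    @Override
    public void run() {
        if (tasks.isEmpty()) {
            return; // nothing left in this chain
        }
        tasks.get(0).run(); // execute this link, e.g. Task3
        if (tasks.size() > 1) {
            // continuation: push a job for the rest of the chain, e.g. Task4 then Task6
            pool.execute(new ChainJob(tasks.subList(1, tasks.size()), pool));
        }
    }
}
```

Starting the chain from the example would be pool.execute(new ChainJob(List.of(task3, task4, task6), pool)), where task3, task4 and task6 are Runnables. The pool must not be shut down before the last link has run, otherwise the resubmission is rejected.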
Generalizing, you build a system where you can define job chains that can defer further work and resubmit it.
Why do we even bother splitting up the jobs? Since they are sequentially dependent, executing all of them on the same thread won't be faster or slower than taking that chain and spreading it out over multiple threads. Assuming "enough" workload, every thread will always have work to do anyway, so just bundling the jobs together is probably easiest:
Task = fun() ->
    Task3(),  % Just build a new job that executes them in the desired order
    Task4(),
    Task6()
end,
push_job(Task).
This kind of thing is easy if your language has first-class functions, so you can build jobs on the fly: any functional programming language, Python, Ruby blocks, and so on.
I don't particularly like the idea of building a queue, or a continuation stack, as in "Option 1", though, and I would definitely go with the second option. In Erlang, we even have a program called jobs, written by Erlang Solutions and released as open source. jobs is built to execute and load-regulate job executions like these. I'd probably combine option 2 with jobs if I were to solve this problem.
The answers suggesting not to use a thread pool amount to hard-coding the knowledge of task dependencies and execution order. Instead, I would create a CompositeTask that manages the start/end dependency between two tasks. By encapsulating the dependency behind the task interface, all tasks can be treated uniformly and added to the pool. This hides the execution details and allows the task dependencies to change without affecting whether or not you use a thread pool.
The question doesn't specify a language; I'll use Java, which I hope is readable for most.
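class CompositeTask implements Runnable
{
    private final Runnable firstTask;
    private final Runnable secondTask;
    CompositeTask(Runnable firstTask, Runnable secondTask) {
        this.firstTask = firstTask;
        this.secondTask = secondTask;
    }
    // Runs both tasks in order on whichever pool thread picked this task up.
    public void run() {
        firstTask.run();
        secondTask.run();
    }
}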
This executes the tasks sequentially and on the same thread. You can chain many CompositeTasks together to create a sequence of as many sequential tasks as needed.
The downside here is that this ties up the thread for the duration of all tasks executing sequentially. You may have other tasks that you would prefer to execute in between the first and second tasks. So, rather than execute the second task directly, have the composite task schedule execution of the second task:
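import java.util.concurrent.ExecutorService;
class CompositeTask implements Runnable
{
    private final Runnable firstTask;
    private final Runnable secondTask;
    private final ExecutorService executor;
    CompositeTask(Runnable firstTask, Runnable secondTask, ExecutorService executor) {
        this.firstTask = firstTask;
        this.secondTask = secondTask;
        this.executor = executor;
    }
    // Runs the first task, then hands the second back to the pool
    // instead of running it on this thread.
    public void run() {
        firstTask.run();
        executor.submit(secondTask);
    }
}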
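This ensures that the second task doesn't run until after the first task is complete, and it also allows the pool to execute other (possibly more urgent) tasks in the meantime. Note that the first and second tasks may execute on different threads. They never run concurrently, and ExecutorService guarantees that actions taken before submit() happen-before the submitted task starts, so data written by the first task is visible to the second without extra work. Data that is also shared with other, unrelated tasks still needs proper synchronization (e.g. volatile fields or locks).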
This is a simple yet powerful and flexible approach that lets the tasks themselves define execution constraints, rather than relying on different thread pools.
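To show how the rescheduling version chains, here is a self-contained Java sketch. The class name RescheduleTask, its constructor and the chain helper are assumptions added so the example compiles on its own. It uses execute rather than submit so that an exception thrown by a task is not silently captured in an unused Future.

```java
import java.util.concurrent.ExecutorService;

// Hypothetical stand-alone variant of the rescheduling CompositeTask.
final class RescheduleTask implements Runnable {
    private final Runnable first;
    private final Runnable second;
    private final ExecutorService executor;

    RescheduleTask(Runnable first, Runnable second, ExecutorService executor) {
        this.first = first;
        this.second = second;
        this.executor = executor;
    }

    @Override
    public void run() {
        first.run();
        executor.execute(second); // second is queued only after first has returned
    }

    // Builds a right-nested chain: tasks[0], then tasks[1], and so on.
    // Assumes at least one task is given.
    static Runnable chain(ExecutorService executor, Runnable... tasks) {
        Runnable tail = tasks[tasks.length - 1];
        for (int i = tasks.length - 2; i >= 0; i--) {
            tail = new RescheduleTask(tasks[i], tail, executor);
        }
        return tail;
    }
}
```

RescheduleTask.chain(pool, task3, task4, task6) builds new RescheduleTask(task3, new RescheduleTask(task4, task6, pool), pool), so each link is handed back to the pool only after the previous one has finished.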
I think a thread pool can be used effectively in this situation. The idea is to use a separate strand object for each group of dependent tasks. You add tasks to your queue with or without a strand object, using the same strand object for dependent tasks. Your scheduler checks whether the next task has a strand and whether that strand is locked. If not, it locks the strand and runs the task. If the strand is already locked, the task stays in the queue until the next scheduling event. When a task is done, its strand is unlocked.
As a result you need a single queue, no additional threads, no complicated groups, etc. A strand object can be very simple, with just two methods: lock and unlock.
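A minimal Java sketch of the single-queue scheme, under the assumption that all strand state is guarded by the scheduler's own lock. Strand and StrandScheduler are hypothetical names, not Boost.Asio's API. Tasks whose strand is locked simply stay in the queue, and both submitting a task and finishing one count as scheduling events.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;

// Hypothetical strand: a flag guarded by the owning scheduler's monitor.
final class Strand {
    private boolean locked;

    boolean tryLock() {
        if (locked) return false;
        locked = true;
        return true;
    }

    void unlock() { locked = false; }
}

final class StrandScheduler {
    private static final class Pending {
        final Runnable task;
        final Strand strand; // null for independent tasks
        Pending(Runnable task, Strand strand) { this.task = task; this.strand = strand; }
    }

    private final List<Pending> queue = new ArrayList<>(); // the single shared queue
    private final ExecutorService pool;

    StrandScheduler(ExecutorService pool) { this.pool = pool; }

    synchronized void submit(Runnable task, Strand strand) {
        queue.add(new Pending(task, strand));
        schedule();
    }

    // A scheduling event: start every queued task whose strand is free.
    // Tasks on a locked strand stay queued, so each strand keeps FIFO order.
    private synchronized void schedule() {
        for (Iterator<Pending> it = queue.iterator(); it.hasNext(); ) {
            Pending p = it.next();
            if (p.strand == null || p.strand.tryLock()) {
                it.remove();
                pool.execute(() -> runAndRelease(p));
            }
        }
    }

    private void runAndRelease(Pending p) {
        try {
            p.task.run();
        } finally {
            if (p.strand != null) {
                synchronized (this) {
                    p.strand.unlock(); // the task is done: unlock its strand
                    schedule();        // and trigger the next scheduling event
                }
            }
        }
    }
}
```

Dependent tasks share one Strand instance (scheduler.submit(task, sessionStrand)); independent tasks pass null. Rescanning the whole queue on every event costs O(n), which is fine for a sketch but may be worth replacing with per-strand queues under heavy load.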
I often meet the same design problem, e.g. in an asynchronous network server that handles multiple simultaneous sessions. Sessions are independent of each other (this maps them to your independent tasks and groups of dependent tasks), while tasks inside a session are dependent (this maps session-internal tasks to your dependent tasks inside a group). Using the described approach, I avoid explicit synchronization inside a session completely. Every session has its own strand object.
What is more, I use an existing (great) implementation of this idea: the Boost.Asio library (C++). I simply borrowed their term strand. The implementation is elegant: I wrap my async tasks in the corresponding strand object before scheduling them.
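Boost.Asio is C++, but on the JVM a close analogue of "wrapping tasks in a strand" is the SerialExecutor example from the java.util.concurrent.Executor documentation, adapted slightly here. Each session would own one SerialExecutor; tasks submitted through it run one at a time and in order, while the threads come from a shared pool. Unlike the single-queue scheduler described earlier, each strand here keeps its own queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

// Adapted from the SerialExecutor example in the java.util.concurrent.Executor docs.
// Tasks submitted through one instance run one at a time, in submission order,
// on threads borrowed from a shared underlying executor.
class SerialExecutor implements Executor {
    private final Queue<Runnable> tasks = new ArrayDeque<>();
    private final Executor underlying;
    private Runnable active; // task currently handed to the pool, or null

    SerialExecutor(Executor underlying) {
        this.underlying = underlying;
    }

    @Override
    public synchronized void execute(Runnable r) {
        tasks.add(() -> {
            try {
                r.run();
            } finally {
                scheduleNext(); // when this task ends, release the next one
            }
        });
        if (active == null) {
            scheduleNext();
        }
    }

    private synchronized void scheduleNext() {
        active = tasks.poll();
        if (active != null) {
            underlying.execute(active);
        }
    }
}
```

A session would then call session.execute(task) instead of pool.execute(task), which mirrors wrapping each task in its session's strand before scheduling it.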
Since you only need to wait for a single task to complete before starting the dependent task, this is easy to do if you can schedule the dependent task from within the first task. So in your second example: at the end of task 2, schedule task 7; at the end of task 3, schedule task 4; and so on for 4->6 and 6->8.
In the beginning, just schedule tasks 1, 2, 5, 9, ... and the rest should follow.
An even more general problem is when you have to wait for multiple tasks before a dependent task can start. Handling that efficiently is a non-trivial exercise.
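One way to handle the multiple-prerequisite case in Java, offered as an illustration rather than as the answerer's method, is CompletableFuture.allOf: the dependent task is attached as a callback that fires when the last prerequisite completes, so no pool thread blocks while waiting. The Dependencies.after helper is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical helper for "wait for several tasks, then run one".
final class Dependencies {
    // Runs `dependent` on the pool once every prerequisite has completed.
    // allOf() only registers a callback, so no thread sits blocked waiting.
    static CompletableFuture<Void> after(ExecutorService pool, Runnable dependent,
                                         CompletableFuture<?>... prerequisites) {
        return CompletableFuture.allOf(prerequisites).thenRunAsync(dependent, pool);
    }
}
```

For example, with t1 = CompletableFuture.runAsync(task1, pool) and t2 = CompletableFuture.runAsync(task2, pool), Dependencies.after(pool, task3, t1, t2) starts task3 only once both have finished. If any prerequisite completes exceptionally, the returned future completes exceptionally too and the dependent task does not run.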
There have been a lot of answers, and obviously one has been accepted. But why not use continuations?
If you have a known "serial" condition, then when you enqueue the first task with this condition, keep a reference to its Task, and chain each further serial task onto it with Task.ContinueWith().
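using System;
using System.Threading.Tasks;
public class PoolsTasks
{
    private readonly object syncLock = new object();
    // Tail of the serial chain; each new serial task is appended after it.
    private Task serialTask = Task.CompletedTask;
    private bool isSerialTask(Action task) {
        // However you determine what is serial ...
        return true;
    }
    public void RunMyTask(Action myTask) {
        if (isSerialTask(myTask)) {
            // The lock stops concurrent callers from forking the chain.
            // TaskScheduler.Default keeps continuations on the thread pool even
            // if RunMyTask is called from a task running on a custom scheduler.
            lock (syncLock) {
                serialTask = serialTask.ContinueWith(_ => myTask(), TaskScheduler.Default);
            }
        } else {
            Task.Run(myTask);
        }
    }
}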
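For comparison, a Java sketch of the same tail-chaining idea using CompletableFuture in place of Task.ContinueWith; PooledTasks and its method names are hypothetical. The handle call is there because, unlike ContinueWith, thenRunAsync skips its action when the previous stage failed, so without it one exception would stop every later serial task.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical Java analogue of the ContinueWith chain above.
final class PooledTasks {
    private final ExecutorService pool;
    // Tail of the serial chain; guarded by this object's monitor.
    private CompletableFuture<Void> serialTail = CompletableFuture.completedFuture(null);

    PooledTasks(ExecutorService pool) {
        this.pool = pool;
    }

    // Serial tasks run one after another, in submission order, on pool threads.
    synchronized CompletableFuture<Void> runSerial(Runnable task) {
        // handle(...) turns a failed previous link into a normal completion,
        // so the next task still runs (as it would with ContinueWith).
        serialTail = serialTail.handle((ignored, error) -> null)
                               .thenRunAsync(task, pool);
        return serialTail;
    }

    // Independent tasks go straight to the pool.
    CompletableFuture<Void> runParallel(Runnable task) {
        return CompletableFuture.runAsync(task, pool);
    }
}
```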