Ensuring task execution order in threadpool


I have been reading about the thread-pool pattern and I can't seem to find the usual solution for the following problem.

I sometimes want tasks to be executed serial


17 answers

小蘑菇

2020-12-12 12:42

I think you're mixing concepts. A thread pool is fine when you want to distribute independent work among threads, but once you introduce dependencies between tasks it is no longer such a good fit.

My advice: simply don't use the thread pool for those tasks. Create a dedicated thread and keep a simple queue of sequential items that must be processed by that thread alone. You can then keep pushing tasks to the thread pool when there is no sequential requirement, and use the dedicated thread when there is.

A clarification: Using common sense, a queue of serial tasks shall be executed by a single thread processing each task one after another :)
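
A minimal Java sketch of that split, assuming the JDK's `java.util.concurrent` executors stand in for both the pool and the dedicated thread. `TaskRouter`, `submit`, and the `mustRunInOrder` flag are hypothetical names chosen for illustration; how a task is recognized as sequential is left to the caller.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical router: ordered tasks go to one dedicated thread, the rest to a pool.
class TaskRouter {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    // A single-thread executor is a dedicated thread plus a FIFO queue.
    private final ExecutorService serialLane = Executors.newSingleThreadExecutor();

    void submit(Runnable task, boolean mustRunInOrder) {
        (mustRunInOrder ? serialLane : pool).execute(task);
    }

    // Stops accepting work and waits for the queued tasks to finish.
    boolean shutdownAndWait(long seconds) {
        pool.shutdown();
        serialLane.shutdown();
        try {
            return pool.awaitTermination(seconds, TimeUnit.SECONDS)
                && serialLane.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Small demo: the ordered tasks record their ids in submission order.
    static List<Integer> demo() {
        TaskRouter router = new TaskRouter();
        List<Integer> order = new CopyOnWriteArrayList<>();
        for (int id : new int[] {3, 4, 6}) {
            router.submit(() -> order.add(id), true);
        }
        router.submit(() -> { /* independent work, any pool thread */ }, false);
        router.shutdownAndWait(5);
        return order;
    }
}
```

The trade-off is that every sequential task shares one thread, so unrelated chains also wait for each other.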

迷失自我

2020-12-12 12:43
Option 1 - The complex one

Since you have sequential jobs, you can gather up those jobs in a chain and let the jobs themselves resubmit to the thread pool once they are done. Suppose we have a list of jobs:
```
 [Task1, ..., Task6]
```
like in your example. We have a sequential dependency, such that [Task3, Task4, Task6] is a dependency chain. We now make a job (Erlang pseudo-code):
```
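 % Define the last link first: Erlang variables must be bound before use.
 Task6Job = fun() ->
                Task6()  % Execute the Task6 job; end of the chain
             end.
 Task4Job = fun() ->
                Task4(), % Execute the Task4 job
                push_job(Task6Job)
             end.
 Task3Job = fun() ->
                Task3(), % Execute the Task3 job
                push_job(Task4Job)
             end.
 push_job(Task3Job).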
```
That is, we alter the Task3 job by wrapping it in a job that, as a continuation, pushes the next job in the chain to the thread pool. There are strong similarities here to the general continuation-passing style also seen in systems like Node.js or Python's Twisted framework.

Generalizing, you make a system where you can define job chains which can defer further work and resubmit the further work.
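
One way to picture such a job chain, sketched in Java rather than Erlang. `JobChain`, `submit`, and the `onDone` callback are hypothetical names; the pool is assumed to be an ordinary `ExecutorService`. Each link runs as its own pool submission and pushes the next link only after it has finished.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a list of jobs in order, one pool submission per job.
class JobChain {
    static void submit(ExecutorService pool, List<Runnable> jobs, Runnable onDone) {
        runFrom(pool, jobs, 0, onDone);
    }

    private static void runFrom(ExecutorService pool, List<Runnable> jobs,
                                int index, Runnable onDone) {
        if (index == jobs.size()) {
            onDone.run();                            // chain exhausted
            return;
        }
        pool.execute(() -> {
            jobs.get(index).run();                   // execute this link
            runFrom(pool, jobs, index + 1, onDone);  // continuation: push the next link
        });
    }

    // Demo with the chain [Task3, Task4, Task6] from the example above.
    static List<String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> log = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        List<Runnable> chain = List.of(
            () -> log.add("Task3"),
            () -> log.add("Task4"),
            () -> log.add("Task6"));
        submit(pool, chain, done::countDown);
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return log;
    }
}
```

In this sketch a job that throws ends the chain, because the continuation is never pushed; whether that is the desired behavior depends on the application.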

Option 2 - The simple one

Why do we even bother splitting up the jobs? Since they are sequentially dependent, executing all of them on the same thread won't be faster or slower than spreading the chain out over multiple threads. Assuming there is "enough" workload, every thread will always have work to do anyway, so just bundling the jobs together is probably easiest:
```
  Task = fun() ->
            Task3(),
            Task4(), 
            Task6()  % Just build a new job, executing them in the order desired
         end,
  push_job(Task).
```
It is rather easy to do things like this if functions are first-class citizens in your language, so that you can build them at whim, as you can in, say, any functional programming language, Python, or Ruby (with blocks).

I don't particularly like the idea of building a queue or a continuation stack as in "Option 1", though, and I would definitely go with the second option. In Erlang, we even have a program called jobs, written by Erlang Solutions and released as open source. jobs is built to execute and load-regulate job executions like these. I'd probably combine option 2 with jobs if I were to solve this problem.
南笙

2020-12-12 12:45
The answers that suggest not using a thread pool amount to hard-coding the knowledge of task dependencies and execution order. Instead, I would create a CompositeTask that manages the start/end dependency between two tasks. By encapsulating the dependency behind the task interface, all tasks can be treated uniformly and added to the pool. This hides the execution details and allows the task dependencies to change without affecting whether or not you use a thread pool.

The question doesn't specify a language - I'll use Java, which I hope is readable for most.
```
// Task is assumed to be a minimal interface along these lines.
interface Task
{
    void run();
}

class CompositeTask implements Task
{
    Task firstTask;
    Task secondTask;

    public void run() {
         firstTask.run();
         secondTask.run();
    }
}
```
This executes tasks sequentially and on the same thread. You can chain many CompositeTasks together to create a sequence of as many sequential tasks as needed.

The downside here is that this ties up the thread for the duration of all tasks executing sequentially. You may have other tasks that you would prefer to execute in between the first and second tasks. So, rather than execute the second task directly, have the composite task schedule execution of the second task:
```
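import java.util.concurrent.ExecutorService;

class CompositeTask implements Runnable
{
    // Runnable is used for the fields because ExecutorService.submit
    // accepts a Runnable (or Callable), not an arbitrary Task type.
    Runnable firstTask;
    Runnable secondTask;
    ExecutorService executor;

    public void run() {
         firstTask.run();
         // Hand the second task back to the pool instead of running it inline.
         executor.submit(secondTask);
    }
}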
```
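This ensures that the second task doesn't run until the first is complete, and it also lets the pool execute other (possibly more urgent) tasks in between. Note that the first and second tasks may execute on separate threads, so although they never run concurrently, any shared data they use must be visible across threads. A java.util.concurrent executor already guarantees that everything the first task did before calling submit is visible to the submitted task; with a pool that makes no such guarantee, you must publish the data yourself (e.g. by making the variables volatile).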

This is a simple, yet powerful and flexible approach, and allows the tasks themselves to define execution constraints, rather than doing it by using different thread pools.
北海茫月

2020-12-12 12:46

I think a thread pool can be used effectively in this situation. The idea is to use a separate strand object for each group of dependent tasks. You add tasks to your queue with or without a strand object, and you use the same strand object for all tasks that depend on one another. The scheduler checks whether the next task has a strand and whether that strand is locked. If it is not locked, the scheduler locks the strand and runs the task. If the strand is already locked, the task stays in the queue until the next scheduling event. When a task is done, it unlocks its strand.

As a result you need only a single queue, with no additional threads and no complicated groups. The strand object can be very simple, with just two methods: lock and unlock.

I often meet the same design problem, e.g. for an asynchronous network server that handles multiple simultaneous sessions. Sessions are independent of each other (they correspond to your independent tasks and groups of dependent tasks), whereas the tasks inside a session are dependent (they correspond to the dependent tasks inside one of your groups). Using the approach described, I avoid explicit synchronization inside a session completely. Every session has its own strand object.

What is more, I use an existing (great) implementation of this idea: the Boost Asio library (C++). I simply borrowed its term, strand. The implementation is elegant: I wrap my async tasks in the corresponding strand object before scheduling them.
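
The strand idea can be sketched in Java as follows. This is not Boost Asio's implementation, and it is a variant of the scheme described above: instead of the scheduler skipping tasks whose strand is locked, each strand keeps its own small queue and feeds the shared pool one task at a time. The class name `Strand` and its fields are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical strand: tasks posted to the same Strand never overlap and keep
// FIFO order, while different strands share the pool's threads freely.
class Strand implements Executor {
    private final Executor pool;
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private boolean locked = false;  // true while one of this strand's tasks is in the pool

    Strand(Executor pool) {
        this.pool = pool;
    }

    @Override
    public synchronized void execute(Runnable task) {
        pending.add(task);
        if (!locked) {
            locked = true;
            scheduleNext();
        }
    }

    // Called with the monitor held: pass the next task to the pool, or unlock.
    private void scheduleNext() {
        Runnable next = pending.poll();
        if (next == null) {
            locked = false;
            return;
        }
        pool.execute(() -> {
            try {
                next.run();
            } finally {
                synchronized (this) {
                    scheduleNext();
                }
            }
        });
    }

    // Demo: 100 tasks on one strand append to a plain ArrayList without extra locking.
    static List<Integer> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Strand session = new Strand(pool);
        List<Integer> seen = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(100);
        for (int i = 0; i < 100; i++) {
            int n = i;
            session.execute(() -> {
                seen.add(n);
                done.countDown();
            });
        }
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return seen;
    }
}
```

In the session example, each session would own one `Strand`, and tasks with no ordering requirement would go straight to the pool.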

情书的邮戳

2020-12-12 12:47

Since you only need to wait for a single task to complete before starting the dependent task, it can easily be done if you can schedule the dependent task from within the first task. So in your second example: at the end of task 2, schedule task 7; at the end of task 3, schedule task 4; and so on for 4->6 and 6->8.

In the beginning, just schedule tasks 1,2,5,9... and the rest should follow.

An even more general problem is when you have to wait for multiple tasks before a dependent task can start. Handling that efficiently is a non-trivial exercise.
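
For the multi-prerequisite case, one simple (not necessarily efficient) approach is a countdown: the dependent task is submitted by whichever prerequisite finishes last. The Java sketch below assumes the number of prerequisites is known up front; `JoinTask`, `after`, and `prerequisiteDone` are hypothetical names, and the tasks "A", "B", "C" are illustrative rather than taken from the question.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper: a task that is submitted once all of its
// prerequisites have finished.
class JoinTask {
    private final AtomicInteger remaining;
    private final Runnable body;
    private final ExecutorService pool;

    JoinTask(int prerequisites, Runnable body, ExecutorService pool) {
        this.remaining = new AtomicInteger(prerequisites);
        this.body = body;
        this.pool = pool;
    }

    // The prerequisite that brings the count to zero schedules the body.
    void prerequisiteDone() {
        if (remaining.decrementAndGet() == 0) {
            pool.execute(body);
        }
    }

    // Wraps a prerequisite so that it reports completion as its last step.
    Runnable after(Runnable prerequisite) {
        return () -> {
            prerequisite.run();
            prerequisiteDone();
        };
    }

    // Demo: "C" must wait for both "A" and "B", which may run in either order.
    static List<String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> log = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        JoinTask c = new JoinTask(2, () -> {
            log.add("C");
            done.countDown();
        }, pool);
        pool.execute(c.after(() -> log.add("A")));
        pool.execute(c.after(() -> log.add("B")));
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return log;
    }
}
```

This sketch ignores failures: if a prerequisite throws, the count never reaches zero and the dependent task is never scheduled.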


忘了有多久

2020-12-12 12:47

There have been a lot of answers, and obviously one has been accepted. But why not use continuations?

If you have a known "serial" condition, then when you enqueue the first task with this condition, hold the Task; and for further tasks invoke Task.ContinueWith().

```
using System;
using System.Threading.Tasks;

public class PoolsTasks
{
    private readonly object syncLock = new object();
    // Tail of the serial chain; starts out already completed.
    private Task serialTask = Task.CompletedTask;

    private bool isSerialTask(Action task) {
        // However you determine what is serial ...
        return true;
    }

    public void RunMyTask(Action myTask) {
        if (isSerialTask(myTask)) {
            // Append to the chain: runs only after the previous serial task ends.
            lock (syncLock)
                serialTask = serialTask.ContinueWith(_ => myTask());
        } else
            Task.Run(myTask);
    }
}
```
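
For readers following the Java examples in this thread, roughly the same idea can be sketched with `CompletableFuture`. This is an assumed translation, not part of the answer above; `SerialContinuations`, `runSerial`, `runParallel`, and `lastStage` are hypothetical names. `handleAsync` is used so that a serial task still runs after an earlier one has failed, similar to the default behavior of `ContinueWith`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical Java counterpart of the continuation approach.
class SerialContinuations {
    private final Executor pool;
    // Tail of the serial chain; starts out already completed.
    private CompletableFuture<Void> tail = CompletableFuture.completedFuture(null);

    SerialContinuations(Executor pool) {
        this.pool = pool;
    }

    // Appends the task to the chain: it starts only after the previous one ends.
    synchronized void runSerial(Runnable task) {
        tail = tail.handleAsync((ignored, error) -> {
            task.run();
            return null;
        }, pool);
    }

    // Tasks without an ordering requirement go straight to the pool.
    void runParallel(Runnable task) {
        pool.execute(task);
    }

    synchronized CompletableFuture<Void> lastStage() {
        return tail;
    }

    // Demo: five serial tasks record their numbers in submission order.
    static List<Integer> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        SerialContinuations tasks = new SerialContinuations(pool);
        List<Integer> order = new CopyOnWriteArrayList<>();
        for (int i = 1; i <= 5; i++) {
            int n = i;
            tasks.runSerial(() -> order.add(n));
        }
        tasks.lastStage().join();  // wait for the end of the chain
        pool.shutdown();
        return order;
    }
}
```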
