From the Java docs:
A ForkJoinPool differs from other kinds of ExecutorService mainly by virtue of employing work-stealing: all threads in the pool attempt to find and execute subtasks created by other active tasks (eventually blocking waiting for work if none exist).
This enables efficient processing when most tasks spawn other subtasks (as do most ForkJoinTasks). When setting asyncMode to true in constructors, ForkJoinPools may also be appropriate for use with event-style tasks that are never joined.
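The quote mentions the asyncMode constructor. The sketch below shows that constructor in use, and it also shows that ForkJoinPool's constructor takes no queue-size argument. The class and method names, the parallelism of 4 and the counting "event handler" are hypothetical. Only the `ForkJoinPool` and `ForkJoinTask` calls are real JDK API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class AsyncModeDemo {
    // Handles `eventCount` hypothetical events as fire-and-forget tasks; returns how many ran.
    static int handleEvents(int eventCount) {
        // parallelism 4, default worker factory, no uncaught-exception handler, asyncMode = true.
        // None of these parameters is a queue capacity: the pool's queues grow on demand.
        ForkJoinPool pool = new ForkJoinPool(
                4, ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, true);
        AtomicInteger handled = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(eventCount);
        Runnable handler = () -> {
            handled.incrementAndGet();   // stand-in for real event handling
            done.countDown();
        };
        pool.execute(() -> {
            for (int i = 0; i < eventCount; i++) {
                // fork() from a worker pushes onto that worker's own queue; with
                // asyncMode = true that queue is processed FIFO. Nothing ever joins these.
                ForkJoinTask.adapt(handler).fork();
            }
        });
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return handled.get();
    }
}
```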
After going through a ForkJoinPool example, I noticed that, unlike ThreadPoolExecutor, it has no parameter for setting the queue size. I also could not work out how the ForkJoinPool work-stealing mechanism works.
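// Creating the ThreadPoolExecutor: core size 2, max size 10, 60 s keep-alive,
// a bounded queue of 3000 tasks, and a thread factory and rejection handler defined elsewhere
ThreadPoolExecutor executorPool = new ThreadPoolExecutor(2, 10, 60, TimeUnit.SECONDS,
        new ArrayBlockingQueue<Runnable>(3000), threadFactory, rejectionHandler);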
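The snippet above is not self-contained, because `threadFactory` and `rejectionHandler` are not shown. Below is a runnable sketch. It assumes stand-ins for them (`Executors.defaultThreadFactory()` and `ThreadPoolExecutor.AbortPolicy`) and submits trivial Callables; `run(3000)` reproduces the question's numbers. All workers take tasks from the one shared `ArrayBlockingQueue`.

Note that ThreadPoolExecutor creates threads beyond its core size only when the queue is full. With a core size of 2 and room for 3000 queued tasks, this pool never grows past 2 threads, and `getLargestPoolSize()` reports 2 here. `QuestionPoolDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class QuestionPoolDemo {
    // Submits `taskCount` trivial Callables; returns {sum of results, largest pool size seen}.
    static long[] run(int taskCount) {
        // Hypothetical stand-ins for the question's threadFactory and rejectionHandler.
        ThreadFactory threadFactory = Executors.defaultThreadFactory();
        RejectedExecutionHandler rejectionHandler = new ThreadPoolExecutor.AbortPolicy();

        // core 2 threads, max 10 threads, 60 s keep-alive, one shared bounded queue of 3000
        ThreadPoolExecutor executorPool = new ThreadPoolExecutor(2, 10, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(3000), threadFactory, rejectionHandler);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int n = i;
                futures.add(executorPool.submit(() -> n)); // every worker polls this one queue
            }
            long sum = 0;
            for (Future<Integer> f : futures) {
                sum += f.get();
            }
            return new long[] { sum, executorPool.getLargestPoolSize() };
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            executorPool.shutdown();
        }
    }
}
```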
Assume that I have created a ThreadPoolExecutor with 10 threads and submitted 3000 Callable tasks. How do these threads share the load of executing the subtasks?
And how does ForkJoinPool behave differently for the same use case?
In ForkJoinPool, there are two kinds of queues: the pool queue, which is used when you submit a task from outside the pool, and a thread-specific queue (one for each worker thread). From inside a ForkJoinTask you can fork new tasks (generally pieces of your split problem).
These new tasks are not offered to the pool queue but pushed onto the current thread's own queue. As a result, that thread takes them before anything in the pool queue, as if you had done all the work in the same task. Furthermore, the invoking task appears to be blocked while it waits for its subtasks to complete.
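Here is a minimal sketch of the fork/join pattern described above. It assumes a hypothetical `SumTask` that sums a `long[]`, with an arbitrary sequential cut-off of 1000 elements. The forked half goes onto the current worker's own queue while the current thread keeps computing the other half. `join()` then typically runs the forked half in the same thread if nobody stole it. Otherwise it waits for the thief's result.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums array[from, to) by recursive halving.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // hypothetical cut-off for sequential work
    private final long[] array;
    private final int from, to;

    SumTask(long[] array, int from, int to) {
        this.array = array;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) sum += array[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(array, from, mid);
        SumTask right = new SumTask(array, mid, to);
        left.fork();                      // push the left half onto this worker's own queue
        long rightSum = right.compute();  // the "blocked" thread keeps working meanwhile
        long leftSum = left.join();       // run it here if not stolen, else wait for the thief
        return leftSum + rightSum;
    }

    static long parallelSum(long[] array) {
        return ForkJoinPool.commonPool().invoke(new SumTask(array, 0, array.length));
    }
}
```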
In reality, that "blocked" time is spent executing subtasks. It would be wasteful to let other threads sit idle while one of them is flooded with work. Instead, idle threads take ("steal") queued tasks from busy threads, which is what "work stealing" means.
To go further: for efficiency, a stealing thread takes tasks from the opposite end of the queue from the one where the owner thread pushes and pops. This greatly reduces contention between the owner and thieves when updating the queue.
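The opposite-end rule can be shown with a deliberately simplified deque. This is a hypothetical, lock-based class for illustration only. The owner pushes and pops at the tail, so it works on the newest task first. Thieves take from the head, so they get the oldest task. Because this class uses a single lock, it does not itself reduce contention. The JDK's ForkJoinPool instead uses array-based queues updated with atomic operations, where owner and thieves working at different ends rarely conflict.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified work-stealing deque (NOT the JDK implementation).
class SimpleWorkStealingDeque<T> {
    private final Deque<T> deque = new ArrayDeque<>();

    // Owner thread: push and pop at the same end (LIFO), newest task first.
    synchronized void push(T task) { deque.addLast(task); }
    synchronized T pop() { return deque.pollLast(); }

    // Thief thread: take from the opposite end (FIFO), oldest task first.
    synchronized T steal() { return deque.pollFirst(); }
}
```

For example, after the owner pushes 1, 2 and 3, `pop()` returns 3 while `steal()` returns 1.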
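Also for efficiency, it is better to split the problem into only two subtasks and let each subtask split again and again. This holds even if you know the problem must ultimately be split into N parts. With recursive halving, the oldest task at the stealing end of a queue is also the largest one, so a single steal hands over a big chunk of work. This matters because work stealing requires concurrent writes to a shared resource, so you want to limit how often it happens and the contention it causes.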
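The sketch below contrasts the two splitting shapes. It assumes a hypothetical sum-of-squares workload and a leaf size of 1000.

- `Halving` splits in two with `invokeAll(left, right)`.
- `Flat` creates all N leaves up front in one task. Roughly N-1 of them land in that one worker's queue, so other workers have to steal them one at a time.

Both compute the same result. The difference is only in how the work gets distributed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SplitStrategies {
    static final int LEAF = 1_000; // hypothetical leaf size

    // Hypothetical workload: sum of i*i for i in [from, to).
    static long sumOfSquares(int from, int to) {
        long s = 0;
        for (long i = from; i < to; i++) s += i * i;
        return s;
    }

    // Recommended shape: split in two and let each half split again.
    static class Halving extends RecursiveTask<Long> {
        private final int from, to;
        Halving(int from, int to) { this.from = from; this.to = to; }

        @Override protected Long compute() {
            if (to - from <= LEAF) return sumOfSquares(from, to);
            int mid = (from + to) >>> 1;
            Halving left = new Halving(from, mid), right = new Halving(mid, to);
            invokeAll(left, right);            // runs both halves and waits for both
            return left.join() + right.join();
        }
    }

    // A leaf that is never split further.
    static class Leaf extends RecursiveTask<Long> {
        private final int from, to;
        Leaf(int from, int to) { this.from = from; this.to = to; }

        @Override protected Long compute() { return sumOfSquares(from, to); }
    }

    // Flat shape: one task creates all N leaves up front.
    static class Flat extends RecursiveTask<Long> {
        private final int from, to;
        Flat(int from, int to) { this.from = from; this.to = to; }

        @Override protected Long compute() {
            List<Leaf> leaves = new ArrayList<>();
            for (int start = from; start < to; start += LEAF) {
                leaves.add(new Leaf(start, Math.min(start + LEAF, to)));
            }
            invokeAll(leaves);                 // most leaves queue up on this one worker
            long total = 0;
            for (Leaf leaf : leaves) total += leaf.join();
            return total;
        }
    }
}
```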
If you have 3000 tasks in advance, and they are not going to spawn other tasks, the two will not behave substantially differently: with 10 threads, 10 tasks will be run at a time until they are all done.
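For the flat case, here is a sketch that submits the same independent Callables to two pools using the standard `ExecutorService.invokeAll`:

- a 10-thread `ThreadPoolExecutor`, created with `Executors.newFixedThreadPool(10)`;
- a `ForkJoinPool` with parallelism 10.

Since no task forks subtasks, both pools simply drain the submitted tasks with 10 workers and return the same results. The `n * 2` workload and the `FlatTasksDemo` name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

class FlatTasksDemo {
    // Runs `taskCount` independent Callables (none spawns subtasks), sums results, shuts pool down.
    static long runAll(ExecutorService pool, int taskCount) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            final int n = i;
            tasks.add(() -> n * 2); // hypothetical independent unit of work
        }
        try {
            long sum = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) { // blocks until every task is done
                sum += f.get();
            }
            return sum;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Usage: `FlatTasksDemo.runAll(Executors.newFixedThreadPool(10), 3000)` and `FlatTasksDemo.runAll(new ForkJoinPool(10), 3000)` return the same sum.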
ForkJoinPool is designed for the case where you have one or a few tasks to start with, but the tasks know how to split themselves up into subtasks. In this situation, ForkJoinPool is optimized to permit tasks to check on the availability of processing threads and split themselves up appropriately.
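One concrete way for a task to "check on the availability of processing threads" is `ForkJoinTask.getSurplusQueuedTaskCount()`. The JDK documents it as an estimate of how many more tasks the current worker has queued than there are other workers that might steal them, intended for heuristic decisions about whether to fork. The sketch below assumes a hypothetical even-number count. Its thresholds (`MIN_CHUNK`, `SURPLUS_LIMIT`) are arbitrary and would need tuning.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Counts even values in data[from, to), splitting only while it looks useful.
class AdaptiveEvenCount extends RecursiveTask<Long> {
    private static final int MIN_CHUNK = 500;    // hypothetical: never split below this size
    private static final int SURPLUS_LIMIT = 3;  // hypothetical: tolerated surplus of queued tasks
    private final int[] data;
    private final int from, to;

    AdaptiveEvenCount(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        // Surplus = tasks queued on this worker beyond what idle workers could steal
        // (the method returns 0 when not running inside a ForkJoinPool).
        if (to - from > MIN_CHUNK && getSurplusQueuedTaskCount() <= SURPLUS_LIMIT) {
            int mid = (from + to) >>> 1;
            AdaptiveEvenCount left = new AdaptiveEvenCount(data, from, mid);
            left.fork();                                   // offer half to other workers
            long right = new AdaptiveEvenCount(data, mid, to).compute();
            return right + left.join();
        }
        long count = 0;                                    // enough work queued: just do it
        for (int i = from; i < to; i++) {
            if (data[i] % 2 == 0) count++;
        }
        return count;
    }

    static long countEvens(int[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new AdaptiveEvenCount(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```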
Source: https://stackoverflow.com/questions/33448465/threadpoolexecutor-vs-forkjoinpool-stealing-subtasks