Fibonacci on Java ExecutorService runs faster sequentially than in parallel

问题

I am trying out the executor service in Java, and wrote the following code to run Fibonacci (yes, the massively recursive version, just to stress out the executor service).

Surprisingly, it will run faster if I set the nThreads to 1. It might be related to the fact that the size of each "task" submitted to the executor service is really small. But still it must be the same number also if I set nThreads to 1.

To see if the access to the shared Atomic variables can cause this issue, I commented out the three lines with the comment "see text", and looked at the system monitor to see how long the execution takes. But the results are the same.

Any idea why this is happening?

BTW, I wanted to compare it with the similar implementation with Fork/Join. It turns out to be way slower than the F/J implementation.

public class MainSimpler {
    static int N=35;
    static AtomicInteger result = new AtomicInteger(0), pendingTasks = new AtomicInteger(1);
    static ExecutorService executor;

    public static void main(String[] args) {
        int nThreads=2;
        System.out.println("Number of threads = "+nThreads);
        executor = Executors.newFixedThreadPool(nThreads);
        Executable.inQueue = new AtomicInteger(nThreads);
        long before = System.currentTimeMillis();
        System.out.println("Fibonacci "+N+" is ... ");
        executor.submit(new FibSimpler(N));
        waitToFinish();
        System.out.println(result.get());
        long after = System.currentTimeMillis();        
        System.out.println("Duration: " + (after - before) + " milliseconds\n");
    }

    private static void waitToFinish() {
        while (0 < pendingTasks.get()){
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        executor.shutdown();
    }
}



class FibSimpler implements Runnable {
    int N;
    FibSimpler (int n) { N=n; }

    @Override
    public void run() {
        compute();
        MainSimpler.pendingTasks.decrementAndGet(); // see text
    }

    void compute() {
        int n = N;
        if (n <= 1) {
            MainSimpler.result.addAndGet(n); // see text
            return;
        }
        MainSimpler.executor.submit(new FibSimpler(n-1));
        MainSimpler.pendingTasks.incrementAndGet(); // see text
        N = n-2;
        compute();  // similar to the F/J counterpart
    }
}

Runtime (approximately):

1 thread : 11 seconds
2 threads: 19 seconds
4 threads: 19 seconds

Update: I notice that even if I use one thread inside the executor service, the whole program will use all four cores of my machine (each core around 80% usage on average). This could explain why using more threads inside the executor service slows down the whole process, but now, why does this program use 4 cores if only one thread is active inside the executor service??

回答1:

It might be related to the fact that the size of each "task" submitted to the executor service is really small.

This is certainly the case and as a result you are mainly measuring the overhead of context switching. When n == 1, there is no context switching and thus the performance is better.

But still it must be the same number also if I set nThreads to 1.

I'm guessing you meant 'to higher than 1' here.

You are running into the problem of heavy lock contention. When you have multiple threads, the lock on the result is contended all the time. Threads have to wait for each other before they can update the result and that slows them down. When there is only a single thread, the JVM probably detects that and performs lock elision, meaning it doesn't actually perform any locking at all.

You may get better performance if you don't divide the problem into N tasks, but rather divide it into N/nThreads tasks, which can be handled simultaneously by the threads (assuming you choose nThreads to be at most the number of physical cores/threads available). Each thread then does its own work, calculating its own total and only adding that to a grand total when the thread is done. Even then, for fib(35) I expect the costs of thread management to outweigh the benefits. Perhaps try fib(1000).

来源：https://stackoverflow.com/questions/13644222/fibonacci-on-java-executorservice-runs-faster-sequentially-than-in-parallel

标签

java

parallel-processing

executorservice