This question is about the same program I previously asked about. To recap, I have a program with a loop structure like this:
for (int i1 = 0; i1 < N; i1+
David,
Are you sure you run a kernel that supports multiple processors? If only one processor is utilized in your system, spawning additional CPU-intensive threads will slow down your program.
And, are you sure support for threads in your system actually utilizes multiple processors? Does top, for example, show that both cores in your processor utilized when you run your program?