Question
The following is my code for approximately calculating pi = 3.1415... using the Leibniz series, pi/4 = 1 - 1/3 + 1/5 - ... = sum over k >= 0 of (-1)^k / (2k + 1):
use Time;
var timer = new Timer();
config const n = 10**9;
var x = 0.0, s = 0.0;
// timer.start(); // [1]_____
for k in 0 .. n {
s = ( if k % 2 == 0 then 1.0 else -1.0 ); // (-1)^k
x += s / ( 2.0 * k + 1.0 );
}
// timer.stop(); // [2]_____
// writeln( "time = ", timer.elapsed() ); // [3]_____
writef( "pi (approx) = %30.20dr\n", x * 4 );
// writef( "pi (exact) = %30.20dr\n", pi ); // [4]_____
When the above code is compiled with chpl --fast test.chpl and executed with time ./a.out, it runs in about 4 seconds:
pi (approx) = 3.14159265458805059268
real 0m4.334s
user 0m4.333s
sys 0m0.006s
On the other hand, if I uncomment lines [1]-[3] (to use Timer), the program runs much slower, taking about 10 seconds:
time = 10.2284
pi (approx) = 3.14159265458805059268
real 0m10.238s
user 0m10.219s
sys 0m0.018s
The same slowdown occurs when I uncomment only line [4] (to print the built-in value of pi, with lines [1]-[3] kept commented out):
pi (approx) = 3.14159265458805059268
pi (exact) = 3.14159265358979311600
real 0m10.144s
user 0m10.141s
sys 0m0.009s
So I'm wondering why this slowdown occurs.
Am I missing something in the above code (e.g., wrong usage of Timer)?
My environment is OS X 10.11 + Chapel 1.16 installed via Homebrew. More details are below:
$ printchplenv --anonymize
CHPL_TARGET_PLATFORM: darwin
CHPL_TARGET_COMPILER: clang
CHPL_TARGET_ARCH: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_MAKE: make
CHPL_ATOMICS: intrinsics
CHPL_GMP: gmp
CHPL_HWLOC: hwloc
CHPL_REGEXP: re2
CHPL_WIDE_POINTERS: struct
CHPL_AUX_FILESYS: none
$ clang --version
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Update
Following the suggestions, I installed Chapel from source by following this and this pages and adding CHPL_TARGET_COMPILER=gnu to ~/.chplconfig (before running make). After that, all three cases above ran in about 4 seconds, so the problem may be related to clang on OS X 10.11. According to the comments, newer OS X (>= 10.12) does not have this problem, so it may be sufficient to upgrade to a newer OS X/clang (>= 9.0). FYI, the updated environment info (with GNU) is as follows:
$ printchplenv --anonymize
CHPL_TARGET_PLATFORM: darwin
CHPL_TARGET_COMPILER: gnu +
CHPL_TARGET_ARCH: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_MAKE: make
CHPL_ATOMICS: intrinsics
CHPL_GMP: none
CHPL_HWLOC: hwloc
CHPL_REGEXP: none
CHPL_WIDE_POINTERS: struct
CHPL_AUX_FILESYS: none
Answer 1:
Am I missing something in the above code (e.g., wrong usage of Timer)?
No, you're not missing anything and are using Timer (and Chapel) in a completely reasonable way. From my own experimentation (which confirms yours and is noted in the comments under your question), this looks to be a back-end compiler issue rather than a fundamental problem in Chapel or your use of it.
Answer 2:
The --fast flag reduces run-time checks, yet the issue could not be reproduced here.
Please also note how large the setup and per-operation add-on overheads are in the variant added here purely for educational purposes (to experiment with concurrent processing). A forall loop that accumulates through the atomic .add() method accrues far more overhead than concurrent processing can gain back, because the computation inside the [PAR]-enabled fraction of the process is so tiny (ref. the newly re-formulated Amdahl's Law, which weighs such thin [PAR] gains against the add-on overheads on top of the [SEQ] costs).
An illustrative example, with measured results:
use Time;
var timer = new Timer();
config const n = 10**9;
var s = 0.0, x = 0.0;
var AtomiX: atomic real; // [AtomiX]______
AtomiX.write( 0.0 ); // [AtomiX]______
timer.start(); // [1]_____
for k in 0 .. n {
s = ( if k % 2 == 0 then 1.0 else -1.0 ); // (-1)^k
x += s / ( 2.0 * k + 1.0 );
}
/* forall k in 0..n { AtomiX.add( ( if k % 2 == 0 then 1.0 else -1.0 )
/ ( 2.0 * k + 1.0 )
); } */ // [AtomiX]______
timer.stop(); // [2]_____
writeln( "time = ", timer.elapsed() ); // [3]_____
writef( "pi (approx) = %30.20dr\n", 4 * x );
// writef( "pi (approx) = %30.20dr\n", 4 * AtomiX.read() ); // [AtomiX]______
// writef( "pi (exact) = %30.20dr\n", pi ); // [4]_____
/*
--------------------------------------------------- [--fast] // AN EMPTY RUN
time = 1e-06
Real time: 9.582 s
User time: 8.479 s
Sys. time: 0.591 s
CPU share: 94.65 %
Exit code: 0
--------------------------------------------------- [--fast] // all commented
pi (approx) = 3.14159265458805059268
Real time: 15.553 s
User time: 13.484 s ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~> Timer ~ +/- 1 second ( O/S noise )
Sys. time: 0.985 s
CPU share: 93.03 %
Exit code: 0
-------------------------------------------------- [--fast ] // Timer-un-commented
time = 5.30128
time = 5.3329
pi (approx) = 3.14159265458805059268
Real time: 14.356 s
User time: 13.047 s ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~< Timer ~ +/- 1 second ( O/S noise )
Sys. time: 0.585 s
CPU share: 94.95 %
Exit code: 0
Real time: 16.804 s
User time: 14.853 s
Sys. time: 0.925 s
CPU share: 93.89 %
Exit code: 0
-------------------------------------------------- [--fast] // Timer-un-commented + forall + Atomics
time = 14.7406
pi (approx) = 3.14159265458805680993
Real time: 28.099 s
User time: 26.246 s
Sys. time: 0.914 s
CPU share: 96.65 %
Exit code: 0
*/
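The forall + Atomics variant measured above pays for one atomic update per iteration. For comparison, here is a minimal sketch of the same sum written with a reduce intent, so that each task accumulates privately and the partial sums are combined once at the end. It assumes a Chapel release of the era discussed here (1.16), where Timer lives in the Time module; the procedure name leibnizQuarterPi is hypothetical, and this version was not timed on the machines above, so no speedup is claimed.

```chapel
use Time;

config const n = 10**9;

// Hypothetical helper: sums the Leibniz terms for k = 0..terms in parallel.
// The "+ reduce" intent gives every task its own private copy of sum and
// adds the copies together when the forall loop finishes.
proc leibnizQuarterPi(terms: int): real {
  var sum = 0.0;
  forall k in 0..terms with (+ reduce sum) {
    const sign = if k % 2 == 0 then 1.0 else -1.0;   // (-1)^k
    sum += sign / (2.0 * k + 1.0);
  }
  return sum;
}

var timer = new Timer();
timer.start();
const approx = 4.0 * leibnizQuarterPi(n);
timer.stop();

writeln( "time = ", timer.elapsed() );
writef( "pi (approx) = %30.20dr\n", approx );
```

Because the partial sums are added in a different order than in the serial loop, the last few printed digits may differ slightly from the serial result, as they already do in the Atomics run above.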
Source: https://stackoverflow.com/questions/47561759/slowdown-of-pi-calculation-when-timer-is-used