Slowdown of pi calculation when Timer is used

Submitted by 一曲冷凌霜 on 2019-12-07 10:17:07

Question


The following is my code for approximating pi = 3.1415... using the Leibniz series, pi/4 = 1 - 1/3 + 1/5 - 1/7 + ... (the sum over k = 0 .. n of (-1)^k / (2k + 1)):

use Time;
var timer = new Timer();

config const n = 10**9;
var x = 0.0, s = 0.0;

// timer.start();                                     // [1]_____

for k in 0 .. n {
    s = ( if k % 2 == 0 then 1.0 else -1.0 );  // (-1)^k
    x += s / ( 2.0 * k + 1.0 );
}

// timer.stop();                                      // [2]_____
// writeln( "time = ", timer.elapsed() );             // [3]_____

   writef( "pi (approx) = %30.20dr\n", x * 4 );
// writef( "pi (exact)  = %30.20dr\n", pi );          // [4]_____

When the above code is compiled with chpl --fast test.chpl and executed as time ./a.out, it runs in about 4 seconds:

pi (approx) =         3.14159265458805059268

real    0m4.334s
user    0m4.333s
sys     0m0.006s

On the other hand, if I uncomment Lines [1-3] (to use Timer), the program runs much slower, taking about 10 seconds:

time = 10.2284
pi (approx) =         3.14159265458805059268

real    0m10.238s
user    0m10.219s
sys     0m0.018s

The same slowdown occurs when I uncomment only Line [4] (to print the built-in value of pi, with Lines [1-3] kept commented out):

pi (approx) =         3.14159265458805059268
pi (exact)  =         3.14159265358979311600

real    0m10.144s
user    0m10.141s
sys     0m0.009s

So I'm wondering why this slow-down occurs...

Am I missing something in the above code (e.g., wrong usage of Timer)?

My environment is OSX10.11 + chapel-1.16 installed via homebrew. More details are below:

$ printchplenv --anonymize
CHPL_TARGET_PLATFORM: darwin
CHPL_TARGET_COMPILER: clang
CHPL_TARGET_ARCH: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_MAKE: make
CHPL_ATOMICS: intrinsics
CHPL_GMP: gmp
CHPL_HWLOC: hwloc
CHPL_REGEXP: re2
CHPL_WIDE_POINTERS: struct
CHPL_AUX_FILESYS: none

$ clang --version
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Update

Following the suggestions, I installed Chapel from source by following this and this page and adding CHPL_TARGET_COMPILER=gnu to ~/.chplconfig (before running make). With that build, all three cases above ran in about 4 seconds. So the problem may be related to clang on OSX 10.11. According to the comments, newer OSX (>= 10.12) does not have this problem, so it may be sufficient simply to upgrade to a newer OSX/clang (>= 9.0). FYI, the updated environment info (with GNU) is as follows:

$ printchplenv --anonymize
CHPL_TARGET_PLATFORM: darwin
CHPL_TARGET_COMPILER: gnu +
CHPL_TARGET_ARCH: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_MAKE: make
CHPL_ATOMICS: intrinsics
CHPL_GMP: none
CHPL_HWLOC: hwloc
CHPL_REGEXP: none
CHPL_WIDE_POINTERS: struct
CHPL_AUX_FILESYS: none
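
As a minimal sketch of the configuration change described in this update, ~/.chplconfig would hold a single override line, written before running make in the Chapel source tree. Any other settings are assumed to be left at their defaults:

```
CHPL_TARGET_COMPILER=gnu
```

The "+" after gnu in the printchplenv output above appears to mark this value as coming from the config file rather than from a default.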

Answer 1:


Am I missing something in the above code (e.g., wrong usage of Timer)?

No, you're not missing anything and are using Timer (and Chapel) in a completely reasonable way. From my own experimentation (which confirms yours and is noted in the comments under your question), this looks to be a back-end compiler issue rather than a fundamental problem in Chapel or your use of it.




Answer 2:


[--fast] reduces run-time checks, yet the issue does not seem to reproduce here

Kindly also note how large the setup and operation overheads are in the variant added here purely for educational purposes (to experiment with concurrent processing). The forall loop that uses the atomic .add() method accrues far higher overheads than concurrent processing can gain back, because the computation inside the [PAR]-enabled fraction of the process is so tiny (cf. the re-formulated Amdahl's Law, which weighs such thin [PAR] gains against the high add-on overheads to the [SEQ] costs).

An illustrative example, with the measured timings in the trailing comment:

use Time;
var timer = new Timer();

config const n = 10**9;
         var s = 0.0, x = 0.0;
         var AtomiX: atomic real;                           // [AtomiX]______
             AtomiX.write( 0.0 );                           // [AtomiX]______

timer.start();                                              // [1]_____

for k in 0 .. n {
    s  = ( if k % 2 == 0 then 1.0 else -1.0 );     // (-1)^k
    x += s / ( 2.0 * k + 1.0 );
}

/* forall k in 0..n { AtomiX.add( ( if k % 2 == 0 then 1.0 else -1.0 )
                                / ( 2.0 * k + 1.0 )
                                  ); } */                   // [AtomiX]______

timer.stop();                                               // [2]_____
writeln( "time = ", timer.elapsed() );                      // [3]_____

   writef( "pi (approx) = %30.20dr\n", 4 * x );    
// writef( "pi (approx) = %30.20dr\n", 4 * AtomiX.read() ); // [AtomiX]______
// writef( "pi (exact)  = %30.20dr\n", pi );                // [4]_____

/*
--------------------------------------------------- [--fast] // AN EMPTY RUN
time = 1e-06

Real time:  9.582 s
User time:  8.479 s
Sys. time:  0.591 s
CPU share: 94.65 %
Exit code: 0
--------------------------------------------------- [--fast] // all commented

pi (approx) =         3.14159265458805059268

Real time: 15.553 s
User time: 13.484 s ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~> Timer ~ +/- 1 second ( O/S noise )
Sys. time:  0.985 s
CPU share: 93.03 %
Exit code: 0
-------------------------------------------------- [--fast ] // Timer-un-commented
time = 5.30128
time = 5.3329
pi (approx) =         3.14159265458805059268

Real time: 14.356 s
User time: 13.047 s ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~< Timer ~ +/- 1 second ( O/S noise )
Sys. time:  0.585 s
CPU share: 94.95 %
Exit code: 0

Real time: 16.804 s
User time: 14.853 s
Sys. time:  0.925 s
CPU share: 93.89 %
Exit code: 0

-------------------------------------------------- [--fast] // Timer-un-commented + forall + Atomics

time = 14.7406
pi (approx) =         3.14159265458805680993

Real time: 28.099 s
User time: 26.246 s
Sys. time: 0.914 s
CPU share: 96.65 %
Exit code: 0
*/
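
The forall + atomic variant above performs one atomic update per iteration, which is where its extra cost comes from. A common alternative in Chapel is a reduction, sketched below. This sketch is not part of the original answer and was not timed on the setups shown here; because a parallel sum adds the terms in a different order, the last digits of the result may differ from the serial loop.

```chapel
config const n = 10**9;

// Sum the Leibniz terms with a parallel reduction instead of issuing
// one atomic add per iteration: tasks accumulate partial sums, which
// are then combined once at the end.
const x = + reduce [k in 0..n]
            ( if k % 2 == 0 then 1.0 else -1.0 ) / ( 2.0 * k + 1.0 );

writeln( "pi (approx) = ", 4 * x );
```

Compiled with chpl --fast as in the question, this can be wrapped in the same timer.start() / timer.stop() calls to compare it against the serial loop and the atomic variant.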


Source: https://stackoverflow.com/questions/47561759/slowdown-of-pi-calculation-when-timer-is-used
