Question
I'm trying to get the elapsed time of my program. I thought I should use clock() from time.h, but it stays zero in all phases of the program, even though I'm adding 10^5 numbers (some CPU time must be consumed). I already searched for this problem, and it seems that only people running Linux have this issue. I'm running Ubuntu 12.04 LTS.
I'm going to compare AVX and SSE instructions, so using time_t is not really an option. Any hints?
Here is the code:
//Dimension of Arrays
unsigned int N = 100000;
//Fill two arrays with random numbers
unsigned int a[N];
clock_t start_of_programm = clock();
for(int i=0;i<N;i++){
a[i] = i;
}
clock_t after_init_of_a = clock();
unsigned int b[N];
for(int i=0;i<N;i++){
b[i] = i;
}
clock_t after_init_of_b = clock();
//Add the two arrays with Standard
unsigned int out[N];
for(int i = 0; i < N; ++i)
out[i] = a[i] + b[i];
clock_t after_add = clock();
cout << "start_of_programm " << start_of_programm << endl; // prints
cout << "after_init_of_a " << after_init_of_a << endl; // prints
cout << "after_init_of_b " << after_init_of_b << endl; // prints
cout << "after_add " << after_add << endl; // prints
cout << endl << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << endl;
And here is the console output. I also tried printf() with %d, with no difference.
start_of_programm 0
after_init_of_a 0
after_init_of_b 0
after_add 0
CLOCKS_PER_SEC 1000000
Answer 1:
The simplest way to get the time is to use a stub function from OpenMP. This works with MSVC, GCC, and ICC. With MSVC you don't even need to enable OpenMP. With ICC you can link just the stubs, if you like, with -openmp-stubs. With GCC you have to use -fopenmp.
#include <omp.h>
#include <stdio.h>

double dtime = omp_get_wtime();   // wall-clock time in seconds
foo();                            // the code being timed
dtime = omp_get_wtime() - dtime;  // elapsed seconds
printf("time %f\n", dtime);
Answer 2:
clock does indeed return the CPU time used, but its granularity is on the order of 10 Hz. So if your code doesn't take more than 100 ms, you will get zero. And unless it runs significantly longer than 100 ms, you won't get a very accurate value, because your error margin will be around 100 ms.
So your choices are to increase N or to use a different method of measuring time. std::chrono will most likely produce more accurate timing (but it measures wall time, not CPU time). On Linux, clock_gettime is another option:
#include <time.h>

// Returns t2 - t1 in seconds.
double timespec_diff(timespec t2, timespec t1)
{
    // Subtract field by field so nanosecond precision is not lost
    // when seconds since the epoch are converted to double.
    return (t2.tv_sec - t1.tv_sec) + (t2.tv_nsec - t1.tv_nsec) / 1000000000.0;
}

timespec t1, t2;
clock_gettime(CLOCK_REALTIME, &t1);
// ... do stuff ...
clock_gettime(CLOCK_REALTIME, &t2);
double t = timespec_diff(t2, t1);
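The answer recommends std::chrono, so here is a minimal sketch of that variant. It assumes C++11 or later. The helper names elapsed_seconds and add_arrays are hypothetical and do not come from the thread. steady_clock is chosen here because it is monotonic; like CLOCK_REALTIME, it measures wall time, not CPU time.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds, measured with steady_clock.
template <typename Work>
double elapsed_seconds(Work work) {
    const auto t1 = std::chrono::steady_clock::now();
    work();
    const auto t2 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t2 - t1).count();
}

// Hypothetical workload modelled on the question: fill two arrays,
// add them, and return a checksum so the result is actually used.
unsigned long long add_arrays(std::size_t n) {
    std::vector<unsigned int> a(n), b(n), out(n);
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<unsigned int>(i);
        b[i] = static_cast<unsigned int>(i);
    }
    unsigned long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
        sum += out[i];
    }
    return sum;
}
```

One way to use it is to capture the checksum inside the timed lambda, for example `unsigned long long sum = 0; double t = elapsed_seconds([&] { sum = add_arrays(10000000); });`, and then print both t and sum so the work cannot be discarded.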
Answer 3:
First, the compiler is very likely optimizing your code away. Check your compiler's optimization options.
Since the arrays out[], a[], and b[] are never used by later code, and no value from them is ever output, the compiler is free to treat the following loops as if they never execute at all:
for(int i=0;i<N;i++){
    a[i] = i;
}
for(int i=0;i<N;i++){
    b[i] = i;
}
for(int i = 0; i < N; ++i)
    out[i] = a[i] + b[i];
Since clock() returns CPU time, the code above consumes almost no time after optimization.
One more thing: set N to a larger value. 100000 is too small for a performance test; modern computers run O(n) code at that scale very quickly.
unsigned int N = 10000000;
Answer 4:
Add this to the end of the code:
// unsigned long long avoids overflowing an int when summing out[]
unsigned long long sum = 0;
for(int i = 0; i<N; i++)
    sum += out[i];
cout << sum << endl;
Then you will see the times.
Since you don't use a[], b[], or out[], the compiler ignores the corresponding for loops. This is due to compiler optimization.
Also, you can build in debug mode instead of release mode; with optimizations off the loops are not removed, so you will be able to see the time they take.
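Putting the advice from the answers together, a measurement that keeps the loops alive might look like the sketch below. The names TimedSum and timed_add are hypothetical. The arrays live in std::vector because three arrays of 10^7 unsigned int (about 40 MB each with a 4-byte unsigned int) would not fit on a typical default stack. It still uses clock(), so it reports CPU time at whatever granularity the platform provides.

```cpp
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical result type for this sketch.
struct TimedSum {
    unsigned long long sum;  // checksum of out[]; using it keeps the loops live
    double cpu_seconds;      // CPU time reported by clock(), in seconds
};

// Fills a[] and b[], adds them into out[], and sums out[] while timing
// the whole computation with clock().
TimedSum timed_add(std::size_t n) {
    std::vector<unsigned int> a(n), b(n), out(n);

    const std::clock_t start = std::clock();
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<unsigned int>(i);
        b[i] = static_cast<unsigned int>(i);
    }
    unsigned long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
        sum += out[i];
    }
    const std::clock_t end = std::clock();

    // Convert clock ticks to seconds.
    return {sum, static_cast<double>(end - start) / CLOCKS_PER_SEC};
}
```

Calling `timed_add(10000000)` and printing both fields gives the compiler a reason to keep the additions. If cpu_seconds is still zero, the work is probably still too short for the granularity of clock().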
Source: https://stackoverflow.com/questions/18696505/c-clock-stays-zero