I try to profile a simple c prog using valgrind:
[zsun@nel6005001 ~]$ valgrind --tool=memcheck ./fl.out
==2238== Memcheck, a memory error dete
By the way, this isn't computing factorial.
If you're really trying to find out where the time goes, you could try stackshots. I put an infinite loop around your code and took 10 of them. Here's the code:
6: void forloop(void){
7: int fac=1;
8: int count=5;
9: int i,k;
10:
11: for (i = 1; i <= count; i++){
12: for(k=1;k<=count;k++){
13: fac = fac * i;
14: }
15: }
16: }
17:
18: int main(int argc, char* argv[])
19: {
20: int i;
21: for (;;){
22: forloop();
23: }
24: return 0;
25: }
And here are the stackshots, re-ordered with the most frequent at the top:
forloop() line 12
main() line 23
forloop() line 12 + 21 bytes
main() line 23
forloop() line 12 + 21 bytes
main() line 23
forloop() line 12 + 9 bytes
main() line 23
forloop() line 13 + 7 bytes
main() line 23
forloop() line 13 + 3 bytes
main() line 23
forloop() line 6 + 22 bytes
main() line 23
forloop() line 14
main() line 23
forloop() line 7
main() line 23
forloop() line 11 + 9 bytes
main() line 23
What does this tell you? It says that line 12 consumes about 40% of the time, and line 13 consumes about 20% of the time. It also tells you that line 23 consumes nearly 100% of the time.
That means unrolling the loop at line 12 might potentially give you a speedup factor of 100/(100-40) = 100/60 = 1.67x approximately. Of course there are other ways to speed up this code as well, such as by eliminating the inner loop, if you're really trying to compute factorial.
I'm just pointing this out because it's a bone-simple way to do profiling.