Using valgrind to spot errors in MPI code

Anonymous (unverified), submitted 2019-12-03 02:23:02

Question:

I have a code that works perfectly in serial, but when run with mpirun -n 2 ./out it gives the following error:

*** Error in `./out': malloc(): smallbin double linked list corrupted: 0x00000000024aa090 ***

I tried running it under valgrind like this:

valgrind --leak-check=yes mpirun -n 2 ./out 

I got the following output:

==3494== Memcheck, a memory error detector
==3494== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==3494== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==3494== Command: mpirun -n 2 ./out
==3494==
Grid_0/NACA0012.msh
Grid_0/NACA0012.msh
>>> Number of cells: 7734
>>> Number of cells: 7734
0.000000  0         1.470622e-02
*** Error in `./out': malloc(): smallbin double linked list corrupted: 0x00000000024aa090 ***
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 3497 RUNNING AT orhan
=   EXIT CODE: 134
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
==3494==
==3494== HEAP SUMMARY:
==3494==     in use at exit: 131,120 bytes in 2 blocks
==3494==   total heap usage: 1,064 allocs, 1,062 frees, 231,859 bytes allocated
==3494==
==3494== LEAK SUMMARY:
==3494==    definitely lost: 0 bytes in 0 blocks
==3494==    indirectly lost: 0 bytes in 0 blocks
==3494==      possibly lost: 0 bytes in 0 blocks
==3494==    still reachable: 131,120 bytes in 2 blocks
==3494==         suppressed: 0 bytes in 0 blocks
==3494== Reachable blocks (those to which a pointer was found) are not shown.
==3494== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==3494==
==3494== For counts of detected and suppressed errors, rerun with: -v
==3494== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

I am not experienced with valgrind, but as far as I can tell it found no problems. Are there better valgrind options for locating the source of this specific error?

Answer 1:

Note that with the invocation above,

valgrind --leak-check=yes mpirun -n 2 ./out 

you are running valgrind on the program mpirun, which presumably has been extensively tested and works correctly, and not the program ./out, which you know to have a problem.

To run valgrind on your test program you will want to do:

mpirun -n 2 valgrind --leak-check=yes ./out 

This uses mpirun to launch 2 processes, each running valgrind --leak-check=yes ./out.



Answer 2:

You can never go wrong with a Jonathan Dursi answer, but let me add that with more than one process, valgrind output can be a pain to read.

Instead of printing to the console, send the output to a log file. Of course, you can't have both processes write to the same log file. Valgrind replaces '%p' in the log file name with the process ID, so you get two (or more) log files:

mpiexec -np 2 valgrind --leak-check=full \
    --show-reachable=yes --log-file=nc.vg.%p ./noncontig_coll2 -fname blah
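Once each process has written its own log, a quick way to see which one reported problems is to pull the "ERROR SUMMARY" line out of every file. The following is only a minimal sketch: summarize_vg_logs is a hypothetical helper name, and it assumes the logs follow the nc.vg.<pid> naming produced by --log-file=nc.vg.%p in the command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of valgrind or MPI): print the
# "ERROR SUMMARY" line from each per-process valgrind log.
# Assumes log files are named <prefix>.<pid>, e.g. nc.vg.3497.
summarize_vg_logs() {
    prefix="${1:-nc.vg}"
    for log in "$prefix".*; do
        # Skip the unexpanded glob pattern when no log files exist.
        [ -f "$log" ] || continue
        if summary=$(grep "ERROR SUMMARY" "$log"); then
            # Prefix the summary with the file name so the log is easy to find.
            printf '%s: %s\n' "$log" "$summary"
        else
            # A log without a summary line is reported rather than silently skipped.
            printf '%s: no ERROR SUMMARY line found\n' "$log"
        fi
    done
}
```

After the run, calling summarize_vg_logs nc.vg prints one line per log file. Any file whose summary shows a nonzero error count is the one to open and read in full.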

