Question
I'm quite lost on this one and I hope someone here could help.
My application consists of hundreds of functions evaluating numerical code (the source of each is in the 5 MB range), and I manage the functions with a std::map to function pointers.
What apparently happens is that I get a stack overflow when trying to pass an argument to one of the functions, accessed by a pointer to it:
gdb output:
Program received signal SIGSEGV, Segmentation fault.
0x0000000001ec0df7 in xsectiond149 (sme=Cannot access memory at address 0x7fffff34b888
) at xsection149.c:2
2 Poly3 xsectiond149(std::tr1::unordered_map<int, Poly3> & sme,
EvaluationNode::Ptr ti[], ProcessVars & s)
and xsection149.c:2 contains only the opening brace of the function definition.
/proc/<pid>/maps for the process shows only this line for the address range closest to the address that triggers the error:
7ffffff74000-7ffffffff000 rw-p 7ffffff73000 00:00 0 [stack]
so the address in the above error is out of bounds.
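As a sanity check on that reading, the comparison can be written out in a few lines of C++. This is only a sketch: the Mapping struct and the helper names are made up for illustration, and the two constants are copied from the gdb output and the [stack] line above.

```cpp
#include <cstdint>

// One line of /proc/<pid>/maps reduced to its address range [start, end).
// Hypothetical type, for illustration only.
struct Mapping {
    std::uint64_t start;
    std::uint64_t end;
};

// True if addr lies inside the mapping.
bool contains(const Mapping& m, std::uint64_t addr) {
    return addr >= m.start && addr < m.end;
}

// Number of bytes by which addr lies below the mapping (0 if it is not below).
std::uint64_t bytes_below(const Mapping& m, std::uint64_t addr) {
    return addr < m.start ? m.start - addr : 0;
}

// Values copied from the [stack] line and the gdb message above.
const Mapping kStack = {0x7ffffff74000ULL, 0x7ffffffff000ULL};
const std::uint64_t kFaultAddr = 0x7fffff34b888ULL;
```

contains(kStack, kFaultAddr) is false, and bytes_below(kStack, kFaultAddr) gives 0xC28778, so the access lands roughly 12 MB below the lowest mapped stack address.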
Now my question: How do I resolve this problem? I cannot figure out what I could allocate on the heap instead...
The only thing that happens in my main routine is:
// A map containing O(10^4) Poly3 (struct with 6 doubles)
tr1::unordered_map<int, Poly3> smetemp;
// populates smetemp
computeSMEs(smetemp);
// Map of function pointers of type, O(10^3) elements
tr1::unordered_map<int, xsdptr> diagfunctions = get_diagram_map();
How could this overflow the stack??
EDIT: I've tried to run it in valgrind, this is the error I get and google didn't give any meaningful info:
valgrind: m_debuginfo/storage.c:417 (vgModuleLocal_addDiCfSI):
Assertion 'cfsi.len < 5000000' failed.
==491== at 0x38029D5C: ??? (in /usr/lib64/valgrind/memcheck-amd64-linux)
EDIT2: Disassembling the function up to the point where it fails (0x0000000001ec0df7) gives me:
Dump of assembler code for function xsectiond149(std::tr1::unordered_map<int, Poly3, std::tr1::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, Poly3> >, false>&, std::vector<boost::shared_ptr<EvaluationNode>, std::allocator<boost::shared_ptr<EvaluationNode> > >&, ProcessVars&):
<...+0>: push %rbp
<...+1>: mov %rsp,%rbp
<...+4>: push %r15
<...+6>: push %r14
<...+8>: push %r13
<...+10>: push %r12
<...+12>: push %rbx
<...+13>: sub $0xc96b58,%rsp
<...+20>: mov %rdi,%rbx
<...+23>: mov %rsi,-0xc8b078(%rbp) // this instr fails
and the first few lines of the function read:
Poly3 xsectiond149(std::tr1::unordered_map<int, Poly3> & sme,
std::vector<EvaluationNode::Ptr> & ti,
ProcessVars & s)
{
Poly3 sum(0,0,0,-2);
Poly3 prefactor, expr;
// CF*CA^2*NF*NA^(-2)
double col0 = 0.5625000000000000000000000000;
prefactor = col0*ti[0]->value()*s.Qtpow2*s.epow2*s.gpow6;
expr = (128*(s.p1p2*sme[192]*s.mt - s.p1p2*sme[193]*s.mt +
1/2.*s.p1p2*sme[195]*s.mt - 1/2.*s.p1p2*sme[196]*s.mt -
s.p1p2*sme[201]*s.mt + s.p1p2*sme[202]*s.mt +
1/2.*s.p1p2*sme[210]*s.mt - 1/2.*s.p1p2*sme[211]*s.mt -
1/4.*s.p1p2*sme[216]*s.mt + 1/4.*s.p1p2*sme[217]*s.mt -
s.p1p2*sme[219]*s.mt + s.p1p2*sme[220]*s.mt -
1/8.*s.p1p2*sme[1209]*s.mt + 1/8.*s.p1p2*sme[1210]*s.mt +
1/2.*s.p1p2*sme[1215]*s.mt - 1/2.*s.p1p2*sme[1216]*s.mt +
// .....
}
(Note that I have changed the signature of the function during experimentation)
Can anyone make sense of what is going on here? What additional information would you need? Sorry, but I have almost no experience with asm.
EDIT3:
Increasing the stack size with ulimit -s <size> did the trick. Thank you all for your help!
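For reference, the limit that ulimit -s changes can also be inspected from inside the program with getrlimit(RLIMIT_STACK, ...). The following is a minimal sketch assuming a POSIX system; stack_limit_allows is a made-up helper, and the frame size is the constant from the sub instruction in the disassembly above.

```cpp
#include <sys/resource.h>

// Frame size taken from `sub $0xc96b58,%rsp` in the disassembly above
// (13200216 bytes).
const unsigned long kFrameBytes = 0xc96b58UL;

// Hypothetical helper: reports whether the soft stack limit is larger than
// `needed` bytes. It ignores the stack already used by the callers, so a
// `true` result is necessary but not sufficient.
bool stack_limit_allows(unsigned long needed) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        return false;  // could not query the limit
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return true;   // no soft limit set
    }
    return rl.rlim_cur > needed;
}
```

Calling stack_limit_allows(kFrameBytes) at startup gives an early warning before the large function is entered. Note that bash's ulimit -s takes its argument in units of 1024 bytes, so this frame needs a value of at least about 12900 plus whatever the callers use.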
Answer 1:
It looks like the function xsectiond149 needs a stack frame of about 13 MB (note the instruction sub $0xc96b58,%rsp, and the failure as soon as it tries to write something down there two instructions later). You need to ensure that the thread has a large enough stack (by default it won't) before calling the function.
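One way to get a large enough stack without touching the shell limit is to run the call on a thread created with an explicit stack size. A rough sketch using POSIX threads follows; run_with_stack and big_frame_worker are hypothetical names, and the worker is only a stand-in for a generated function with a frame of roughly this size.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical stand-in for a generated function with a ~13 MB frame.
// It touches both ends of the buffer and writes their sum to *arg.
void* big_frame_worker(void* arg) {
    volatile char scratch[13 * 1024 * 1024];
    scratch[0] = 1;
    scratch[sizeof(scratch) - 1] = 2;
    *static_cast<int*>(arg) = scratch[0] + scratch[sizeof(scratch) - 1];
    return nullptr;
}

// Runs fn(arg) on a new thread whose stack is stack_bytes large and waits
// for it to finish. Returns 0 on success, otherwise the pthread error code.
int run_with_stack(void* (*fn)(void*), void* arg, std::size_t stack_bytes) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    pthread_t tid;
    if (rc == 0) {
        rc = pthread_create(&tid, &attr, fn, arg);
    }
    if (rc == 0) {
        rc = pthread_join(tid, nullptr);
    }
    pthread_attr_destroy(&attr);
    return rc;
}
```

For example, run_with_stack(big_frame_worker, &result, 32u * 1024 * 1024) gives the worker a 32 MB stack. The size is a guess that leaves headroom above the 13 MB frame; on some systems the program must be built with -pthread.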
You might also look into changing your code generator to allocate more stuff on the heap instead of the stack.
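What exactly fills the 13 MB frame is not visible in the question, so the following is only an illustration of the general idea, assuming the generator emits large local buffers: keep the bulk data in a std::vector so that only the small vector object lives in the stack frame. The function name is made up.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example. A stack version would declare something like
// `double tmp[1500000];`, which puts about 12 MB into the frame. Here the
// elements live on the heap and the frame only holds the vector object.
double sum_of_squares_heap(std::size_t n) {
    std::vector<double> tmp(n);
    for (std::size_t i = 0; i < n; ++i) {
        tmp[i] = static_cast<double>(i);
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += tmp[i] * tmp[i];
    }
    return sum;
}
```

If the space is instead taken by compiler-generated temporaries of one huge expression, splitting the expression into several smaller functions may be needed to achieve the same effect.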
Answer 2:
Get Valgrind and, once your program is built, run it under Valgrind (using memcheck, the default tool). That should make it much easier to locate the source of the fault.
You can also run Valgrind in a mode where it breaks into the debugger (usually GDB) and you can then use all the cool GDB commands to inspect values at stack frames of callers and so on.
Either way, if you are stuck, Valgrind should help you find some pointers where to continue.
As to your edit, here's my reply (quoting from the Valgrind source code, r11604 of storage.c):
/* sanity */
vg_assert(cfsi.len > 0);
/* If this fails, the implication is you have a single procedure
   with more than 5 million bytes of code. Which is pretty
   unlikely. Either that, or the debuginfo reader is somehow
   broken. 5 million is of course arbitrary; but it's big enough
   to be bigger than the size of any plausible piece of code that
   would fall within a single procedure. */
vg_assert(cfsi.len < 5000000);
Source: https://stackoverflow.com/questions/6084901/stackoverflow-and-function-pointers