How to determine maximum stack usage in embedded system with gcc?

烂漫一生 提交于 2019-11-27 10:56:39

GCC docs:

-fstack-usage

Makes the compiler output stack usage information for the program, on a per-function basis. The filename for the dump is made by appending .su to the auxname. auxname is generated from the name of the output file, if explicitly specified and it is not an executable, otherwise it is the basename of the source file. An entry is made up of three fields:

  • The name of the function.
  • A number of bytes.
  • One or more qualifiers: static, dynamic, bounded.

The qualifier static means that the function manipulates the stack statically: a fixed number of bytes are allocated for the frame on function entry and released on function exit; no stack adjustments are otherwise made in the function. The second field is this fixed number of bytes.

The qualifier dynamic means that the function manipulates the stack dynamically: in addition to the static allocation described above, stack adjustments are made in the body of the function, for example to push/pop arguments around function calls. If the qualifier bounded is also present, the amount of these adjustments is bounded at compile-time and the second field is an upper bound of the total amount of stack used by the function. If it is not present, the amount of these adjustments is not bounded at compile-time and the second field only represents the bounded part.

I can't find any references to -fcallgraph-info

You could potentially create the information you need from -fstack-usage and -fdump-tree-optimized

For each leaf in -fdump-tree-optimized, get its parents and sum their stack size number (keeping in mind that this number lies for any function with "dynamic" but not "bounded") from -fstack-usage, find the max of these values and this should be your maximum stack usage.

Just in case no one comes up with a better answer, I'll post what I had in the comment to your other question, even though I have no experience using these options and tools:

GCC 4.6 adds the -fstack-usage option which gives the stack usage statistics on a function-by-function basis.

If you combine this information with a call graph produced by cflow or a similar tool you can get the kind of stack depth analysis you're looking for (a script could probably be written pretty easily to do this). Have the script read the stack-usage info and load up a map of function names with the stack used by the function. Then have the script walk the cflow graph (which can be an easy-to-parse text tree), adding up the stack usage associated with each line for each branch in the call graph.

So, it looks like this can be done with GCC, but you might have to cobble together the right set of tools.

PeterM

I ended up writing a python script to implement τεκ's answer. It's too much code to post here, but can be found on github

Quite late, but for anyone looking at this, the answers given involving combining the outputs from fstack-usage and call graph tools like cflow can end up being wildly incorrect for any dynamic allocation, even bounded, because there's no information about when that dynamic stack allocation occurs. It's therefore not possible to know to what functions you should apply the value towards. As a contrived example, if (simplified) fstack-usage output is:

main        1024     dynamic,bounded
functionA    512     static
functionB     16     static

and a very simple call tree is:

main
    functionA
    functionB

The naive approach to combine these may result in main -> functionA being chosen as the path of maximum stack usage, at 1536 bytes. But, if the largest dynamic stack allocation in main() is to push a large argument like a record to functionB() directly on the stack in a conditional block that calls functionB (I already said this was contrived), then really main -> functionB is the path of maximum stack usage, at 1040 bytes. Depending on existing software design, and also for other more restricted targets that pass everything on the stack, cumulative errors may quickly lead you toward looking at entirely wrong paths claiming significantly overstated maximum stack sizes.

Also, depending on your classification of "reentrant" when talking about interrupts, it's possible to miss some stack allocations entirely. For instance, many Coldfire processors' level 7 interrupt is edge-sensitive and therefore ignores the interrupt disable mask, so if a semaphore is used to leave the instruction early, you may not consider it reentrant, but the initial stack allocation will still happen before the semaphore is checked.

In short, you have to be extremely careful about using this approach.

I am not familiar with the -fstack-usage and -fcallgraph-info options. However, it is always possible to figure out actual stack usage by:

  1. Allocate adequate stack space (for this experiment), and initialize it to something easily identifiable. I like 0xee.
  2. Run the application and test all its internal paths (by all combinations of input and parameters). Let it run for more than "long enough".
  3. Examine the stack area and see how much of the stack was used.
  4. Make that the stack size, plus 10% or 20% to tolerate software updates and rare conditions.
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!