Linking with static library not equivalent to linking with its objects

Submitted by 核能气质少年 on 2019-12-04 12:58:44

Summary:

The issue was that not all objects from the static library were being included in the firmware image. This is solved by surrounding the static library with the --whole-archive and --no-whole-archive linker flags:

 $(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(APPOBJECTS) -Wl,--whole-archive Library/libtest.a -Wl,--no-whole-archive

The issue arises because if the linker includes a library object with weak symbol definitions, it considers these symbols defined, and no longer searches for their (strong) definitions. Hence the object with strong definitions may or may not be included, depending on search order and what other symbols it defines.

Solution path:

Using arm-none-eabi-gdb to debug, it appeared that the disabled WWDG interrupt was firing and calling the Default_Handler. This turned out to be a red herring, but one that has occurred often enough that it led me to the answer via the Stack Overflow post "STM32 WWDG interrupt firing when not configured".

Upon reading this post and learning that gdb's function name reporting is often inaccurate for functions that share the same memory address, I checked the generated .map file for the faulty firmware image. It confirmed that WWDG_IRQHandler was located at the same memory address as the majority of IRQHandlers, including the IRQHandlers for interrupts that are defined and used by the system (e.g. some timer interrupts).
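To make the "many names, one address" situation easier to spot, here is a minimal sketch that groups symbols by address. It assumes the symbol list comes from nm-style output (address, type letter, name), for example from arm-none-eabi-nm, rather than from the .map file I actually inspected. The addresses and the SAMPLE text are made up for illustration.

```python
from collections import defaultdict

def symbols_by_address(nm_output):
    """Return {address: [names]} for every address shared by more than one symbol.

    Expects nm-style lines of the form '<hex address> <type letter> <name>'.
    """
    groups = defaultdict(list)
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # blank lines and undefined symbols carry no address
        address, _kind, name = parts
        groups[int(address, 16)].append(name)
    return {addr: sorted(names) for addr, names in groups.items() if len(names) > 1}

# Hypothetical sample, not taken from the real firmware image.
SAMPLE = """\
08000200 T main
08000a10 t Default_Handler
08000a10 W TIM2_IRQHandler
08000a10 W WWDG_IRQHandler
"""

if __name__ == "__main__":
    for addr, names in symbols_by_address(SAMPLE).items():
        print(f"{addr:#010x}: {', '.join(names)}")
```

Any handler that shows up in the same group as Default_Handler is still bound to the weak default, and gdb may report any of the names in that group.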

Furthermore, all interrupts defined in the stm32f4xx_it.o object (which defines the IRQHandlers for interrupts used by the system, and which is included in the static library) pointed to the memory address of the Default_Handler, and the respective IRQHandler symbols were listed as being supplied by startup_stm32f407xx.o.

I then checked which object files were actually linked into the firmware image (perl -n -e '/libtest\.a\((.*?)\)/ && print "$1\n"' app.map | sort -u) and found that only a subset of objects were linked.
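For anyone without perl to hand, the same check can be sketched in Python. This is a rough equivalent of the one-liner above, not a general map-file parser. The function name linked_members and the sample map text are my own for illustration; only libtest.a and app.map come from the actual build.

```python
import re

def linked_members(map_text, archive="libtest.a"):
    """Return the sorted, de-duplicated archive members mentioned in a linker map.

    Matches 'libtest.a(member.o)' and keeps the first match per line,
    like the perl one-liner piped through 'sort -u'.
    """
    pattern = re.compile(re.escape(archive) + r"\((.*?)\)")
    members = set()
    for line in map_text.splitlines():
        match = pattern.search(line)
        if match:
            members.add(match.group(1))
    return sorted(members)

# Hypothetical excerpt; a real map file would be read from app.map instead.
SAMPLE_MAP = """\
Library/libtest.a(startup_stm32f407xx.o)
                              (Reset_Handler)
 .text          0x08000a10      0x10 Library/libtest.a(startup_stm32f407xx.o)
 .text          0x08000b00      0x40 Library/libtest.a(system_stm32f4xx.o)
"""

if __name__ == "__main__":
    for member in linked_members(SAMPLE_MAP):
        print(member)
```

Comparing this list against the archive's full member list (ar t Library/libtest.a) shows which objects were left out.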

Further inspection of startup_stm32f407xx.s showed that it defines many weak symbols, e.g.:

.weak TIM2_IRQHandler

During the process of linking a static library, the linker searches the library for undefined symbols and includes the first object it finds to define these symbols. It then removes the symbol from the undefined list, as well as any other undefined symbols that are defined by the included object.

My guess as to what happened is that the linker found an otherwise-undefined symbol in startup_stm32f407xx.o and included the object. It considered all IRQHandler symbols to be defined by the weak definitions therein. The object stm32f4xx_it.o was never included since it did not define any undefined symbols. This happened a number of times, with a number of different object files; sometimes the strong symbols were included, sometimes the weak symbols were included, depending on which object was searched first. Interesting (yet unsurprising) is that if the weak definition is removed, the object containing the strong definition is included, and all strong definitions from that file (correctly) override the already-included weak definitions.
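My guess can be illustrated with a toy model of archive member selection. This is a deliberately simplified sketch of the behaviour described above, not ld's actual implementation. The link function, the symbol sets, and the assumption that Reset_Handler is the only initially undefined symbol are all hypothetical.

```python
def link(undefined, archive, whole_archive=False):
    """Toy model of pulling members out of a static library.

    archive: list of (member name, {symbol: "weak" | "strong"}, [referenced symbols]).
    A member is pulled in only if it defines a currently undefined symbol,
    unless whole_archive is set, in which case every member is included.
    """
    undefined = set(undefined)
    defined = {}    # symbol -> (binding, member that supplies it)
    included = []
    progress = True
    while progress:  # rescan until a full pass adds no new member
        progress = False
        for name, defines, needs in archive:
            if name in included:
                continue
            if not whole_archive and not undefined.intersection(defines):
                continue
            included.append(name)
            progress = True
            for sym, binding in defines.items():
                current = defined.get(sym)
                # A strong definition overrides a weak one; nothing else overrides.
                if current is None or (current[0] == "weak" and binding == "strong"):
                    defined[sym] = (binding, name)
            undefined.update(needs)
            # Weak definitions count as definitions here, which is the crux.
            undefined.difference_update(defined)
    return included, defined

LIBTEST = [
    ("startup_stm32f407xx.o",
     {"Reset_Handler": "strong", "Default_Handler": "strong", "TIM2_IRQHandler": "weak"},
     ["TIM2_IRQHandler"]),
    ("stm32f4xx_it.o", {"TIM2_IRQHandler": "strong"}, []),
]

if __name__ == "__main__":
    for flag in (False, True):
        included, defined = link({"Reset_Handler"}, LIBTEST, whole_archive=flag)
        print(flag, included, defined["TIM2_IRQHandler"])
```

In this model, without whole_archive only startup_stm32f407xx.o is pulled in and TIM2_IRQHandler stays bound to the weak definition. With whole_archive both members are included and the strong definition from stm32f4xx_it.o wins, which matches what I saw after adding --whole-archive.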

Having solved the problem, I'm not sure where to go from here. Is this a linker bug?

You'll get a better answer if you can explain what "the binary doesn't work" really means.

Are you getting a binary that your programming tools won't load into the chip at all?

If so, look carefully at linker output on the command line.

Are you producing something you can load into the chip and not seeing the expected behavior?

If so, use a hardware debugger. Step through the code until something breaks, or let it run, then halt it and see where you ended up.

Chances are, you're just uncovering a bug that's always been in the code by rearranging where everything goes in memory. Array overflows, bad pointer dereferences, and uninitialized variables are typical culprits. Switching on -Wextra and -Wall can help uncover this stuff.

One other thought: Make sure your LDSCRIPT has the correct flash & RAM sizes for the actual part number (i.e. is not for a different part in the family).

I also work currently with that MCU. However, I avoid the ST "standard" library for good reasons.

It looks as if the watchdog has been enabled during startup and expires soon afterwards (the interrupt is an early warning). This may be due to variations in run-time behaviour, which might very well vary depending on linkage, due to trampolines created by the linker and/or link-time optimization (LTO), inlining by the compiler, and other optimizations.

The sizes given seem to be out of bounds for normal variation with identical compile/link options. But they are quite possible for -Os vs. -O3 and LTO vs. no LTO (for the latter, the resulting code size may be larger or smaller, depending on -O). Also, I noticed that some gcc/ld versions have problems with LTO, and all code has to be compiled and linked(!) with the same options. Also check the ABI used and that it matches the C and gcc libraries used.

A good start would be to coarse-step through startup from reset with a watchpoint at WWDG->CR. Also check the EWI-bit; this would actually allow the interrupt.
