"Generic STM32F0 series" does not run with LTO

Sonic
Posts: 3
Joined: Mon Oct 12, 2020 8:20 pm

Post by Sonic »

Hello everyone! I have some hardware built around an STM32F030F4P6 (it's a Modbus relay module), and I am trying to write my own firmware for the board from scratch. Some "hello world" tests work fine: I can blink LEDs, read buttons and click relays. But an almost empty application takes approximately 10K of the 16K available. When I add a Modbus client, which uses the serial port, the binary size goes over the limit.
There is an LTO option, which lets the link succeed, but then something goes wrong. Even with this primitive loop():
void loop() {
  digitalWrite(LED, HIGH);
  delay(1000);
  digitalWrite(LED, LOW);
  delay(1000);
}
the LED goes on and never goes off. I suspect that delay() crashes inside. With LTO disabled everything works fine (but I can't use Modbus). Could anyone give me hints?
Other options to reduce the binary footprint are also welcome. A good question is: why does an empty LED blinker consume 10K?
I've checked the link map in the temp directory. It looks like a lot of stuff is pulled in from newlib, including formatted I/O. Why? I am not using it in my application.
mrburnette
Posts: 633
Joined: Thu Dec 19, 2019 1:23 am
Answers: 7

Post by mrburnette »

Sonic wrote: Mon Oct 12, 2020 8:29 pm ... hardware built around an STM32F030F4P6 (it's a Modbus relay module), and I am trying to write my own firmware for the board from scratch. Some "hello world" tests work fine: I can blink LEDs, read buttons and click relays. But an almost empty application takes approximately 10K of the 16K available. When I add a Modbus client, which uses the serial port, the binary size goes over the limit.
...
Other options to reduce the binary footprint are also welcome. A good question is: why does an empty LED blinker consume 10K?
I've checked the link map in the temp directory. It looks like a lot of stuff is pulled in from newlib, including formatted I/O. Why? I am not using it in my application.

LTO == Link Time Optimization

IMO, you need to move away from Arduino. Just try an "empty" sketch and you will realize that init, structures, and pinmaps are going to consume too much flash. Read the woes of this recent poster: viewtopic.php?f=63&t=677

Use the underlying STM technologies: https://github.com/stm32duino/Arduino_C ... troduction without the Arduino wrappers.

Ray
.rpv
Posts: 43
Joined: Wed Dec 18, 2019 10:19 pm

Post by .rpv »

By default the HAL config loads everything, and the modules are a bit heavy, especially for MCUs with little flash. The ST core lets you change the default HAL config by creating a "hal_conf_extra.h" file in the same folder as your sketch, where you can disable everything that you don't need or want:

#define HAL_ADC_MODULE_DISABLED
#define HAL_I2C_MODULE_DISABLED
#define HAL_RTC_MODULE_DISABLED
#define HAL_SPI_MODULE_DISABLED
#define HAL_TIM_MODULE_DISABLED
#define HAL_DAC_MODULE_DISABLED
#define HAL_EXTI_MODULE_DISABLED
#define HAL_ETH_MODULE_DISABLED
#define HAL_SD_MODULE_DISABLED
#define HAL_QSPI_MODULE_DISABLED
Also, in the Arduino IDE menu, set U(S)ART support to disabled if you aren't using serial; this will save you a bit more flash.

Doing this makes flash usage about the same whether you use something like STM32CubeIDE or the Arduino IDE.
fpiSTM
Posts: 1738
Joined: Wed Dec 11, 2019 7:11 pm
Answers: 91
Location: Le Mans

Post by fpiSTM »

Tested with the Blink.ino example.

Default options:
Sketch uses 9664 bytes (58%) of program storage space. Maximum is 16384 bytes.
Global variables use 876 bytes (21%) of dynamic memory, leaving 3220 bytes for local variables. Maximum is 4096 bytes.

U(S)ART support disabled:
Sketch uses 7092 bytes (43%) of program storage space. Maximum is 16384 bytes.
Global variables use 552 bytes (13%) of dynamic memory, leaving 3544 bytes for local variables. Maximum is 4096 bytes.

U(S)ART support disabled + the HAL modules below disabled (Wiki: https://github.com/stm32duino/wiki/wiki/HAL-configuration#list-of-hal__module_disabled-definition):

#define HAL_ADC_MODULE_DISABLED
#define HAL_I2C_MODULE_DISABLED
#define HAL_RTC_MODULE_DISABLED
#define HAL_SPI_MODULE_DISABLED
#define HAL_TIM_MODULE_DISABLED
#define HAL_DAC_MODULE_DISABLED
#define HAL_EXTI_MODULE_DISABLED
#define HAL_ETH_MODULE_DISABLED
#define HAL_SD_MODULE_DISABLED
#define HAL_QSPI_MODULE_DISABLED

Sketch uses 3764 bytes (22%) of program storage space. Maximum is 16384 bytes.
Global variables use 64 bytes (1%) of dynamic memory, leaving 4032 bytes for local variables. Maximum is 4096 bytes.
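
The percentages above can be reproduced with plain integer arithmetic, which also makes the saving of each step explicit. The sketch below is only a worked calculation on the reported numbers; every identifier in it is made up for illustration, and the truncating division is an assumption that happens to match all three reported flash percentages.

```cpp
// Worked arithmetic on the Blink.ino build sizes reported in this post.
// All identifiers are hypothetical; only the byte counts come from the thread.
constexpr unsigned kFlashSize    = 16384; // program storage maximum, bytes
constexpr unsigned kDefaultBuild = 9664;  // default options
constexpr unsigned kNoUartBuild  = 7092;  // U(S)ART support disabled
constexpr unsigned kNoUartNoHal  = 3764;  // U(S)ART + listed HAL modules disabled

// Truncating percentage (assumption: the IDE rounds down).
constexpr unsigned percentOf(unsigned used, unsigned total) {
    return used * 100u / total;
}

// Bytes saved when going from one build to a smaller one.
constexpr unsigned saved(unsigned before, unsigned after) {
    return before - after;
}
```

With these numbers, disabling U(S)ART support saves 2572 bytes and disabling the listed HAL modules saves a further 3328 bytes, 5900 bytes in total.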

Post by Sonic »

Tried these #defines (except TIM, because it's used by something) together with USART enabled (because my project uses the serial port). This saved only about 500 bytes for some reason.
UPD: Tried the same on a very early version of the sketch, which does almost nothing and lets me disable the serial port completely. TIM is apparently used by millis(). Still the same difference of about 500 bytes.
I wonder, doesn't this indicate that the code is poorly partitioned, so that pulling in one function from an .a file pulls in the whole object with lots of potentially unused stuff?
Well, looks like it's time to give up on Arduino indeed. A pity, I like the little operating system it provides.
Last edited by Sonic on Tue Oct 13, 2020 6:14 pm, edited 1 time in total.

Post by Sonic »

mrburnette wrote: Mon Oct 12, 2020 9:23 pm
LTO == Link Time Optimization
Thank you very much, I know what it is :), but isn't it supposed to still produce working binaries? Or does it choke on weak symbols or some other tricks like that?

Post by mrburnette »

Sonic wrote: Tue Oct 13, 2020 6:09 pm
mrburnette wrote: Mon Oct 12, 2020 9:23 pm
LTO == Link Time Optimization
Thank you very much, I know what it is :), but isn't it supposed to still produce working binaries? Or does it choke on weak symbols or some other tricks like that?

https://interrupt.memfault.com/blog/cod ... -gcc-flags
The linker on the other hand has visibility into our whole program, so this is the stage where dead code could be identified and removed. Unfortunately, linkers do not perform optimizations by default.

To enable dead code optimization on GCC, you need two things: the compiler needs to split each function into its own linker section so the linker knows where each function is, and the linker needs to add an optimization pass to remove sections that are not called by anything.

This is achieved with the -ffunction-sections compile-time flag and the -gc-sections link-time flag. A similar process can take place with dead data and the -fdata-sections flag.
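
To see what the quoted flags do, one can try them on a small file. The following is a minimal sketch, not taken from the article or from the STM32 core: the file name, both functions and the build commands in the comments are assumptions for illustration (GNU ld spells the option `--gc-sections`, passed through the compiler driver as `-Wl,--gc-sections`).

```cpp
// demo.cpp: a hypothetical file for trying section garbage collection.
//
// Assumed build commands (host GCC shown; an ARM build would use
// arm-none-eabi-g++ instead), with main.o being some other object
// that calls halfPeriodMs():
//   g++ -Os -ffunction-sections -fdata-sections -c demo.cpp -o demo.o
//   g++ -Wl,--gc-sections -Wl,--print-gc-sections demo.o main.o -o demo
//
// With -ffunction-sections each function below is placed in its own
// .text.<symbol> section, so the linker can drop the unreferenced one.

// Referenced by the application, so its section is kept.
unsigned halfPeriodMs(unsigned blinkHz) {
    return blinkHz == 0 ? 0 : 1000u / (2u * blinkHz);
}

// Not referenced by the application; a candidate for removal
// when linking with --gc-sections.
unsigned unusedChecksum(const unsigned char* data, unsigned len) {
    unsigned sum = 0;
    for (unsigned i = 0; i < len; ++i) {
        sum += data[i];
    }
    return sum;
}
```

The `--print-gc-sections` option asks the linker to list the sections it discarded, which is a quick way to check whether an unused function was actually removed.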
LTO works differently:
https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html
At the highest level, LTO splits the compiler in two. The first half (the “writer”) produces a streaming representation of all the internal data structures needed to optimize and generate code. This includes declarations, types, the callgraph and the GIMPLE representation of function bodies.

When -flto is given during compilation of a source file, the pass manager executes all the passes in all_lto_gen_passes. Currently, this phase is composed of two IPA passes:
  • pass_ipa_lto_gimple_out This pass executes the function lto_output in lto-streamer-out.c, which traverses the call graph encoding every reachable declaration, type and function. This generates a memory representation of all the file sections described below.
  • pass_ipa_lto_finish_out This pass executes the function produce_asm_for_decls in lto-streamer-out.c, which takes the memory image built in the previous pass and encodes it in the corresponding ELF file sections.
The second half of LTO support is the “reader”. This is implemented as the GCC front end lto1 in lto/lto.c. When collect2 detects a link set of .o/.a files with LTO information and the -flto is enabled, it invokes lto1 which reads the set of files and aggregates them into a single translation unit for optimization. The main entry point for the reader is lto/lto.c:lto_main.
Our member feluga has spent lots of time experimenting with optimizations: viewtopic.php?p=2263#p2263
- Using STM32 Official Core, this same example uses 2.4K RAM and 12.7 K Flash (lowest footprint). Which is very good for adding VGA to your project. Perhaps some techniques would be applicable to your needs.

This may be too basic for you: https://create.arduino.cc/projecthub/jo ... age-26ca05
Last edited by mrburnette on Tue Oct 13, 2020 7:43 pm, edited 1 time in total.

Post by fpiSTM »

I do not advise using LTO. I have seen too many issues with it, and several issues have been opened against GCC about it.
Users can try it, and if it works then that is fine, but you are on your own with this.

Post by .rpv »

Sonic wrote: Tue Oct 13, 2020 6:04 pm Tried these #defines (except TIM, because it's used by something) together with USART enabled (because my project uses the serial port). This saved only about 500 bytes for some reason.
UPD: Tried the same on a very early version of the sketch, which does almost nothing and lets me disable the serial port completely. TIM is apparently used by millis(). Still the same difference of about 500 bytes.
I wonder, doesn't this indicate that the code is poorly partitioned, so that pulling in one function from an .a file pulls in the whole object with lots of potentially unused stuff?
Well, looks like it's time to give up on Arduino indeed. A pity, I like the little operating system it provides.

When I disabled the timer module (HAL_TIM_MODULE_DISABLED), millis() still worked but PWM didn't: I got no compiler errors, the PWM channels just wouldn't work. Also, when you create or edit the hal_conf_extra.h file, I noticed that you sometimes need to restart the Arduino IDE before the changes to module enabling/disabling take effect.

Post by fpiSTM »

PWM uses a timer. If you use analogWrite: if the pin has DAC capability, the DAC is used; otherwise, if the pin has a timer, the timer is used; otherwise a simple GPIO toggle is used.
The need to restart the Arduino IDE is due to Arduino's cache management, which does not rebuild everything.
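
The analogWrite fallback order described above can be written down as a small decision function. This is only an illustration of the order stated in this post, not the core's actual implementation: `PinCaps`, `Backend` and `chooseAnalogWriteBackend` are hypothetical names, and the real core derives pin capabilities from its pin maps.

```cpp
// Hypothetical capability flags for one pin (not the core's real types).
struct PinCaps {
    bool hasDac;    // pin is connected to a DAC channel
    bool hasTimer;  // pin is connected to a timer channel
};

// The three mechanisms named in the post, in priority order.
enum class Backend { Dac, TimerPwm, GpioToggle };

// DAC first, then timer PWM, then a plain GPIO toggle as the last resort.
constexpr Backend chooseAnalogWriteBackend(PinCaps caps) {
    return caps.hasDac   ? Backend::Dac
         : caps.hasTimer ? Backend::TimerPwm
                         : Backend::GpioToggle;
}
```

For example, a pin with both DAC and timer capability would resolve to `Backend::Dac`, and a pin with neither would fall through to `Backend::GpioToggle`.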