"Generic STM32F0 series" does not run with LTO
Hello everyone! I've got some hardware with a 32F030F4P6 (it's a ModBus relay module), and I am trying to write my own firmware for the board from scratch. Some "hello world" tests are fine: I am able to blink LEDs, read buttons, and click relays. But an almost empty application takes approximately 10K of the 16K available. When I add a Modbus client, which uses the serial port, the binary size goes over the limit.
There is an LTO option, which makes the link succeed, but something goes wrong at run time. Even with this primitive loop():
void loop() {
  digitalWrite(LED, HIGH);
  delay(1000);
  digitalWrite(LED, LOW);
  delay(1000);
}
the LED goes on and never goes off. I suspect that delay() crashes inside. With LTO disabled everything works fine (but I can't use Modbus). Could anyone give me some hints?
Other options for reducing the binary footprint are also welcome. A good question is: why does an empty LED blinker consume 10K?
I've checked the link map in the temp directory. It looks like a lot of stuff is pulled in from newlib, including formatted I/O. Why? I am not using it in my application.
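One way to narrow down the delay() suspicion is to blink without calling delay() at all. The following is only a diagnostic sketch under stated assumptions, not a fix: `ledShouldBeOn` and `busyWait` are hypothetical helper names, and the Arduino calls appear only in comments. If a millis()-driven blink also freezes under LTO, the millisecond tick is probably not advancing; if a busy-wait blink works, the rest of the startup code is likely fine.

```cpp
#include <stdint.h>

// Hypothetical helper: true during the first half of each blink period.
// nowMs is meant to come from millis(); a zero half-period keeps the LED on.
constexpr bool ledShouldBeOn(uint32_t nowMs, uint32_t halfPeriodMs) {
    return halfPeriodMs == 0 || (nowMs / halfPeriodMs) % 2 == 0;
}

// Hypothetical helper: a crude delay that does not depend on any timer tick.
// The volatile counter keeps the compiler from removing the loop; the real
// duration depends on clock speed and optimization level.
void busyWait(uint32_t iterations) {
    volatile uint32_t remaining = iterations;
    while (remaining != 0) {
        remaining = remaining - 1;
    }
}

// Possible use in a sketch (assumes LED is defined as in the post above):
//   void loop() { digitalWrite(LED, ledShouldBeOn(millis(), 1000) ? HIGH : LOW); }
// or, with no tick dependency at all (iteration count is a guess to tune):
//   void loop() {
//     digitalWrite(LED, HIGH); busyWait(500000);
//     digitalWrite(LED, LOW);  busyWait(500000);
//   }
```

The first variant exercises only millis(), the second exercises neither millis() nor delay(), so comparing the two should show whether the problem is tied to the tick.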
Re: "Generic STM32F0 series" does not run with LTO
Sonic wrote: Mon Oct 12, 2020 8:29 pm
... hardware with a 32F030F4P6 (it's a ModBus relay module), and I am trying to write my own firmware for the board from scratch. Some "hello world" tests are fine: I am able to blink LEDs, read buttons, and click relays. But an almost empty application takes approximately 10K of the 16K available. When I add a Modbus client, which uses the serial port, the binary size goes over the limit.
...
Other options for reducing the binary footprint are also welcome. A good question is: why does an empty LED blinker consume 10K?
I've checked the link map in the temp directory. It looks like a lot of stuff is pulled in from newlib, including formatted I/O. Why? I am not using it in my application.
LTO == Link Time Optimization
IMO, you need to move away from Arduino. Just try an "empty" sketch and you will realize that init, structures, and pinmaps are going to consume too much flash. Read the woes of this recent poster: viewtopic.php?f=63&t=677
Use the underlying STM technologies: https://github.com/stm32duino/Arduino_C ... troduction without the Arduino wrappers.
Ray
Re: "Generic STM32F0 series" does not run with LTO
By default the HAL config enables every module, and the modules are fairly heavy, especially for MCUs with little flash. The ST core lets you override the default HAL config by creating a "hal_conf_extra.h" file in the same folder as your sketch, where you can disable everything that you don't need:
Code: Select all
#define HAL_ADC_MODULE_DISABLED
#define HAL_I2C_MODULE_DISABLED
#define HAL_RTC_MODULE_DISABLED
#define HAL_SPI_MODULE_DISABLED
#define HAL_TIM_MODULE_DISABLED
#define HAL_DAC_MODULE_DISABLED
#define HAL_EXTI_MODULE_DISABLED
#define HAL_ETH_MODULE_DISABLED
#define HAL_SD_MODULE_DISABLED
#define HAL_QSPI_MODULE_DISABLED
Also, in the Arduino IDE menu, set the UART to disabled if you aren't using serial; this will save you a bit more flash.
With these settings, flash usage under the Arduino IDE should be about the same as with something like STM32CubeIDE.
Re: "Generic STM32F0 series" does not run with LTO
Used example Blink.ino
Default options:
Sketch uses 9664 bytes (58%) of program storage space. Maximum is 16384 bytes.
Global variables use 876 bytes (21%) of dynamic memory, leaving 3220 bytes for local variables. Maximum is 4096 bytes.
U(S)ART support disabled:
Sketch uses 7092 bytes (43%) of program storage space. Maximum is 16384 bytes.
Global variables use 552 bytes (13%) of dynamic memory, leaving 3544 bytes for local variables. Maximum is 4096 bytes.
U(S)ART support disabled + below HAL modules disabled (Wiki: https://github.com/stm32duino/wiki/wiki/HAL-configuration#list-of-hal__module_disabled-definition):
Code: Select all
#define HAL_ADC_MODULE_DISABLED
#define HAL_I2C_MODULE_DISABLED
#define HAL_RTC_MODULE_DISABLED
#define HAL_SPI_MODULE_DISABLED
#define HAL_TIM_MODULE_DISABLED
#define HAL_DAC_MODULE_DISABLED
#define HAL_EXTI_MODULE_DISABLED
#define HAL_ETH_MODULE_DISABLED
#define HAL_SD_MODULE_DISABLED
#define HAL_QSPI_MODULE_DISABLED
Sketch uses 3764 bytes (22%) of program storage space. Maximum is 16384 bytes.
Global variables use 64 bytes (1%) of dynamic memory, leaving 4032 bytes for local variables. Maximum is 4096 bytes.
Re: "Generic STM32F0 series" does not run with LTO
Tried these #defines (except TIM, because it's used by something) together with USART enabled (because my project uses the serial port). This saved only about 500 bytes for some reason.
UPD: Tried the same on a very early version of the sketch, which does almost nothing and lets me disable the serial port completely. TIM is apparently used by millis(). Still the same difference of about 500 bytes.
I wonder whether this indicates that the code is poorly partitioned, so that pulling in one function from an .a file pulls in the whole object file with lots of potentially unused stuff?
Well, it looks like it's time to give up on Arduino indeed. A pity, I like the little operating system it provides.
Last edited by Sonic on Tue Oct 13, 2020 6:14 pm, edited 1 time in total.
Re: "Generic STM32F0 series" does not run with LTO
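Thank you very much, I know what it is, but isn't it supposed to still produce working binaries? Or does it choke on weak symbols or some other tricks like that?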

Re: "Generic STM32F0 series" does not run with LTO
Sonic wrote: Tue Oct 13, 2020 6:09 pm
Thank you very much, I know what it is, but isn't it supposed to still produce working binaries? Or does it choke on weak symbols or some other tricks like that?
https://interrupt.memfault.com/blog/cod ... -gcc-flags
The linker on the other hand has visibility into our whole program, so this is the stage where dead code could be identified and removed. Unfortunately, linkers do not perform optimizations by default.
To enable dead code optimization on GCC, you need two things: the compiler needs to split each function into its own linker section so the linker knows where each function is, and the linker needs to add an optimization pass to remove sections that are not called by anything.
This is achieved with the -ffunction-sections compile-time flag and the -gc-sections link-time flag. A similar process can take place with dead data and the -fdata-sections flag.
LTO works differently:
https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html
At the highest level, LTO splits the compiler in two. The first half (the “writer”) produces a streaming representation of all the internal data structures needed to optimize and generate code. This includes declarations, types, the callgraph and the GIMPLE representation of function bodies.
When -flto is given during compilation of a source file, the pass manager executes all the passes in all_lto_gen_passes. Currently, this phase is composed of two IPA passes:
- pass_ipa_lto_gimple_out This pass executes the function lto_output in lto-streamer-out.c, which traverses the call graph encoding every reachable declaration, type and function. This generates a memory representation of all the file sections described below.
- pass_ipa_lto_finish_out This pass executes the function produce_asm_for_decls in lto-streamer-out.c, which takes the memory image built in the previous pass and encodes it in the corresponding ELF file sections.
The second half of LTO support is the “reader”. This is implemented as the GCC front end lto1 in lto/lto.c. When collect2 detects a link set of .o/.a files with LTO information and the -flto is enabled, it invokes lto1 which reads the set of files and aggregates them into a single translation unit for optimization. The main entry point for the reader is lto/lto.c:lto_main.
Our member feluga has spent lots of time experimenting with optimizations: viewtopic.php?p=2263#p2263
- Using STM32 Official Core, this same example uses 2.4K RAM and 12.7 K Flash (lowest footprint). Which is very good for adding VGA to your project. Perhaps some techniques would be applicable to your needs.
This may be too basic for you: https://create.arduino.cc/projecthub/jo ... age-26ca05
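To make the section garbage collection from the Memfault quote concrete, here is a minimal sketch of a translation unit with one referenced and one unreferenced function. The file name demo.cpp and both function names are hypothetical, and the commands in the comments assume a GNU toolchain (g++ or arm-none-eabi-g++); whether a given Arduino core already passes these flags is not something this example establishes.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// Build sketch (assumed GNU toolchain, hypothetical file name):
//   g++ -Os -ffunction-sections -fdata-sections -c demo.cpp
//   link with -Wl,--gc-sections; adding -Wl,--print-gc-sections
//   lists the sections the linker dropped.

// Intended to be called from the program: with -ffunction-sections it
// lands in its own .text.<mangled name> section, which the linker keeps.
uint8_t checksum8(const uint8_t* data, size_t len) {
    assert(data != nullptr || len == 0);
    uint8_t sum = 0;
    for (size_t i = 0; i < len; ++i) {
        sum = static_cast<uint8_t>(sum + data[i]);  // wraps modulo 256
    }
    return sum;
}

// Intended to stay unreferenced: also gets its own section, which
// --gc-sections can discard because nothing points at it.
uint32_t unusedSumOfSquares(uint32_t n) {
    uint32_t total = 0;
    for (uint32_t i = 0; i < n; ++i) {
        total += (i + 1) * (i + 1);
    }
    return total;
}
```

Without -ffunction-sections both functions would normally share one .text section of the object file, so referencing either one keeps both, which is the "whole object gets pulled in" effect asked about earlier in the thread.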
Last edited by mrburnette on Tue Oct 13, 2020 7:43 pm, edited 1 time in total.
Re: "Generic STM32F0 series" does not run with LTO
I do not advise using LTO. I have seen too many issues with it, and several issues have been opened against GCC about it.
Users can try it, and if it works then fine, but you are on your own with this.
Re: "Generic STM32F0 series" does not run with LTO
Sonic wrote: Tue Oct 13, 2020 6:04 pm
Tried these #defines (except TIM, because it's used by something) together with USART enabled (because my project uses the serial port). This saved only about 500 bytes for some reason.
UPD: Tried the same on a very early version of the sketch, which does almost nothing and lets me disable the serial port completely. TIM is apparently used by millis(). Still the same difference of about 500 bytes.
I wonder whether this indicates that the code is poorly partitioned, so that pulling in one function from an .a file pulls in the whole object file with lots of potentially unused stuff?
Well, it looks like it's time to give up on Arduino indeed. A pity, I like the little operating system it provides.
When I disabled the timer module (HAL_TIM_MODULE_DISABLED), millis() still worked but PWM didn't. I got no compiler errors; the PWM channels just wouldn't work. Also, when you create or edit the hal_conf_extra.h file, I noticed that you sometimes need to restart the Arduino IDE for the changes to module enabling/disabling to take effect.
Re: "Generic STM32F0 series" does not run with LTO
PWM uses a timer. If you use analogWrite(): if the pin has DAC capability, the DAC is used; else if the pin has a timer, the timer is used; otherwise a simple GPIO toggle is used.
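The fallback order described above can be modelled as a small decision function. This is an illustrative host-side model only, not the core's actual code: `AnalogWriteBackend`, `PinCaps`, `EnabledModules` and `pickBackend` are hypothetical names, and treating a disabled HAL module as "branch unavailable" is an assumption that merely matches the earlier observation that PWM silently stops when HAL_TIM_MODULE_DISABLED is defined.

```cpp
// Illustrative model; every name here is hypothetical, not the core's API.
enum class AnalogWriteBackend { Dac, TimerPwm, GpioToggle };

struct PinCaps {
    bool hasDac;    // pin is routed to a DAC channel
    bool hasTimer;  // pin is routed to a timer channel
};

struct EnabledModules {
    bool dac;  // assumed false when HAL_DAC_MODULE_DISABLED is defined
    bool tim;  // assumed false when HAL_TIM_MODULE_DISABLED is defined
};

// DAC first, then timer PWM, then a plain GPIO level as the last resort.
constexpr AnalogWriteBackend pickBackend(PinCaps pin, EnabledModules hal) {
    return (pin.hasDac && hal.dac)     ? AnalogWriteBackend::Dac
           : (pin.hasTimer && hal.tim) ? AnalogWriteBackend::TimerPwm
                                       : AnalogWriteBackend::GpioToggle;
}

// Compile-time checks of the model; they add nothing to the binary.
static_assert(pickBackend({true, true}, {true, true}) == AnalogWriteBackend::Dac,
              "DAC is preferred when available");
static_assert(pickBackend({false, true}, {true, true}) == AnalogWriteBackend::TimerPwm,
              "timer PWM is the second choice");
static_assert(pickBackend({false, true}, {true, false}) == AnalogWriteBackend::GpioToggle,
              "without the TIM module the pin falls back to GPIO");
```

Under this model, a timer-capable pin with the TIM module disabled ends up on the GPIO branch, which would explain a build that compiles cleanly but produces no PWM.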
The need to restart the Arduino IDE is due to Arduino's build cache, which does not rebuild everything.