LTO isn't working?

Post here all questions related to STM32 core if you can't find a relevant section!
ikeji
Posts: 5
Joined: Fri Dec 27, 2019 6:50 pm

LTO isn't working?

Post by ikeji »

I'm trying to measure how fast the STM32 is compared to an AVR-based Arduino.
But my code doesn't work when compiled with LTO enabled.
Is there any well-known problem with LTO?

First, I wrote this code for Arduino.

Code: Select all

void setup() {
  pinMode(13, OUTPUT);
}
void loop() {
  while(1) {
    digitalWrite(13, HIGH);
    digitalWrite(13, LOW);
    digitalWrite(13, HIGH);
    digitalWrite(13, LOW);
  }
}
This outputs a 100 kHz square wave (measured with an oscilloscope).

Next, I wrote this code for Arduino too.

Code: Select all

void setup() {
  pinMode(13, OUTPUT);
}
void loop() {
  while(1) {
    PORTC |= B10000000;
    PORTC &= B01111111;
    PORTC |= B10000000;
    PORTC &= B01111111;
  }
}
This code outputs a 4 MHz square wave.

For STM32, I wrote this code.

Code: Select all

void setup() {
  pinMode(PB8, OUTPUT);
}
void loop() {
  while(1) {
    digitalWrite(PB8,HIGH);
    digitalWrite(PB8,LOW);
    digitalWrite(PB8,HIGH);
    digitalWrite(PB8,LOW);
  }
}
My bluepill board (STM32F103) outputs a 500 kHz square wave if I select O3 (without LTO).
I expected this code to run even faster with O3 plus LTO.
But it doesn't output any signal at all.

I tried to debug with an ST-Link and GDB.
The chip seems to be stuck in an infinite loop in the ADC1_2_IRQHandler function.
I'm not sure why this interrupt is enabled.

I use Arduino IDE 1.8.10 with the STM32duino core 1.8.0 on a Debian laptop.
I use the bootloader from here:
https://github.com/rogerclarkmelbourne/ ... bootloader

Thanks,
ag123
Posts: 602
Joined: Thu Dec 19, 2019 5:30 am
Answers: 2

Re: LTO isn't working?

Post by ag123 »

does LTO refer to this?
https://gcc.gnu.org/onlinedocs/gccint/LTO.html

i think the more commonly used flag is -Os (optimize for size); occasionally one can try -O2 etc., but i'd guess the binary would be larger.
videos like these can be found on youtube:
https://www.youtube.com/watch?v=5mDnKBNl9sY
https://www.youtube.com/watch?v=_SmLlGAyRRc

on some other stm32 mcus, e.g. the stm32f407, there were attempts to use the FPU, and it was noticed that a hardfault occurs if the fpu algorithms are placed in the same sketch; placing them in a separate .cpp file resolves the hardfault. with features like LTO, i'm unsure whether it may accidentally introduce code that causes hardfaults.
found this article in a search
https://stackoverflow.com/questions/539 ... nctions-ho
ikeji
Posts: 5
Joined: Fri Dec 27, 2019 6:50 pm

Re: LTO isn't working?

Post by ikeji »

Thanks for the reply.
ag123 wrote:
Sat Dec 28, 2019 1:37 am
does LTO refer to this?
https://gcc.gnu.org/onlinedocs/gccint/LTO.html
Yes, that LTO. Sorry for the confusion.
[Attachment: LTO.png]
I'm using the Arduino IDE and I select it from the Tools -> Optimize menu.
The menu has -Os, -O1, -O2, -O3, -g and a "with LTO" version of each.

My code works with -Os, but "Smallest (-Os) with LTO" doesn't work.
And -O3 (without LTO) is still slower than the AVR register code.
ag123 wrote:
Sat Dec 28, 2019 1:37 am
on some other stm32 mcus, e.g. the stm32f407, there were attempts to use the FPU, and it was noticed that a hardfault occurs if the fpu algorithms are placed in the same sketch; placing them in a separate .cpp file resolves the hardfault. with features like LTO, i'm unsure whether it may accidentally introduce code that causes hardfaults.
I don't think my code uses floating point.
ag123 wrote:
Sat Dec 28, 2019 1:37 am
found this article in a search
https://stackoverflow.com/questions/539 ... nctions-ho
The Stack Overflow post looks related.
But I can't find any other definition of ADC1_2_IRQHandler in the STM32duino core code.
Does the core have one that gets removed by LTO?
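A minimal sketch of the weak-alias pattern that CMSIS-style startup files commonly use, assuming (this is not confirmed anywhere in this thread) that the STM32duino startup code follows it: every vector without a strong C definition is a weak alias of one catch-all `Default_Handler` loop. Because all those aliases share one address, GDB may print any of the alias names, such as ADC1_2_IRQHandler, when the CPU is really spinning in the catch-all loop after some other interrupt. This is a host-side illustration for GCC or Clang on ELF targets (e.g. Linux); `is_default_handler` is a hypothetical helper, and the handler returns instead of looping forever so the sketch can run.

```cpp
#include <cassert>

extern "C" {
// Stand-in for the startup file's catch-all handler. On the MCU this is an
// endless loop; here it simply returns so the sketch can run on a host.
void Default_Handler() {}

// Weak aliases: each vector that has no strong definition elsewhere resolves
// to the very same address as Default_Handler (GCC/Clang, ELF targets only).
void ADC1_2_IRQHandler() __attribute__((weak, alias("Default_Handler")));
void WWDG_IRQHandler() __attribute__((weak, alias("Default_Handler")));
}

// Hypothetical helper: true when a vector entry is only an alias of the
// catch-all handler. A debugger stopped at that address may report any of
// the alias names, e.g. ADC1_2_IRQHandler.
bool is_default_handler(void (*handler)()) {
  return handler == &Default_Handler;
}
```

If LTO ends up discarding a strong handler that should override one of these weak aliases, that vector falls back to the catch-all loop, which would match the symptom above; this is one plausible explanation, not a confirmed diagnosis.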

Thanks,
ag123
Posts: 602
Joined: Thu Dec 19, 2019 5:30 am
Answers: 2

Re: LTO isn't working?

Post by ag123 »

using the Arduino API

Code: Select all

digitalWrite();
is likely slower than accessing registers directly, partly because it is made up of function calls, and that code runs from flash; with all the stack operations etc. it takes cycles and wait states to complete.

something that may be tried is:

Code: Select all

uint8_t pin=8;
GPIOB->ODR |= 1ul << pin;
not tested; you would need to check that it is correctly specified.
the rm0008 (stm32f103) reference manual is a document to keep handy.
https://www.st.com/content/ccc/resource ... 171190.pdf
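The `GPIOB->ODR |= 1ul << pin;` line above is a read-modify-write: the CPU reads ODR, ORs in the bit and writes the whole register back. The STM32F1 also has a BSRR register (see RM0008) where a single store sets pins via the low 16 bits and resets pins via the high 16 bits without touching other pins. The host-side model below is a sketch of that difference; `PortModel`, `set_with_odr_rmw`, `set_with_bsrr` and `isr_sets_pin3` are hypothetical names, and the "interrupt" is simulated by a plain function call between the read and the write.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of one GPIO port's output data register.
struct PortModel {
  uint32_t odr = 0;

  // What the hardware does on a store to BSRR: bits 0-15 set pins,
  // bits 16-31 reset pins, and set wins if both are requested (RM0008).
  void store_bsrr(uint32_t value) {
    odr = (odr & ~(value >> 16)) | (value & 0xFFFFu);
  }
};

// Hypothetical interrupt handler that drives pin 3 high.
void isr_sets_pin3(PortModel& p) { p.store_bsrr(1u << 3); }

// `ODR |= mask` split into its read and its write, so that an interrupt
// can be simulated in between.
void set_with_odr_rmw(PortModel& p, uint32_t mask, void (*isr)(PortModel&)) {
  uint32_t snapshot = p.odr;  // read ODR
  isr(p);                     // interrupt fires here
  p.odr = snapshot | mask;    // write back the stale snapshot plus our bit
}

// A single BSRR store has no read step, so there is no window in which the
// interrupt's update can be overwritten.
void set_with_bsrr(PortModel& p, uint32_t mask, void (*isr)(PortModel&)) {
  isr(p);                     // interrupt fires before the store
  p.store_bsrr(mask);
}
```

In the read-modify-write version the interrupt's change to pin 3 is overwritten by the stale snapshot, while the BSRR version keeps it. On the chip itself the equivalent one-liners would be `GPIOB->BSRR = 1ul << pin;` to set and `GPIOB->BRR = 1ul << pin;` to clear (not tested here).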

doing that would be faster than calling digitalWrite(), but if you need higher speeds, stm32 has quite a lot of hardware peripherals, such as timers, that you could explore and try to use. my guess is that if the waveform is simple, e.g. a plain pwm square wave, a timer can generate it and you can probably go up to mhz frequencies.
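To put a number on "up to mhz": for an STM32 general-purpose timer counting up in edge-aligned PWM mode, the output frequency is the timer clock divided by (PSC + 1) × (ARR + 1), where PSC is the prescaler and ARR the auto-reload value. A small sketch of that arithmetic, assuming a 72 MHz timer clock (typical for a bluepill running at full speed, but check your own clock configuration); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Edge-aligned, up-counting timer: the counter runs 0..ARR and each count
// takes (PSC + 1) timer-clock ticks, so one PWM period is
// (PSC + 1) * (ARR + 1) ticks. 64-bit math avoids overflow for 16-bit PSC/ARR.
constexpr uint64_t pwm_frequency_hz(uint64_t timer_clock_hz, uint64_t psc,
                                    uint64_t arr) {
  return timer_clock_hz / ((psc + 1) * (arr + 1));
}

// Inverse for exact divisors: the ARR value that yields `target_hz` with the
// given prescaler (integer division truncates when it does not divide evenly).
constexpr uint64_t arr_for_frequency(uint64_t timer_clock_hz, uint64_t psc,
                                     uint64_t target_hz) {
  return timer_clock_hz / ((psc + 1) * target_hz) - 1;
}

// Assumed 72 MHz timer clock.
constexpr uint64_t kTimerClockHz = 72000000;
static_assert(pwm_frequency_hz(kTimerClockHz, 0, 71) == 1000000, "1 MHz");
static_assert(pwm_frequency_hz(kTimerClockHz, 0, 1) == 36000000, "36 MHz");
```

With no prescaling, ARR = 71 gives 1 MHz and ARR = 1 gives 36 MHz, so under this assumption a hardware timer can produce a square wave far faster than any software loop in this thread.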

as for that adc interrupt, i think the NVIC interrupt vector table normally needs to be set up at boot/reset, so it is quite possible that using LTO caused the problem, e.g. building with LTO removed some table entries and that caused hard faults.
http://infocenter.arm.com/help/topic/co ... 01s01.html
ikeji
Posts: 5
Joined: Fri Dec 27, 2019 6:50 pm

Re: LTO isn't working?

Post by ikeji »

Thanks for the reply.
It seems LTO is too advanced for me. I'll disable LTO for my project.
ag123 wrote:
Sat Dec 28, 2019 8:57 am
using the Arduino API

Code: Select all

digitalWrite();
is likely slower than accessing registers directly, partly because it is made up of function calls, and that code runs from flash; with all the stack operations etc. it takes cycles and wait states to complete.

something that may be tried is:

Code: Select all

uint8_t pin=8;
GPIOB->ODR |= 1ul << pin;
not tested; you would need to check that it is correctly specified.
the rm0008 (stm32f103) reference manual is a document to keep handy.
https://www.st.com/content/ccc/resource ... 171190.pdf
I thought LTO would do this for me.

Code: Select all

 8006fb4:       611a            str     r2, [r3, #16]
 8006fb6:       615a            str     r2, [r3, #20]
 8006fb8:       611a            str     r2, [r3, #16]
 8006fba:       615a            str     r2, [r3, #20]
 8006fbc:       e7fa            b.n     8006fb4 <main+0x68>
O3+LTO actually generates code like this (but it doesn't work because the interrupt vector table is broken).
It should be the same as accessing the registers directly.
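For reference, the two offsets in that disassembly line up with the STM32F1 GPIO register map in RM0008: `str r2, [r3, #16]` stores into BSRR (offset 0x10, sets the pin) and `str r2, [r3, #20]` stores into BRR (offset 0x14, clears it), assuming r3 holds the GPIOB base address and r2 the pin mask. The sketch below checks this with a host-side struct mirroring that layout; `GpioLayout` and `gpio_register_at` are hypothetical names, not the CMSIS `GPIO_TypeDef`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side mirror of the STM32F1 GPIO register block (RM0008).
struct GpioLayout {
  uint32_t CRL;   // 0x00 configuration, pins 0-7
  uint32_t CRH;   // 0x04 configuration, pins 8-15
  uint32_t IDR;   // 0x08 input data
  uint32_t ODR;   // 0x0C output data
  uint32_t BSRR;  // 0x10 bit set/reset
  uint32_t BRR;   // 0x14 bit reset
  uint32_t LCKR;  // 0x18 configuration lock
};

static_assert(offsetof(GpioLayout, BSRR) == 16, "str r2, [r3, #16] -> BSRR");
static_assert(offsetof(GpioLayout, BRR) == 20, "str r2, [r3, #20] -> BRR");

// Maps the immediate offset of a `str rX, [base, #imm]` to a register name.
const char* gpio_register_at(std::size_t offset) {
  switch (offset) {
    case offsetof(GpioLayout, CRL):  return "CRL";
    case offsetof(GpioLayout, CRH):  return "CRH";
    case offsetof(GpioLayout, IDR):  return "IDR";
    case offsetof(GpioLayout, ODR):  return "ODR";
    case offsetof(GpioLayout, BSRR): return "BSRR";
    case offsetof(GpioLayout, BRR):  return "BRR";
    case offsetof(GpioLayout, LCKR): return "LCKR";
    default:                         return "?";
  }
}
```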
ag123 wrote:
Sat Dec 28, 2019 8:57 am
doing that would be faster than calling digitalWrite(), but if you need higher speeds, stm32 has quite a lot of hardware peripherals, such as timers, that you could explore and try to use. my guess is that if the waveform is simple, e.g. a plain pwm square wave, a timer can generate it and you can probably go up to mhz frequencies.
Yes, the current code is just a benchmark.
ag123 wrote:
Sat Dec 28, 2019 8:57 am
as for that adc interrupt, i think the NVIC interrupt vector table normally needs to be set up at boot/reset, so it is quite possible that using LTO caused the problem, e.g. building with LTO removed some table entries and that caused hard faults.
http://infocenter.arm.com/help/topic/co ... 01s01.html
Yeah, I think enabling LTO is too hard for me...
fpiSTM
Posts: 913
Joined: Wed Dec 11, 2019 7:11 pm
Answers: 36
Location: Le Mans

Re: LTO isn't working?

Post by fpiSTM »

LTO is mainly for size.
The LTO option is available, but I do not advise using it, as I've run into issues several times, mainly linked to the gcc toolchain. :roll:

You can optimize your code in several ways: direct register access, LL GPIO, ...

Try this:

Code: Select all

void setup() {
  pinMode(PB8, OUTPUT);
}
void loop() {
  while(1) {
    digitalWriteFast(PB_8, HIGH);
    digitalWriteFast(PB_8, LOW);
    digitalWriteFast(PB_8, HIGH);
    digitalWriteFast(PB_8, LOW);
  }
}
or

Code: Select all

void setup() {
  pinMode(PB8, OUTPUT);
}
void loop() {
  while(1) {
    digitalToggleFast(PB_8);
  }
}
Bakisha
Posts: 72
Joined: Fri Dec 20, 2019 6:50 pm
Answers: 4

Re: LTO isn't working?

Post by Bakisha »

In my experience (I have some projects that include both interrupts and timers), LTO was working with core version 1.7.0. You could try downgrading from the Boards Manager and testing it. And, for me, O3 with LTO produces much faster code than just O3 (measured with a logic analyzer).
ikeji
Posts: 5
Joined: Fri Dec 27, 2019 6:50 pm

Re: LTO isn't working?

Post by ikeji »

Thanks all.
I tested some more.

First I tested digitalWriteFast and digitalToggleFast.
In this case, digitalWriteFast is faster than digitalToggleFast.
digitalWriteFast changes the pin state in 26 ns.
I think this is the same speed as direct register access.

I also tested core 1.7.0, but I couldn't use the O3+LTO option because it generates a binary larger than my chip's flash.
Os+LTO is 2x faster than plain Os, but digitalWriteFast is faster still.

Here are all the results:

Code: Select all

Chip     Method              Time per pin change
Arduino  digitalWrite        5000 ns
Arduino  Register            125 ns
STM32    digitalWrite        900 ns
STM32    digitalWriteFast    26 ns
STM32    digitalToggleFast   152 ns
STM32    Os+LTO              360 ns
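The Arduino rows can be cross-checked against the square-wave frequencies reported at the start of the thread: each write produces one edge, so one write lasts half a period, t = 1 / (2f). A small helper for that conversion, assuming the loop's branch overhead is negligible (in reality it makes one half-period per loop iteration slightly longer); the function names are hypothetical.

```cpp
#include <cassert>

// One pin write = one edge = half a square-wave period, so
// t_write = 1 / (2 f). Loop-branch overhead is ignored.
double ns_per_write(double freq_hz) {
  return 1e9 / (2.0 * freq_hz);
}

// Inverse: square-wave frequency produced by writes taking `ns_per_edge` each.
double square_wave_hz(double ns_per_edge) {
  return 1e9 / (2.0 * ns_per_edge);
}
```

100 kHz gives 5000 ns and 4 MHz gives 125 ns, matching the two Arduino rows; conversely, 26 ns per write corresponds to a square wave of roughly 19 MHz.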
Here are the source code, disassembled code and oscilloscope screenshots:
https://blog.ikejima.org/make/8088/2020 ... -pc-8.html

Thanks
fpiSTM
Posts: 913
Joined: Wed Dec 11, 2019 7:11 pm
Answers: 36
Location: Le Mans

Re: LTO isn't working?

Post by fpiSTM »

digitalToggleFast is slower because it reads the register before inverting its value.
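A host-side model of the difference described here, assuming digitalToggleFast reads the output data register (ODR) and writes back the inverted bit, while digitalWriteFast does a single store to BSRR; the core's exact internals may differ. `CountingPort`, `toggle_pin` and `write_pin` are hypothetical names, and the counters only show the extra read, not its real cost in nanoseconds, which depends on the bus and clock settings.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a GPIO port that counts register accesses.
struct CountingPort {
  uint32_t odr = 0;
  int reads = 0;
  int writes = 0;

  uint32_t read_odr() { ++reads; return odr; }
  void write_odr(uint32_t v) { ++writes; odr = v; }

  // BSRR store: low half sets pins, high half resets them (set wins).
  void write_bsrr(uint32_t v) {
    ++writes;
    odr = (odr & ~(v >> 16)) | (v & 0xFFFFu);
  }
};

// Toggle modelled as read-modify-write of ODR: one read plus one write.
void toggle_pin(CountingPort& p, unsigned pin) {
  p.write_odr(p.read_odr() ^ (1u << pin));
}

// Set/clear modelled as one BSRR store: no read at all.
void write_pin(CountingPort& p, unsigned pin, bool high) {
  p.write_bsrr(high ? (1u << pin) : (1u << (pin + 16)));
}
```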