Re: Bluepill F4 board, anyone still working on it?

Posted: Tue Dec 24, 2019 2:48 pm
by Pito
on stm32f407, pito achieved 500 mflops on an mcu overclocked to 250mhz, that is almost 2 flop per clock
Unbelievable :D , is the post available on the old forum somewhere?

Edit: I've found my old thread HERE, 340mips (whetstone) at 168MHz, but it looks like we had some problems with it at that time.. :?

Does the new STM's core support FPU?

Posted: Tue Dec 24, 2019 3:32 pm
by fpiSTM
Pito wrote: Tue Dec 24, 2019 2:48 pm
Does the new STM's core support FPU?
Yes

Posted: Tue Dec 24, 2019 5:00 pm
by Pito
FYI - I've built the single precision whetstone without and with the FPU (F407ZET, 168MHz), Roger's core. The first result is without the FPU, the second with it:

Loops: 1000 Iterations: 10 Duration: 42000 millisec.   0 clocks
C Converted Single Precision Whetstones: 23.81 MIPS

Loops: 1000 Iterations: 10 Duration: 10103 millisec.   0 clocks
C Converted Single Precision Whetstones: 98.98 MIPS
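
The two scores above are consistent with the formula used by the classic C Whetstone source, where 100 * Loops * Iterations divided by the run time in seconds gives KIPS. A minimal sketch of that arithmetic follows; the function names `whetstone_mips` and `speedup` are made up for illustration and are not taken from the benchmark itself.

```c
#include <assert.h>

/* Score in MIPS, assuming the classic formula:
 * KIPS = 100 * loops * iterations / seconds, MIPS = KIPS / 1000. */
double whetstone_mips(long loops, long iterations, double duration_ms)
{
    assert(duration_ms > 0.0);
    double seconds = duration_ms / 1000.0;
    double kips = 100.0 * (double)loops * (double)iterations / seconds;
    return kips / 1000.0;
}

/* Ratio of two scores, e.g. FPU build vs. soft-float build. */
double speedup(double fast_mips, double slow_mips)
{
    assert(slow_mips > 0.0);
    return fast_mips / slow_mips;
}
```

With the posted durations, `whetstone_mips(1000, 10, 42000)` and `whetstone_mips(1000, 10, 10103)` reproduce the 23.81 and 98.98 figures to two decimals, and `speedup(98.98, 23.81)` comes out at a little over 4x.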

Posted: Wed Dec 25, 2019 4:27 am
by ag123
Pito wrote: Tue Dec 24, 2019 2:48 pm
on stm32f407, pito achieved 500 mflops on an mcu overclocked to 250mhz, that is almost 2 flop per clock
Unbelievable :D , is the post available on the old forum somewhere?
here it is, 466.95 Mflops just a whisker off 500 Mflops but close enough
https://web.archive.org/web/20190316173 ... 160#p26942

:D

Posted: Wed Dec 25, 2019 9:40 am
by Pito
I think an 8-10x speedup on math functions is the most you can get out of those stm32 single precision FPUs. The 4x I got in the whetstone yesterday is pretty realistic, imho. A few weeks back I did a test with the pic32MZEF (double precision FPU, different benchmark) and I got 13x.
So 466 Mflops at 240MHz indicates an issue somewhere.
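
One way to see why the figure looks suspicious is to normalise each score by the clock. The sketch below is only that arithmetic; `ops_per_clock` is a hypothetical helper, and the two scores may come from different benchmarks, so the comparison is rough at best.

```c
#include <assert.h>

/* Millions of operations per second divided by the clock in MHz
 * gives operations per clock cycle. */
double ops_per_clock(double mops, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return mops / clock_mhz;
}
```

`ops_per_clock(466.95, 240.0)` is a bit under 2 per clock, while the 98.98 whetstone score at 168MHz works out to roughly 0.6 per clock.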

Posted: Wed Dec 25, 2019 11:42 am
by ag123
i'd think the compiler could have 'optimised' away code and taken shortcuts rather than doing the math. nevertheless, the f407 is a dual fpu core + that ART accelerator, so if one measures only the SP floating point speed, it is still plausible that 500 mflops measured in this 'simple' way is true. but i'd guess that in real apps it'd be hard to get anywhere close, as there would be overheads in other code :lol:
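
A common guard against a benchmark loop being optimised away is to read the input from, and write the result to, a volatile object. The sketch below illustrates that idea with a made-up kernel; `fp_kernel` and `bench_sink` are hypothetical names, and this is not the code used in either benchmark discussed here.

```c
#include <assert.h>
#include <stddef.h>

/* Stores to a volatile object count as observable behaviour, so the
 * compiler has to perform the store and compute the value stored. */
static volatile float bench_sink;

/* Hypothetical kernel: one multiply and one add per iteration. */
float fp_kernel(size_t n, float seed)
{
    assert(n > 0);
    volatile float input = seed;  /* volatile read hides the value from constant folding */
    float x = input;
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc = acc * 0.5f + x;
    }
    bench_sink = acc;             /* keeps the result live */
    return acc;
}
```

Timing a call to `fp_kernel` with a large `n` then measures work that was actually executed, although the optimiser is still free to schedule or fuse the individual operations.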

Posted: Wed Dec 25, 2019 1:57 pm
by Pito
With default optimization (Roger's core, 168MHz, FPU on)

Loops: 1000 Iterations: 10 Duration: 11689 millisec.   0 clocks
C Converted Single Precision Whetstones: 85.55 mflops
0       0       0       1.00    -1.00   -1.00   -1.00   0
12000   14000   12000   -0.13   -0.18   -0.43   -0.48   12000
14000   12000   12000   0.02    -0.03   -0.04   -0.09   14000
345000  1       1       1.00    -1.00   -1.00   -1.00   345000
210000  1       2       6.00    6.00    -0.04   -0.09   210000
32000   1       2       0.09    0.09    0.09    0.09    32000
899000  1       2       1.00    1.00    1.00    1.00    899000
616000  1       2       3.00    2.00    3.00    -0.09   616000
0       2       3       1.00    -1.00   -1.00   -1.00   0
93000   2       3       1.00    1.00    1.00    1.00    93000

With -O1 LTO (Roger's core, 168MHz, FPU on)

Loops: 1000 Iterations: 10 Duration: 5290 millisec.   0 clocks
C Converted Single Precision Whetstones: 189.04 mflops
0       0       0       1.00    -1.00   -1.00   -1.00   0
12000   14000   12000   -0.13   -0.18   -0.43   -0.48   12000
14000   12000   12000   0.02    -0.03   -0.04   -0.09   14000
345000  1       1       1.00    -1.00   -1.00   -1.00   345000
210000  1       2       6.00    6.00    -0.04   -0.09   210000
32000   1       2       0.09    0.09    0.09    0.09    32000
899000  1       2       1.00    1.00    1.00    1.00    899000
616000  1       2       3.00    2.00    3.00    -0.09   616000
0       2       3       1.00    -1.00   -1.00   -1.00   0
93000   2       3       1.00    1.00    1.00    1.00    93000

With -O3 (Roger's core, 168MHz, FPU on)

Loops: 1000 Iterations: 10 Duration: 5055 millisec.   0 clocks
C Converted Single Precision Whetstones: 197.82 mflops
0       0       0       1.00    -1.00   -1.00   -1.00   0
12000   14000   12000   -0.13   -0.18   -0.43   -0.48   12000
14000   12000   12000   0.02    -0.03   -0.04   -0.09   14000
345000  1       1       1.00    -1.00   -1.00   -1.00   345000
210000  1       2       6.00    6.00    -0.04   -0.09   210000
32000   1       2       0.09    0.09    0.09    0.09    32000
899000  1       2       1.00    1.00    1.00    1.00    899000
616000  1       2       3.00    2.00    3.00    -0.09   616000
0       2       3       1.00    -1.00   -1.00   -1.00   0
93000   2       3       1.00    1.00    1.00    1.00    93000

With -O2 LTO (Roger's core, 168MHz, FPU on)

Loops: 1000 Iterations: 10 Duration: 4924 millisec.   0 clocks
C Converted Single Precision Whetstones: 203.09 mflops
0       0       0       1.00    -1.00   -1.00   -1.00   0
12000   14000   12000   -0.13   -0.18   -0.43   -0.48   12000
14000   12000   12000   0.02    -0.03   -0.04   -0.09   14000
345000  1       1       1.00    -1.00   -1.00   -1.00   345000
210000  1       2       6.00    6.00    -0.04   -0.09   210000
32000   1       2       0.09    0.09    0.09    0.09    32000
899000  1       2       1.00    1.00    1.00    1.00    899000
616000  1       2       3.00    2.00    3.00    -0.09   616000
0       2       3       1.00    -1.00   -1.00   -1.00   0
93000   2       3       1.00    1.00    1.00    1.00    93000

Posted: Thu Dec 26, 2019 5:50 am
by ag123
it seems like gcc 'fixed' the wild optimization problems. i'm still waiting for my f401 to arrive & i'll probably try running it on that. :lol:
btw 203 Mflops is still pretty decent considering this is an mcu ! :D

Posted: Thu Dec 26, 2019 4:35 pm
by Pito
Well, double precision would be more interesting to have handy.
Single precision is rather limited in use, imho.

Posted: Fri Dec 27, 2019 6:32 am
by ag123
actually my guess is that part of the reason for those fp 'speeds' is that it is 32-bit fp, which is probably significantly simpler to implement than a 64-bit (or 80-bit) fpu.
a lot of (earlier and even current) nvidia and amd gpus basically accelerate 32-bit fp; 64-bit is almost always much slower (i've seen ratios like 1:60, i.e. double precision fp64 being 60 times slower than fp32).
my guess is fp32 primarily targets digital filters (e.g. dsp) and 'basic' fp calcs. my guess is that certain things like inverse kinematics may be adequate to do in fp32. fp32 is more of a problem in cases of iterative search (errors accumulate in every iteration) or where values degenerate (underflow) and then suddenly you have 1 / 0 -> infinity
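
Both failure modes mentioned above are easy to reproduce on a PC. The sketch below assumes IEEE 754 arithmetic with `float` as binary32 and `double` as binary64; all function names are made up for illustration.

```c
#include <assert.h>

/* Add `step` to an accumulator n times in single precision. */
float sum_f32(long n, float step)
{
    assert(n >= 0);
    float acc = 0.0f;
    for (long i = 0; i < n; i++) {
        acc += step;               /* rounding error is added on every pass */
    }
    return acc;
}

/* Same loop in double precision, for comparison. */
double sum_f64(long n, double step)
{
    assert(n >= 0);
    double acc = 0.0;
    for (long i = 0; i < n; i++) {
        acc += step;
    }
    return acc;
}

/* Squares a small value, then takes the reciprocal. In fp32 the square
 * of 1e-30 underflows to zero, so the division yields infinity. */
float inverse_of_square_f32(float small)
{
    volatile float sq = small * small;  /* volatile forces a real fp32 store */
    return 1.0f / sq;
}

/* In fp64 the same square (1e-60) is still representable. */
double inverse_of_square_f64(double small)
{
    volatile double sq = small * small;
    return 1.0 / sq;
}
```

Summing 0.1 a million times with `sum_f32` drifts visibly away from 100000, while `sum_f64` stays very close to it. Likewise, `inverse_of_square_f32(1e-30f)` returns infinity, where the fp64 version returns a large but finite value.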