
dhrystone and whetstone benchmarks

Posted: Fri Dec 20, 2019 3:43 pm
by ag123
this was an old favorite topic in the old forum
uploaded again, season's greetings :D

Re: dhrystone and whetstone benchmarks

Posted: Fri Dec 20, 2019 3:55 pm
by fpiSTM
Thanks @ag123

For STM32 Core, they are available in the STM32Examples library:

Re: dhrystone and whetstone benchmarks

Posted: Fri Dec 20, 2019 4:04 pm
by ag123
+1 Thanks ! :D

Re: dhrystone and whetstone benchmarks

Posted: Thu Jan 23, 2020 11:36 am
by zoomx
+2!

Re: dhrystone and whetstone benchmarks

Posted: Fri Jan 24, 2020 8:42 pm
by Pito

Re: dhrystone and whetstone benchmarks

Posted: Sat Jan 25, 2020 2:29 pm
by Pito
Here is the "original" Single Precision Whetstone benchmark, modified for STM32DUINO.
It builds here with the STM core for the F407.
It is my understanding that the FPU is enabled in the STM core by default.
Not tested on real hw yet.
Try it with the 401/411 and do report bugs, plz.
It should be built with PRINTF enabled.

Re: dhrystone and whetstone benchmarks

Posted: Sat Jan 25, 2020 2:57 pm
by fpiSTM
@Pito
I saw several posts around the whetstone.
Did you try the one in the STM32duino Examples library?

Re: dhrystone and whetstone benchmarks

Posted: Sat Jan 25, 2020 3:10 pm
by ag123
well, i think those sp flops figures, around 60 Mflops to 150 Mflops for the F401 at 84 MHz and the F411 at 96 MHz, are after all real.
for one thing, the fpu seems rather close to the arm 11 vfp fpu, less the 'vector' floating point
http://infocenter.arm.com/help/topic/co ... DEJJH.html
nevertheless, fp instructions probably execute at 1 flop per cycle, and there are possibly several alus, e.g. separate ones for multiply, divide and add, plus some kind of 'speculative' (out of order) execution. this is the only way i can explain performance above 1 flop per clock cycle on the stm32f4x cpus
the optimization using vfp libraries may also have used things like fma (fused multiply-add) instructions for the whetstone benchmarks. an fma does both a multiply and an add in a single instruction, which would make matrix-vector calcs run as if they do 2 flops per clock
so after all we do have pretty fast single precision floating point on our little f4 chips
these are probably more advanced than the p4 technology of that time. after all, desktop intel chips these days run more than a single flop per clock, and intel does that at a much more extreme 64 bits (in fact 80 bits) :)
https://en.wikipedia.org/wiki/Extended_precision

Re: dhrystone and whetstone benchmarks

Posted: Sat Jan 25, 2020 3:13 pm
by Pito
fpiSTM wrote: Sat Jan 25, 2020 2:57 pm
@Pito
I saw several posts around the whetstone.
Did you try the one in the STM32duino Examples library?
All the Whetstone benchmarks that people and I messed with were done with the one from your Examples.
That one gave suspicious results.
I would suggest you remove it once it is confirmed that my results are more "real" :)

Try with the one I uploaded above. It is version 0.1.

Re: dhrystone and whetstone benchmarks

Posted: Sat Jan 25, 2020 3:20 pm
by Pito
@ag123: be so kind and try to run the one above on your F401.
We will get at least some results we may compare to the real world.

You should get the following table (with F401 numbers):
[Attachment: Whetstone result.PNG (32.91 KiB)]