dhrystone and whetstone benchmarks

ag123 · Post by **ag123** » Fri Dec 20, 2019 3:43 pm

This was a favorite topic on the old forum, so I have uploaded it again.
Season's greetings!

fpiSTM · Post by **fpiSTM** » Fri Dec 20, 2019 3:55 pm

Thanks @ag123

For STM32 Core, they are available in the STM32Examples library:

Dhrystone

Whetstone
- DoublePrecision
- SinglePrecision

ag123 · Post by **ag123** » Fri Dec 20, 2019 4:04 pm

+1 Thanks !

zoomx · Post by **zoomx** » Thu Jan 23, 2020 11:36 am

Pito · Post by **Pito** » Fri Jan 24, 2020 8:42 pm

viewtopic.php?p=861#p861

Pito · Post by **Pito** » Sat Jan 25, 2020 2:29 pm

Here is the "original" Single Precision Whetstone benchmark modified for STM32DUINO.
It builds here with the STM core for F407.
It is my understanding the FPU is enabled in the STM core by default.
Not tested on real hw yet.
Try it with the F401/F411 and do report bugs, please.
It should be built with PRINTF enabled.

fpiSTM · Post by **fpiSTM** » Sat Jan 25, 2020 2:57 pm

@Pito
I saw several posts about the Whetstone benchmark.
Did you try the one in the STM32duino Examples library?

ag123 · Post by **ag123** » Sat Jan 25, 2020 3:10 pm

Well, I think those single-precision figures of around 60 to 150 MFLOPS for the F401 at 84 MHz and the F411 at 96 MHz are real after all.
For one thing, that seems rather close to the ARM11 VFP FPU, minus the 'vector' part of vector floating point.
http://infocenter.arm.com/help/topic/co ... DEJJH.html
Nevertheless, FP instructions probably execute at 1 flop per cycle, and there are possibly several ALUs (e.g. separate ones for multiply, divide and add) plus some kind of 'speculative' (out-of-order) execution. That is the only way I can explain performance above 1 flop per Hz on the STM32F4x CPUs.
The optimization using VFP libraries may also have used instructions such as FMA (fused multiply-add) for the Whetstone benchmarks. An FMA does both a multiply and an add in a single instruction, which would make matrix-vector calculations look like they run at 2 flops per clock.
So we do have pretty fast single-precision floating point on our little F4 chips after all.
These are probably more advanced than the P4 technology of its time. After all, desktop Intel chips these days run more than a single flop per clock, and Intel does that much more extremely at 64 bits (in fact 80 bits).

https://en.wikipedia.org/wiki/Extended_precision

Pito · Post by **Pito** » Sat Jan 25, 2020 3:13 pm

fpiSTM wrote: Sat Jan 25, 2020 2:57 pm @Pito
I saw several posts about the Whetstone benchmark.
Did you try the one in the STM32duino Examples library?

All the Whetstone benchmarks that other people and I experimented with were run with the one from your Examples.
That one produced suspicious results.
I would suggest removing it once it is confirmed that my results are more "real".

Try with the one I uploaded above. It is version 0.1.

Pito · Post by **Pito** » Sat Jan 25, 2020 3:20 pm

@ag123: please be so kind as to run the one above on your F401.
That will give us at least some results we can compare against the real world.

You should get the following table (with F401 numbers):

[Attachment: Whetstone result.PNG]
