dhrystone and whetstone benchmarks

Bingo600
Posts: 86
Joined: Sat Dec 21, 2019 3:56 pm

Re: dhrystone and whetstone benchmarks

Post by Bingo600 »

fpiSTM wrote: Sun Jan 26, 2020 7:14 pm This is provided by ARM:
https://github.com/ARM-software/CMSIS_5 ... SP/Lib/GCC
Damn...
".a" files ==> compiled libraries

No source...

Well, I'd probably not understand the "magic" anyway.

/Bingo
Bingo600
Posts: 86
Joined: Sat Dec 21, 2019 3:56 pm

Re: dhrystone and whetstone benchmarks

Post by Bingo600 »

Mispost
Last edited by Bingo600 on Sun Jan 26, 2020 7:58 pm, edited 1 time in total.
Bingo600
Posts: 86
Joined: Sat Dec 21, 2019 3:56 pm

Re: dhrystone and whetstone benchmarks

Post by Bingo600 »

What does this flag do?

The ST Core does not specify:

Code: Select all

-fsingle-precision-constant

Code: Select all

GenF4.build.flags.fp=-mfpu=fpv4-sp-d16 -mfloat-abi=hard
F411 built as Cortex-M7

No -fsingle-precision-constant
-O2

Fast N1, slow N5, slow MWIPS

Code: Select all

##########################################
Single Precision C Whetstone Benchmark
Calibrate
       0.15 Seconds          1   Passes (x 100)
       0.77 Seconds          5   Passes (x 100)
       3.86 Seconds         25   Passes (x 100)

Use 64  passes (x 100)

          Single Precision C/C++ Whetstone Benchmark
Loop content                  Result              MFLOPS      MOPS   Seconds

N1 floating point     -1.12475013732910156        48.000              0.026
N2 floating point     -1.12274742126464844        33.340              0.258
N3 if then else        1.00000000000000000                  94.628    0.070
N4 fixed point        12.00000000000000000                 160.001    0.126
N5 sin,cos etc.        0.49909299612045288                   0.924    5.764
N6 floating point      0.99999982118606567        31.994              1.079
N7 assignments         3.00000000000000000                  41.210    0.287
N8 exp,sqrt etc.       0.75110614299774170                   1.052    2.264

MWIPS                                             64.819              9.874
-fsingle-precision-constant
-O2

Slow(er) N1, fast N5, fast MWIPS

Code: Select all

##########################################
Single Precision C Whetstone Benchmark
Calibrate
       0.10 Seconds          1   Passes (x 100)
       0.50 Seconds          5   Passes (x 100)
       2.51 Seconds         25   Passes (x 100)

Use 99  passes (x 100)

          Single Precision C/C++ Whetstone Benchmark
Loop content                  Result              MFLOPS      MOPS   Seconds

N1 floating point     -1.12475013732910156        45.150              0.042
N2 floating point     -1.12274742126464844        33.347              0.399
N3 if then else        1.00000000000000000                  95.761    0.107
N4 fixed point        12.00000000000000000                 143.710    0.217
N5 sin,cos etc.        0.49909299612045288                   2.268    3.631
N6 floating point      0.99999982118606567        33.862              1.577
N7 assignments         3.00000000000000000                  41.113    0.445
N8 exp,sqrt etc.       0.75110614299774170                   1.043    3.532

MWIPS                                             99.496              9.950
F411 built normally, as Cortex-M4

-O2
-fsingle-precision-constant

Code: Select all

##########################################
Single Precision C Whetstone Benchmark
Calibrate
       0.10 Seconds          1   Passes (x 100)
       0.50 Seconds          5   Passes (x 100)
       2.48 Seconds         25   Passes (x 100)

Use 100  passes (x 100)

          Single Precision C/C++ Whetstone Benchmark
Loop content                  Result              MFLOPS      MOPS   Seconds

N1 floating point     -1.12475013732910156        46.489              0.041
N2 floating point     -1.12274742126464844        33.350              0.403
N3 if then else        1.00000000000000000                  95.833    0.108
N4 fixed point        12.00000000000000000                 159.898    0.197
N5 sin,cos etc.        0.49909299612045288                   2.261    3.680
N6 floating point      0.99999982118606567        35.984              1.499
N7 assignments         3.00000000000000000                  41.158    0.449
N8 exp,sqrt etc.       0.75110614299774170                   1.051    3.538

MWIPS                                            100.854              9.915
ST Core - Default
-O2
No -fsingle-precision-constant

Code: Select all

##########################################
Single Precision C Whetstone Benchmark
Calibrate
       0.15 Seconds          1   Passes (x 100)
       0.77 Seconds          5   Passes (x 100)
       3.85 Seconds         25   Passes (x 100)

Use 64  passes (x 100)

          Single Precision C/C++ Whetstone Benchmark
Loop content                  Result              MFLOPS      MOPS   Seconds

N1 floating point     -1.12475013732910156        46.545              0.026
N2 floating point     -1.12274742126464844        33.340              0.258
N3 if then else        1.00000000000000000                  96.000    0.069
N4 fixed point        12.00000000000000000                 180.001    0.112
N5 sin,cos etc.        0.49909299612045288                   0.915    5.817
N6 floating point      0.99999982118606567        33.878              1.019
N7 assignments         3.00000000000000000                  41.210    0.287
N8 exp,sqrt etc.       0.75110614299774170                   1.051    2.265

MWIPS                                             64.952              9.853
The GCC docs say:
https://gcc.gnu.org/onlinedocs/gcc-4.7. ... tions.html

Code: Select all

-fsingle-precision-constant
    Treat floating-point constants as single precision instead of implicitly converting them to double-precision constants. 
Bingo600
Posts: 86
Joined: Sat Dec 21, 2019 3:56 pm

Re: dhrystone and whetstone benchmarks

Post by Bingo600 »

Well, this day went totally Cortex + FPU :)

But at least I got BLACKPILL_F411CE implemented in the ST Core @ 96 MHz with ART enabled.
BPF411CE.png
I ran my first program on my "Ali Black-F407"
I got my DISCO-F407 dusted off

And I learned a lot about the boards.txt and variants directories.

I'm off ... ZZZzzzzzz

Goodbye & thanks for "all the fish" (help/comments)

/Bingo
Pito
Posts: 94
Joined: Tue Dec 24, 2019 1:53 pm

Re: dhrystone and whetstone benchmarks

Post by Pito »

Bingo600 wrote: Sun Jan 26, 2020 7:28 pm
fpiSTM wrote: Sun Jan 26, 2020 7:14 pm This is provided by ARM:
https://github.com/ARM-software/CMSIS_5 ... SP/Lib/GCC
Damm ...
".a" files ==> Compiled libraries

No source ..

Well i'd prob not understand the "magic" anyway

/Bingo
https://github.com/ARM-software/CMSIS_5 ... DSP/Source
dannyf
Posts: 447
Joined: Sat Jul 04, 2020 7:46 pm

Re: dhrystone and whetstone benchmarks

Post by dannyf »

My attempt at benchmarking a variety of chips, outside the typical Dhrystone/Whetstone routines: https://dannyelectronics.wordpress.com/ ... -exercise/

It's interesting to see how the chips compare. The numbers are measured in cycle counts.
ag123
Posts: 1655
Joined: Thu Dec 19, 2019 5:30 am
Answers: 24

Re: dhrystone and whetstone benchmarks

Post by ag123 »

New benchmark: STM32H743VIT6 at 480 MHz
viewtopic.php?p=8886#p8886

Code: Select all

Beginning Whetstone benchmark at 480 MHz ...

Loops:10000, Iterations:1, Duration:1203.82 millisec
C Converted Single Precision Whetstones:830.69 Mflops
Beginning Whetstone benchmark at 480 MHz ...

Loops:10000, Iterations:1, Duration:1203.48 millisec
C Converted Single Precision Whetstones:830.93 Mflops
-O2 optimised, cache turned on
That 'doubled' FPU speed is likely real, as the FPU in all the series from F4 to F7 and H7 is a VFP (vector floating point) unit, so a single instruction can process 2 lanes of data (vectorized).
spiceagent11
Posts: 1
Joined: Mon Nov 08, 2021 6:11 pm

Re: dhrystone and whetstone benchmarks

Post by spiceagent11 »

The optimization using VFP libraries may also have used things like FMA (fused multiply-add) instructions for the Whetstone benchmark. FMA does both a multiply and an add in a single instruction, which would make matrix-vector calculations run at something like 2 flops per clock.
So after all, we do have pretty fast single-precision floating point.
Last edited by spiceagent11 on Wed Mar 20, 2024 9:38 pm, edited 1 time in total.
ag123
Posts: 1655
Joined: Thu Dec 19, 2019 5:30 am
Answers: 24

Re: dhrystone and whetstone benchmarks

Post by ag123 »

Yup, I recently ventured into reading some material about VFP. This is a different 'generation' of FPU: things like fused multiply-add are common, so the Whetstone benchmark could have been boosted by that alone. And with 2 vectorized lanes, when the compiler compiles the code that way, every floating-point instruction basically runs in parallel in 1 clock cycle. That explains the 'apparent' high speeds. The speeds are likely real, but according to what I read, vector floating point suffers from non-compliance with IEEE 754 in its rounding (flush to zero) and saturating arithmetic.
richieadam
Posts: 1
Joined: Tue Dec 07, 2021 8:17 pm

Re: dhrystone and whetstone benchmarks

Post by richieadam »

Looking at the historical Whetstone result data, the big CPUs get higher MWIPS but much lower MFLOPS at the same clock.

Also, I do not understand how you got 100 MWIPS at 96 MHz (F411), while we get 98 at 168 MHz (F407).

There must be a subtle bug in the code somewhere.

The historical results show a nice linear dependency of MWIPS on clock.

I would expect to see something like 150 MWIPS at 168 MHz (F407).

I would suggest a few small changes to the code.

In the original code, these were double:

Code: Select all

double  theseSecs = 0.0;
double  startSecs = 0.0;
double  secs = 0.0;
Also, you could use micros() to get better resolution:

Code: Select all

void getSecs()
{
  // micros() counts microseconds since startup; convert to seconds
  theseSecs = micros() / 1000000.0;
}
I doubt it helps, but you may try.

PS: Is ART enabled in the STM core?