Re: Bluepill F4 board, anyone still working on it?
Posted: Sun Jan 19, 2020 4:46 pm
by ag123
1 min, -O3 is there, fpu flags missing, updating them
using some big flags
Code: Select all
-mcpu=cortex-m4 -march=armv7e-m+fp
Code: Select all
-mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant
platform.txt
Code: Select all
# this can be overridden in boards.txt
build.mcu=cortex-m4
build.cpu_flags=-mcpu=cortex-m4 -march=armv7e-m+fp -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant
didn't work, try
Code: Select all
build.common_flags=-mcpu=cortex-m4 -march=armv7e-m+fp -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant -mthumb -D__STM32F4__ -DSTM32F4
now ld returns an error
i know why, i'd dig that out from my makefile
need to add
Code: Select all
-L $(ARM_NONE_EABI_PATH)/arm-none-eabi/lib/thumb/v7e-m+fp
when linking
try
Code: Select all
compiler.ldflags={build.flags.ldspecs} -L{runtime.tools.arm-none-eabi-gcc.path}/arm-none-eabi/lib/thumb/v7e-m+fp
nope, need to find another place to patch
try
Code: Select all
compiler.c.elf.extra_flags="-L{build.variant.path}/ld" "-Wl,--wrap=atexit,--wrap=__cxa_atexit,--wrap=exit" -L{runtime.tools.arm-none-eabi-gcc.path}/arm-none-eabi/lib/thumb/v7e-m+fp
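A note on why -fsingle-precision-constant sits next to -mfpu=fpv4-sp-d16 in the flags above: that FPU handles single precision only, and in C an unsuffixed constant such as 1.5 has type double, so `x * 1.5` is carried out in double (in software on this part) unless the flag is set or the constant is written as 1.5f. A minimal host-side sketch of the difference, with function names made up for illustration (they are not from either core):

```c
#include <assert.h>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

/* Unsuffixed constant: the multiply happens in double, then narrows back.
 * With -fsingle-precision-constant the constant would be a float instead. */
float scale_promoted(float x)
{
    return (float)(x * 1.5);
}

/* 'f' suffix: the whole expression stays in float regardless of flags. */
float scale_single(float x)
{
    return x * 1.5f;
}

/* sizeof does not evaluate its operand, it only reports the type the
 * multiply would be carried out in. */
unsigned promoted_size(void) { return (unsigned)sizeof(1.0f * 1.5); }
unsigned single_size(void)   { return (unsigned)sizeof(1.0f * 1.5f); }
```

Built without the flag, promoted_size() reports the size of a double and single_size() the size of a float. Both scale functions return the same value here; the difference on the M4 is whether the hardware FPU or the soft-float double routines do the work.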
Re: Bluepill F4 board, anyone still working on it?
Posted: Sun Jan 19, 2020 5:14 pm
by Bingo600
Yddrfff
Try (in boards.txt)
blackpill_f401.build.extra_flags=-DLED_BUILTIN=PC13 -DCRYSTAL_FREQ=25 -DNO_CCMRAM -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant
blackpill_f401.menu.opt.o3std.build.flags.optimize=-O3
blackpill_f401.menu.opt.o3std.build.flags.ldspecs=-mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant
Instead of that AWFUL: -L $(ARM_NONE_EABI_PATH)/arm-none-eabi/lib/thumb/v7e-m+fp
Ought to work
I have always thought it is BAD karma to point directly to mcu-specific libs (as in thumb / fp etc ...), it will get you in trouble later on
/Bingo
Re: Bluepill F4 board, anyone still working on it?
Posted: Sun Jan 19, 2020 5:20 pm
by ag123
give up for now, that special vfp library probably has something to do with the mflops
but my makefile works
viewtopic.php?p=327#p327
trying that 1st
Code: Select all
Beginning Whetstone benchmark at 84 MHz ... -OS
Loops:10000, Iterations:1, Duration:15041.21 millisec
C Converted Single Precision Whetstones:66.48 Mflops
Code: Select all
Beginning Whetstone benchmark at 84 MHz ...
Loops:10000, Iterations:1, Duration:8287.97 millisec
C Converted Single Precision Whetstones:120.66 Mflops
now this looks normal
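The reported figures line up with the classic Whetstone scoring formula, KWIPS = 100 * loops * iterations / seconds. The sketch below assumes the benchmark follows that convention from the original whetstone.c; the function name is hypothetical and this has not been checked against the sketch's source. If that assumption holds, the number printed as "Mflops" counts Whetstone instructions rather than floating-point operations.

```c
#include <assert.h>

/* Classic Whetstone scoring: one loop unit stands for 100 thousand
 * Whetstone instructions, so KWIPS = 100 * loops * iterations / seconds.
 * Returns the score in millions of Whetstone instructions per second. */
double whetstone_mwips(long loops, long iterations, double duration_ms)
{
    assert(duration_ms > 0.0);
    double seconds = duration_ms / 1000.0;
    double kwips = 100.0 * (double)loops * (double)iterations / seconds;
    return kwips / 1000.0;
}
```

Feeding in the two runs above (10000 loops, 1 iteration, 15041.21 ms and 8287.97 ms) gives about 66.48 and 120.66, matching the printed values to two decimals.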

i'd need to figure out where to patch platform.txt and boards.txt
in the meantime, if you are keen on trying out the makefile, the necessary edits are:
- ARM_NONE_EABI_PATH - this needs to point to the installed location of your arm-none-eabi gcc/g++ toolchain
(if you have the official core installed there is already one under $HOME/.arduino15: .arduino15/packages/STM32/tools/xpack-arm-none-eabi-gcc/9.2.1-1.1)
- move the arduino sources into a subfolder called src and rename the .ino to .cpp
- symlink ./STM32F4 to the STM32F4 folder in Arduino_STM32, or copy Arduino_STM32/STM32F4 into ./STM32F4
on running make, the output goes into the directory ./build
the bin file is in there
note that make clean deletes the ./build directory
Re: Bluepill F4 board, anyone still working on it?
Posted: Sun Jan 19, 2020 6:02 pm
by ag123
now i know why that vfp is so special
http://infocenter.arm.com/help/topic/co ... dejjh.html
1.5.9. Vector Floating-Point (VFP)
The VFP coprocessor supports floating point arithmetic operations and is a functional block within the ARM1176JZF-S processor. The VFP coprocessor is mapped as coprocessor numbers 10 and 11. Software can determine whether the VFP is present by the use of the Coprocessor Access Control Register. See c1, Coprocessor Access Control Register for more details.
if you look at the enablefpu() codes commented
Code: Select all
// Enable the FPU (Cortex-M4 - STM32F4xx and higher)
// http://infocenter.arm.com/help/topic/com.arm.doc.dui0553a/BEHBJHIG.html
//void enablefpu() {
// __asm volatile
// (
// " ldr.w r0, =0xE000ED88 \n" /* The FPU enable bits are in the CPACR. */
// " ldr r1, [r0] \n" /* read CPACR */
// " orr r1, r1, #( 0xf << 20 )\n" /* Set bits 20-23 to enable CP10 and CP11 coprocessors */
// " str r1, [r0] \n" /* Write back the modified value to the CPACR */
// " dsb \n" /* wait for store to complete */
// " isb" /* reset pipeline now the FPU is enabled */
// );
//}
CP10, CP11, exactly that, this enablefpu() is there in both the official core and libmaple core, hence it isn't necessary to put that in the sketch
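The bit arithmetic in that asm can be checked on a PC. This is only a sketch of the mask logic, with hypothetical helper names; on the target the value is read from and written back to the register at 0xE000ED88 through a volatile pointer, followed by the dsb/isb barriers shown above.

```c
#include <assert.h>
#include <stdint.h>

#define CPACR_ADDR      0xE000ED88u   /* register address used in the asm */
#define CPACR_CP10_CP11 (0xFu << 20)  /* bits 20-23: access bits for CP10 and CP11 */

static_assert(CPACR_CP10_CP11 == 0x00F00000u, "mask covers bits 20-23");

/* Same effect as the 'orr r1, r1, #(0xf << 20)' step: set both
 * two-bit access fields and leave every other bit untouched. */
uint32_t cpacr_with_fpu_enabled(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11;
}

/* The FPU counts as enabled only when all four bits are set. */
int cpacr_fpu_is_enabled(uint32_t cpacr)
{
    return (cpacr & CPACR_CP10_CP11) == CPACR_CP10_CP11;
}
```

Starting from 0, the result is 0x00F00000. A value with only CP10 set (0x00300000) does not pass the check.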
i've always been telling pito that there are 2 fpu in there, pito disagreed.
it isn't 2 fpu, it is 1 fpu with 2 vector lanes, hence we see mflops higher than the cpu mhz
it is a little incredible that our little cortex-m4 processors have such features
but as it seemed, it is there indeed
arm gcc compiler is 'no ordinary' compiler, -O3 turns ordinary looking c/c++ routines into partly vector codes, i'd guess this is reflected in the whetstone mflops. and perhaps, it may not be just gcc, but that special vfp library
and as pito noted
viewtopic.php?p=558#p558
our little m4 has mflops that look like they are as fast or faster than a 2 GHz P4

Re: Bluepill F4 board, anyone still working on it?
Posted: Sun Jan 19, 2020 7:56 pm
by Bingo600
Well, if we had an ARM11 i'd say you're right, unfortunately this is a Cortex-M4
What is happening in the MCU i can't say, i mean whether it's really calculating those flops, or doing something else entirely, e.g. NOPs
/Bingo
Re: Bluepill F4 board, anyone still working on it?
Posted: Sun Jan 19, 2020 8:18 pm
by Bingo600
Aha ... You had something there : -march=armv7e-m+fp
As the old 7-2017q4 did NOT like -march=armv7e-m+fp
I just switched to the latest arm-gcc : 9-2019-q4
Put it here : $HOME/.arduino15/packages/arduino/tools/arm-none-eabi-gcc
And did these in boards.txt
blackpill_f401.build.extra_flags=-DLED_BUILTIN=PC13 -DCRYSTAL_FREQ=25 -DNO_CCMRAM -mthumb -mcpu=cortex-m4 -march=armv7e-m+fp -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant
And (for the -O3) (linking)
blackpill_f401.menu.opt.o3std.build.flags.ldspecs=-march=armv7e-m+fp -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant
Now it is "flying" - I enabled print
Beginning Whetstone benchmark at 84 MHz ...
0 0 0 1.00 -1.00 -1.00 -1.00
120000 140000 120000 -0.00 0.00 -0.00 0.00
140000 120000 120000 -0.00 0.00 0.00 0.00
3450000 1 1 1.00 -1.00 -1.00 -1.00
2100000 1 2 6.00 6.00 0.00 0.00
320000 1 2 0.00 0.00 0.00 0.00
8990000 1 2 1.00 1.00 1.00 1.00
6160000 1 2 3.00 2.00 3.00 0.00
0 2 3 1.00 -1.00 -1.00 -1.00
930000 2 3 1.00 1.00 1.00 1.00
Loops:10000, Iterations:1, Duration:8596.40 millisec
C Converted Single Precision Whetstones:116.33 Mflops
/Bingo
Re: Bluepill F4 board, anyone still working on it?
Posted: Sun Jan 19, 2020 8:27 pm
by ag123
time to mess with the m, n, p, q pll scalers
at 96 MHz, the normal clock for the F411
you'd get 134.89 Mflops (hopefully more)
viewtopic.php?p=539#p539
the small catch is
https://github.com/stevstrong/Arduino_S ... f401.h#L39
^you would need to change that or systick and the timers would be incorrect
but
https://github.com/stevstrong/Arduino_S ... cF4.c#L476
Code: Select all
void rcc_clk_init(void)
{
    SystemCoreClock = CYCLES_PER_MICROSECOND * 1000000;
#if CYCLES_PER_MICROSECOND == 168
    SetupClock168MHz();
#elif CYCLES_PER_MICROSECOND == 120
    SetupClock120MHz();
#elif CYCLES_PER_MICROSECOND == 96
    SetupClock96MHz();
#elif CYCLES_PER_MICROSECOND == 84
    SetupClock84MHz();
#elif CYCLES_PER_MICROSECOND == 72
    SetupClock72MHz();
#else
#error Wrong CYCLES_PER_MICROSECOND!
#endif
}
so you'd probably want to do a 'bypass' so that it keeps calling SetupClock84MHz();
even if you change CYCLES_PER_MICROSECOND
e.g. comment others and leave SetupClock84MHz();
then you can go into SetupClock84MHz() and tweak the m, n, p, q prescalers to get 96 MHz
that python script would be useful there
viewtopic.php?p=393#p393
keep a copy of your 'custom' libmaple core, lest you 'forget' when you do an 'update'
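For checking a set of m, n, p, q values before putting them into the core, the STM32F4 PLL arithmetic is VCO = HSE / M * N, SYSCLK = VCO / P and the USB clock = VCO / Q. A minimal host-side sketch, assuming the 25 MHz crystal from the CRYSTAL_FREQ=25 define earlier in the thread; the function names are hypothetical and the values are one possible combination, not necessarily what the core uses:

```c
#include <assert.h>
#include <stdint.h>

/* VCO output: input clock divided by M, multiplied by N.
 * 64-bit intermediate so the multiply cannot overflow. */
uint32_t pll_vco_hz(uint32_t hse_hz, uint32_t m, uint32_t n)
{
    assert(m != 0);
    return (uint32_t)(((uint64_t)hse_hz * n) / m);
}

/* System clock: VCO divided by P, where P can only be 2, 4, 6 or 8. */
uint32_t pll_sysclk_hz(uint32_t hse_hz, uint32_t m, uint32_t n, uint32_t p)
{
    assert(p == 2 || p == 4 || p == 6 || p == 8);
    return pll_vco_hz(hse_hz, m, n) / p;
}

/* USB clock: VCO divided by Q; USB needs this to come out at 48 MHz. */
uint32_t pll_usb_hz(uint32_t hse_hz, uint32_t m, uint32_t n, uint32_t q)
{
    assert(q != 0);
    return pll_vco_hz(hse_hz, m, n) / q;
}
```

With HSE = 25 MHz, M = 25, N = 192, P = 2, Q = 4 this gives a 192 MHz VCO, 96 MHz SYSCLK and 48 MHz for USB. M = 25, N = 336, P = 4, Q = 7 gives 84 MHz and 48 MHz. The allowed ranges for N and the VCO differ between F4 parts, so check the reference manual for the chip before using a combination.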

Re: Bluepill F4 board, anyone still working on it?
Posted: Sun Jan 19, 2020 8:44 pm
by Bingo600
I just enabled ART (I think it might be default)
Beginning Whetstone benchmark at 84 MHz ...
0 0 0 1.00 -1.00 -1.00 -1.00
120000 140000 120000 -0.00 0.00 -0.00 0.00
140000 120000 120000 -0.00 0.00 0.00 0.00
3450000 1 1 1.00 -1.00 -1.00 -1.00
2100000 1 2 6.00 6.00 0.00 0.00
320000 1 2 0.00 0.00 0.00 0.00
8990000 1 2 1.00 1.00 1.00 1.00
6160000 1 2 3.00 2.00 3.00 0.00
0 2 3 1.00 -1.00 -1.00 -1.00
930000 2 3 1.00 1.00 1.00 1.00
Loops:10000, Iterations:1, Duration:7974.93 millisec
C Converted Single Precision Whetstones:125.39 Mflops
ART disabled
Beginning Whetstone benchmark at 84 MHz ...
0 0 0 1.00 -1.00 -1.00 -1.00
120000 140000 120000 -0.00 0.00 -0.00 0.00
140000 120000 120000 -0.00 0.00 0.00 0.00
3450000 1 1 1.00 -1.00 -1.00 -1.00
2100000 1 2 6.00 6.00 0.00 0.00
320000 1 2 0.00 0.00 0.00 0.00
8990000 1 2 1.00 1.00 1.00 1.00
6160000 1 2 3.00 2.00 3.00 0.00
0 2 3 1.00 -1.00 -1.00 -1.00
930000 2 3 1.00 1.00 1.00 1.00
Loops:10000, Iterations:1, Duration:10925.91 millisec
C Converted Single Precision Whetstones:91.53 Mflops
ART enabled
Beginning Whetstone benchmark at 84 MHz ...
0 0 0 1.00 -1.00 -1.00 -1.00
120000 140000 120000 -0.00 0.00 -0.00 0.00
140000 120000 120000 -0.00 0.00 0.00 0.00
3450000 1 1 1.00 -1.00 -1.00 -1.00
2100000 1 2 6.00 6.00 0.00 0.00
320000 1 2 0.00 0.00 0.00 0.00
8990000 1 2 1.00 1.00 1.00 1.00
6160000 1 2 3.00 2.00 3.00 0.00
0 2 3 1.00 -1.00 -1.00 -1.00
930000 2 3 1.00 1.00 1.00 1.00
Loops:10000, Iterations:1, Duration:7976.07 millisec
C Converted Single Precision Whetstones:125.37 Mflops
I had to add : "-I{build.core.path}/libmaple/" to platform.txt - to compile/resolve flashF4.h
compiler.libs.c.flags="-I{build.system.path}/libmaple" "-I{build.core.path}/libmaple/" "-I{build.core.path}/libmaple/usbF4"
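Going by the runs above, ART on versus off is roughly a 1.37x difference (125.39 against 91.53). The sketch below shows what such a toggle usually amounts to on an STM32F4, assuming "ART" here means the prefetch, instruction cache and data cache enable bits in FLASH_ACR. The helper name is hypothetical and this is not the code from flashF4.h.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_ACR_LATENCY_MASK 0xFu        /* flash wait states, low bits */
#define FLASH_ACR_PRFTEN       (1u << 8)   /* prefetch enable */
#define FLASH_ACR_ICEN         (1u << 9)   /* instruction cache enable */
#define FLASH_ACR_DCEN         (1u << 10)  /* data cache enable */

/* Build a FLASH_ACR value: wait states always, accelerator bits on request. */
uint32_t flash_acr_value(uint32_t wait_states, int art_on)
{
    assert(wait_states <= FLASH_ACR_LATENCY_MASK);
    uint32_t acr = wait_states;
    if (art_on) {
        acr |= FLASH_ACR_PRFTEN | FLASH_ACR_ICEN | FLASH_ACR_DCEN;
    }
    return acr;
}
```

With 2 wait states (an assumption for 84 MHz at 3.3 V, to be checked against the datasheet) the value is 0x702 with the accelerator on and 0x002 with it off.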
Re: Bluepill F4 board, anyone still working on it?
Posted: Sun Jan 19, 2020 8:55 pm
by ag123
like pito mentioned
viewtopic.php?p=558#p558
it would superficially look like we beat the 2 GHz P4 at a mere 84 MHz
but of course we aren't comparing the same thing, to get more mflops than mhz, there is vector floating point
literally 2 vector lanes in 1 fpu.
probably back then gcc -O3 didn't know how to auto-vectorize whetstone for the P4 using SSE
we have that high end arm 11 fpu in our little stm32 F4
http://infocenter.arm.com/help/topic/co ... dejjh.html
https://community.arm.com/cfs-file/__ke ... D00_M7.pdf
Re: Bluepill F4 board, anyone still working on it?
Posted: Sun Jan 19, 2020 9:04 pm
by Bingo600
The whitepaper is interesting, but i can't find the "literally 2 vector lanes in 1 fpu" part.
It says: All of the instructions are single-cycle on Cortex-M4 (except hardware divide)
All the "dual" i can see is for the M7, and we don't have an ARM11 - power usage would "skyrocket"
But THANK YOU for spending so much time with me on this FPU quest
Really appreciated
Edit: I'm inclined to agree with pito ... Our numbers are/could be fishy ... if we beat a 2 GHz P4
/Bingo