AT32F403A anyone?

Re: AT32F403A anyone?

Post by ozcar »

webjorn wrote: Sat Sep 17, 2022 2:26 pm ...
------- C code -----------

for(k=0;k<maxdata;k++) { // fill array with alternating 0 and 0xffff
    if(( k & 1) == 0)
        datarray[k] = 0x0;
    else
        datarray[k] = 0xffff;
}

t1 = micros(); // this does not serve any purpose now, it just generates a bl micros, which is easily found in the dump file
snapshot = *DWT_CYCCNT;
datptr = &datarray[0];
(*(uint16_t *) 0x40010c0c ) = *datptr++;
(*(uint16_t *) 0x40010c0c ) = *datptr++;
(*(uint16_t *) 0x40010c0c ) = *datptr++;
(*(uint16_t *) 0x40010c0c ) = *datptr++;
(*(uint16_t *) 0x40010c0c ) = *datptr++;
(*(uint16_t *) 0x40010c0c ) = *datptr++;
(*(uint16_t *) 0x40010c0c ) = *datptr++;
(*(uint16_t *) 0x40010c0c ) = *datptr++;
(*(uint16_t *) 0x40010c0c ) = *datptr++;
(*(uint16_t *) 0x40010c0c ) = *datptr++;
Serial.print("CYCCNT : ");
Serial.println(*DWT_CYCCNT - snapshot);
-----------------------------

This is a dump of the ELF of the project, taken with

.arduino15/packages/WeActStudio/tools/xpack-arm-none-eabi-gcc/11.2.1-1.2/bin/arm-none-eabi-objdump /tmp/arduino_build_865338/Blink-403a-out-1.ino.elf -d -S -l > dmp14.txt

/home/webjorn/Arduino/Blink-403a-out-1/Blink-403a-out-1.ino:158
snapshot = *DWT_CYCCNT;
80004b2: 4d2d ldr r5, [pc, #180] ; (8000568 <_Z4loopv+0x34c>)
/home/webjorn/Arduino/Blink-403a-out-1/Blink-403a-out-1.ino:157
t1 = micros();
80004b4: f000 f9de bl 8000874 <micros>
/home/webjorn/Arduino/Blink-403a-out-1/Blink-403a-out-1.ino:169
(*(uint16_t *) 0x40010c0c ) = *datptr++;
80004b8: 4b2e ldr r3, [pc, #184] ; (8000574 <_Z4loopv+0x358>)
/home/webjorn/Arduino/Blink-403a-out-1/Blink-403a-out-1.ino:158
snapshot = *DWT_CYCCNT;
80004ba: 686f ldr r7, [r5, #4]
/home/webjorn/Arduino/Blink-403a-out-1/Blink-403a-out-1.ino:169
(*(uint16_t *) 0x40010c0c ) = *datptr++;
80004bc: f8c6 3fa0 str.w r3, [r6, #4000] ; 0xfa0
80004c0: 4b1c ldr r3, [pc, #112] ; (8000534 <_Z4loopv+0x318>)
80004c2: 8a72 ldrh r2, [r6, #18]
80004c4: 819a strh r2, [r3, #12]
/home/webjorn/Arduino/Blink-403a-out-1/Blink-403a-out-1.ino:170
Serial.print("CYCCNT : ");
80004c6: 492c ldr r1, [pc, #176] ; (8000578 <_Z4loopv+0x35c>)
80004c8: 4620 mov r0, r4
80004ca: f000 fda0 bl 800100e <_ZN5Print5printEPKc>
/home/webjorn/Arduino/Blink-403a-out-1/Blink-403a-out-1.ino:171
Serial.println(*DWT_CYCCNT - snapshot);
80004ce: 6869 ldr r1, [r5, #4]
80004d0: 220a movs r2, #10
80004d2: 1bc9 subs r1, r1, r7
80004d4: 4620 mov r0, r4
80004d6: f000 fdfe bl 80010d6 <_ZN5Print7printlnEmi>
/home/webjorn/Arduino/Blink-403a-out-1/Blink-403a-out-1.ino:147
digitalWrite(PC13, HIGH); // turn the LED on (HIGH is the voltage level)
80004da: e7d1 b.n 8000480 <_Z4loopv+0x264>

--------- end of dump

...
I'm not sure exactly how you got it to generate the assembly code you show there.

I just tried your code, repeating this

(*(uint16_t *) 0x40010c0c ) = *datptr++;

10 times on an F103. I made the adjustment suggested by dannyf, so as not to include the number of cycles taken by the first Serial.print.

It tells me that CYCCNT is 4. No, not 4 per move, 4 in total for the 10 moves! Now, that sounded pretty impressive until I looked at the generated code, and it turns out that all 10 moves were optimised out. This is another hazard of rolling your own definition for GPIOB->ODR and not declaring it as volatile. Also, GPIOB->ODR is uint32_t, not uint16_t, which will influence the assembly instructions generated.

Unless you turn optimisation off completely, the compiler may re-order your statements, and might even decide some are not needed at all. All this can make it hard to follow the generated code. But then, if you do turn optimisation off, it can generate some very slow code.
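
The volatile point can be tried out on a PC. This is only a minimal sketch under stated assumptions: a plain variable (`fake_odr`) stands in for the register at 0x40010C0C so the code runs on a host, and the names `fake_odr`, `odr`, `write_samples` and `read_back` are made up for illustration, not taken from any core or library. On the real part the pointer would be initialised from the fixed register address instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host stand-in for the 32-bit output data register (assumption: on the
 * target this would be the fixed address 0x40010C0C, not a variable). */
static volatile uint32_t fake_odr;

/* The volatile qualifier marks every access through this pointer as an
 * observable side effect, so the optimiser may not merge or drop stores. */
static volatile uint32_t *const odr = &fake_odr;

/* Write n 16-bit samples to the register, one 32-bit store per sample.
 * Returns the number of stores issued. */
size_t write_samples(const uint16_t *data, size_t n)
{
    size_t stores = 0;

    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        *odr = data[i];   /* uint16_t is zero-extended to 32 bits */
        stores++;
    }
    return stores;
}

/* Read the stand-in register back (only meaningful in this host sketch). */
uint32_t read_back(void)
{
    return fake_odr;
}
```

Because the destination is declared as a 32-bit volatile object, no cast or mask should be needed to assign a uint16_t to it, and each assignment in an unrolled sequence has to survive optimisation.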

Post by webjorn »

Granted:

"it doesn't seem to be what your code is doing."

Correct. Printing a fixed string between the two counter reads screws that up, of course. This would have been better:

scratch = *DWT_CYCCNT - snapshot;
Serial.print("Whatever");
Serial.println(scratch);

However, I am alternating between programming for the best programmed I/O speed and trying to verify the result on the o-scope.

Using DMA would probably hit another high. However, can I not emit the entire buffer in an unrolled loop and get the timing I
figured was possible, such as one word every 4 cycles (equalling 60 MHz / 60 Msamples/second)?

The binary code produced by the compiler confuses me. Does the ARM have built-in "bulk" operations...?

"I am still confused, but now on a higher level"

Gullik

Post by webjorn »

Tnx ozcar,

Yes, I have thought about optimizations. Unfortunately this is not a choice with the AT32F403A, at least not from the menu.

Also, the volatile qualifier will (or might) change the code the compiler generates. I will check this out.

Right now I am getting confusing results.
The loop
for(k=0;k<maxsize;k++) {
    *something = datarray[k];
}
seems to work and takes 7 clock cycles to execute

the sequence
datptr = &datarray[0];
*something = *datptr++;
*something = *datptr++;
*something = *datptr++;
*something = *datptr++;
*something = *datptr++;
*something = *datptr++;

should emit the contents of the array to what "something" points to
Q: what does "*something = *datptr++" expand to? How many cycles? (4??)

Now, arm assembler is not really my thing, I grew up with the pdp11.

Gullik

Post by dannyf »

You probably want to step back, ask yourself what you are trying to do, and then lay out a few options. Asking people to help you with one particular approach will hit its limits soon.

In your case, based on what little I know, DMA seems like a good fit. SPI might work as well. If you just want to flip a pin fast, try bit-banding. Using a port ODR or, worse yet, digitalWrite/Read is the last resort.

Stepping back first will help you get to your destination sooner.
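
For the bit-banding suggestion, here is a rough sketch of how the alias address of a single port bit is usually computed. It assumes the standard Cortex-M3/M4 peripheral bit-band mapping (region base 0x40000000, alias base 0x42000000); whether a given part, including the AT32F403A, implements bit-banding should be checked in its reference manual. The function name `bitband_alias` is made up for this example, and 0x40010C0C is the GPIOB ODR address used elsewhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define PERIPH_BASE_ADDR     0x40000000u  /* start of peripheral bit-band region */
#define PERIPH_BB_ALIAS_ADDR 0x42000000u  /* start of the matching alias region  */

/* Return the alias word address for one bit of a peripheral register.
 * Each bit of the 1 MB region maps to one 32-bit word in the alias region:
 * alias = alias_base + byte_offset * 32 + bit * 4. */
uint32_t bitband_alias(uint32_t reg_addr, unsigned bit)
{
    assert(reg_addr >= PERIPH_BASE_ADDR);
    assert(reg_addr < PERIPH_BASE_ADDR + 0x100000u);  /* inside the 1 MB region */
    assert(bit < 32u);

    uint32_t byte_offset = reg_addr - PERIPH_BASE_ADDR;
    return PERIPH_BB_ALIAS_ADDR + byte_offset * 32u + bit * 4u;
}
```

On a target that supports it, writing 0 or 1 to the word at the returned address changes only that one bit, without a read-modify-write of the whole register. It helps with single-pin toggling, not with pushing whole 16-bit samples to the port.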

Post by ozcar »

webjorn wrote: Sat Sep 17, 2022 11:09 pm Also, the attribute volatile will / might change the compiler code generator. I will check this out.
...
Now, arm assembler is not really my thing, I grew up with the pdp11.
Well, I can't see anything at all in the fragment of assembly code that you showed that would be moving data to the GPIO ODR, which makes me think that maybe your code did get optimised out (as it was for me). However, you have made it hard for me to be sure, because the piece of code you showed was too small (needs to extend back to the start of the loop() routine, and go down far enough to see locations that get loaded into registers at various points).

If the compiler has thrown all that out, I can't see how you would see any action via DSO though, regardless of what values you put in the data array.

Arm assembler is not so much my thing either, I grew up on System 360 and its descendants.

Post by webjorn »

Now with a redeclared GPIOB definition:

volatile unsigned int *GPIOBPORT = (volatile unsigned int *)0x40010C0C;

I have the unrolled loop:

datptr = &datarray[0];
snapshot = *DWT_CYCCNT;
*GPIOBPORT = *datptr++ & 0xffff;
*GPIOBPORT = *datptr++ & 0xffff;
*GPIOBPORT = *datptr++ & 0xffff;

total of 64 stores....

*GPIOBPORT = *datptr++ & 0xffff;
*GPIOBPORT = *datptr++ & 0xffff;
snap2 = *DWT_CYCCNT;
Serial.print("CYCCNT : ");
Serial.println(snap2 - snapshot);

The compiler generates the same code for the first 40 or so stores...

80004f2: 8fa1 ldrh r1, [r4, #60] ; 0x3c // up to here it uses ldrh to get array data
80004f4: f8c3 1c0c str.w r1, [r3, #3084] ; 0xc0c
*GPIOBPORT = *datptr++ & 0xffff;
80004f8: 8fe1 ldrh r1, [r4, #62] ; 0x3e
80004fa: f8c3 1c0c str.w r1, [r3, #3084] ; 0xc0c
*GPIOBPORT = *datptr++ & 0xffff;
80004fe: f8b4 1040 ldrh.w r1, [r4, #64] ; 0x40 // but here it decides to use ldrh.w instead...
8000502: f8c3 1c0c str.w r1, [r3, #3084] ; 0xc0c
*GPIOBPORT = *datptr++ & 0xffff;
8000506: f8b4 1042 ldrh.w r1, [r4, #66] ; 0x42
800050a: f8c3 1c0c str.w r1, [r3, #3084] ; 0xc0c
*GPIOBPORT = *datptr++ & 0xffff;
800050e: f8b4 1044 ldrh.w r1, [r4, #68] ; 0x44
8000512: f8c3 1c0c str.w r1, [r3, #3084] ; 0xc0c
*GPIOBPORT = *datptr++ & 0xffff;
8000516: f8b4 1046 ldrh.w r1, [r4, #70] ; 0x46

Total time from CYCCNT is 265, which is roughly 4 cycles per store.
It can be seen that the first sequence generates 3 halfwords (6 bytes) of instructions per store, but later on 4 halfwords (8 bytes). Is 0x3e the largest offset that can be encoded
in the instruction?

A bit strange??

Gullik

Post by ozcar »

As I said, I'm no Arm assembly expert, but I'm always willing to learn...

Given you can see in the assembly listing that the ldrh instructions are only 2 bytes, and they must have space to identify the instruction (op code of some sort) and two registers, there can't be very many bits to use for the immediate offset. Indeed, from the Arm reference doc, this is evidently ldrh "encoding T1", which has 5 bits for the offset. 5 bits allows a maximum of 0x1f, but it has a low order 0 bit appended, giving a maximum offset of 0x3e (so, in effect, the offset is encoded in halfwords rather than bytes).

When it changes over to using ldrh.w you can see those instructions are 4 bytes long, so it is no big surprise that it then has more bits available for the offset. Again looking at the Arm doc, ldrh.w "encoding T2" has 12 bits for the offset. However, it does not apply the "append a 0 bit" trick to this offset, so the maximum offset will be 0xfff (byte addressable, rather than halfword addressable).

I don't see why you need "& 0xffff" on those statements, but fortunately the compiler seems to be smart enough to achieve that just by the choice of instructions to use.
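
The offset arithmetic above can be written out as a small sketch. The function names are made up for illustration, and it assumes both registers are low registers (r0 to r7), which the 16-bit encoding also requires. Note that the .w suffix only selects the wide (4 byte) instruction encoding. The access size still comes from the mnemonic: ldrh loads a halfword and str stores a 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit LDRH (encoding T1): 5-bit immediate with a 0 bit appended,
 * so the byte offset is imm5 * 2. Maximum is 0x1f << 1 = 0x3e. */
uint32_t ldrh_t1_max_offset(void)
{
    return ((1u << 5) - 1u) << 1;
}

/* 32-bit LDRH.W (encoding T2): 12-bit immediate used directly as a
 * byte offset. Maximum is 0xfff. */
uint32_t ldrh_t2_max_offset(void)
{
    return (1u << 12) - 1u;
}

/* Size in bytes of the instruction needed to load a halfword at the
 * given positive byte offset from a low base register. */
unsigned ldrh_size_bytes(uint32_t byte_offset)
{
    assert(byte_offset <= ldrh_t2_max_offset());

    int fits_t1 = (byte_offset % 2u == 0u) &&
                  (byte_offset <= ldrh_t1_max_offset());
    return fits_t1 ? 2u : 4u;
}
```

This matches the listing: offset #62 (0x3e) still gets the 2 byte ldrh, while #64 (0x40) is the first one that needs the 4 byte ldrh.w.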

Post by webjorn »

"I don't see why you need "& 0xffff" on those statements, but fortunately the compiler seems to be smart enough to achieve that just by the choice of instructions to use."

Well, I had a problem trying to write a 16-bit value (uint16_t) into a 32-bit register (uint32_t). The & 0xffff is interpreted as 0x0000ffff, a 32-bit int,
into which the 16-bit value from the array is inserted. That allowed compilation. Without it I got an invalid type conversion.

If one looks at the actual code, the store is a str.w r1, i.e. store word (w = 16 bit??), which is explicitly forbidden in the RM, which says that
the ODR register should always be treated as 32 bits (although only 16 bits are used).

Then, as can be seen from the assembly code, the ACTUAL code is just a load / store, no AND'ing involved...

So, the compiler and the RM have different opinions :-)

I think I have reached the end of this experiment. To improve on programmed I/O speed any more, I would have to resort to
an unrolled loop of assembly instructions. Looking at the current o-scope trace, a distinct "kink" is seen where the loop delay
changes, because (as you point out) the code goes from 3 words to 4....

Still, it is somewhat impressive that I can do programmed I/O at > 30 MHz from a standard C program....

This Chinese STM32 "clone" is impressive I think and it was $5 a board....
Arduino support is coming along....but *very* slowly, what I miss most are the libraries like those that come with a "real" STM32...

Gullik

Post by webjorn »

And, END of investigation:

A straight sequence of assembly read/write statements produces a jitter-free square wave on the I/O pins.

asm volatile (
"gpiobport = 0x40010c0c \n\t" // assembler symbol for the GPIOB ODR address
"ldr r1, =datarray \n\t" // r1 = source pointer into the data array
"ldr r2, =gpiobport \n\t" // r2 = destination (port register) address
"ldrh r0, [r1],#2 \n\t" // load next halfword, then advance r1 by 2
"str.w r0, [r2],#0 \n\t" // store it to the port, r2 stays unchanged
"ldrh r0, [r1],#2 \n\t"
"str.w r0, [r2],#0 \n\t"
....
"ldrh r0, [r1],#2 \n\t"
"str.w r0, [r2],#0 \n\t"
:::"r0","r1","r2");

This results in each sample taking 16.667 ns = 60 Msamples per second. Due to various compiler quirks, doing it in C
does not give predictable timing (although it can be more efficient or faster).
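
The timing figures in this thread can be cross-checked with a little arithmetic. The sketch below assumes a 240 MHz core clock, which is what "one word every 4 cycles equals 60 Msamples/second" implies. The clock value and the function names are assumptions for illustration, not something measured here.

```c
#include <assert.h>
#include <stdint.h>

/* Average cycles per store from a CYCCNT difference. */
double cycles_per_store(uint32_t total_cycles, uint32_t stores)
{
    assert(stores > 0u);
    return (double)total_cycles / (double)stores;
}

/* Time for one sample in nanoseconds at a given core clock in MHz. */
double sample_period_ns(double core_mhz, double cycles_per_sample)
{
    assert(core_mhz > 0.0);
    return cycles_per_sample / core_mhz * 1000.0;
}

/* Sample rate in Msamples/s at a given core clock in MHz. */
double sample_rate_msps(double core_mhz, double cycles_per_sample)
{
    assert(cycles_per_sample > 0.0);
    return core_mhz / cycles_per_sample;
}
```

With these, `cycles_per_store(265, 64)` comes out a little above 4 for the C version, and 4 cycles per sample at the assumed 240 MHz gives about 16.67 ns per sample, i.e. 60 Msamples per second, in line with the scope measurement.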

Gullik

Post by thunderwiring »

Hi, I have the AT32F403A BlackPill board from WeAct as well, and I was able to upload code to it. I also wrote a makefile that compiles the C/C++ code to a binary, converts it to hex, and finally uploads it to the board. LMK if you need help. I'm using Windows 10.