DMA for SPI

Related to the forum.
ag123
Posts: 1881
Joined: Thu Dec 19, 2019 5:30 am
Answers: 30

Re: DMA for SPI

Post by ag123 »

I'm running into various build issues; my toolchain is too old and I'd need to upgrade my Linux OS to a newer version.

There are also some fairly difficult hacks that need to be done, e.g. even where the HAL headers have the same symbols and definitions, they sit in a different set of directories for each series.

At the extreme, one may need to generate a header that #includes a specific set of .h files for each different series, SoC and board. That's how STM32CubeIDE works, but STM32CubeIDE generates code as well.

fpiSTM (et al.) have done a huge job with Arduino_Core_STM32 to integrate such a large portfolio of HAL setups.

I'm attempting to build my test code for different series/boards using CMake, and each round fails with various errors, missing HAL symbols etc. That is true even for the sketch, variant and core. I think it is related to my old toolchain rather than the code.

I managed a build for stm32f4xx using a makefile, and the build works in Arduino IDE 2.x as well.
I'll leave things for a while, hopefully not abandoned.

--
off-topic: a motivation for such attempts is that more recently the 'better' boards are hitting the streets,
e.g. WeAct currently has quite a good offering of boards sporting the newer STM32 series, many of them the good ones like stm32h7xx, g4xx, h5xx and
wb55 (Bluetooth); Adafruit has an stm32f405 Feather, Arduino has the Portenta H7, and micropython has some good boards (h722) as well.

many 3d printer boards are also running on stm32

stm32f103c8 'bluepill' seems to be a 'legacy' that wears on, even though the more recent series, SoCs and boards have well 'surpassed' the stm32f103c8 'tech': many of the new ones have more flash and more SRAM, and some are much faster and have more peripherals.
fpiSTM
Posts: 1919
Joined: Wed Dec 11, 2019 7:11 pm
Answers: 107
Location: Le Mans

Re: DMA for SPI

Post by fpiSTM »

Hi @ag123
I will try it when I get some time. Do not hesitate to ping me if there is no news :mrgreen:

Re: DMA for SPI

Post by ag123 »

ok, I managed some 'minor' progress after a little 'breakthrough' messing with CMake

https://github.com/ag88/stm32duino_spi_ ... in/src/SPI

Added F1xx, only compiled against stm32f103c8 and only in CMake, with an old, outdated toolchain (old xpack arm-none-eabi); never tested on the real thing.
And not compiled in Arduino IDE.

So this is really experimental, untested code.

Apparently, stm32f103c6-rb only has 2 SPIs, and only RC - VE, VG etc. have 3 (or more?) SPIs.
I only made classes for the first 3 SPIs, and the usual #ifdefs kicked in. I have all those

Code:

#if defined(STM32F4xx)
  // do this
#elif defined(STM32F1xx)
  // do that
#else
  // call that plain old stack
#endif
:P

and also 'disabled' the code for SPI3 using #ifdefs.
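A minimal sketch of guarding per-peripheral code with #ifdefs, not the code in the repo. It leans on the CMSIS device headers defining SPI2, SPI3 etc. as macros only when the chip actually has that peripheral. SpiPort, kSpiPorts and spiPortCount() are made-up names for illustration, and SPI1 is simply assumed to exist on every part.

```cpp
#include <cstddef>

// Hypothetical stand-in for a per-port SPI class.
struct SpiPort {
  const char *name;
};

// On target, the CMSIS device header defines SPI2, SPI3 ... as macros only
// for peripherals the chip has. A plain PC build defines none of them.
constexpr SpiPort kSpiPorts[] = {
  {"SPI1"},  // assumption: every supported part has SPI1
#if defined(SPI2)
  {"SPI2"},
#endif
#if defined(SPI3)
  {"SPI3"},  // compiled out on parts without a third SPI
#endif
};

// Number of SPI ports that survived the #ifdefs.
constexpr std::size_t spiPortCount() {
  return sizeof(kSpiPorts) / sizeof(kSpiPorts[0]);
}
```

Built on a PC this leaves only the SPI1 entry; built for a part whose device header defines SPI3, the third entry appears without touching the rest of the code.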

as I code, I feel less sure of my code: stm32f4xx is a large series, and so are the other series f3, g4, h7, h5, etc.
and my so-called 'f1xx' is only compiled against f103c8; there are many more parts in f1xx that have not been examined.
it is uncertain whether the code is sufficiently generic after all, and/or whether it would even work as expected.

oh and the build is practically 'fat': the bin file is some 40+ K, built against the Adafruit ILI9341 lib (with some 'optimizations') with the graphics test sketch.
https://github.com/ag88/Adafruit_ILI9341_SPI_stm32duino
it is similar for both F4 and F1. For the 'squeezy' f103c8 with 64k flash, that leaves little headroom to do more than a graphicstest.
It is also uncertain if the 20k SRAM could be 'too squeezy' to really run things.
Note this has never been tried out, just built against it so far.

it is also practically tedious to cover even a small fraction of the SoCs; the classes at least have naming differences, e.g.
SPI_DMAF4xx, SPI_DMAF1xx etc., and those were done manually with a text search and replace.
the DMA code is different between F4 and F1:
F4 has both streams and channels; all the streams run in parallel, so it is much higher performance
https://github.com/ag88/stm32duino_spi_ ... A_F4XX.cpp
F1 only has channels
https://github.com/ag88/stm32duino_spi_ ... A_F1XX.cpp
and these only examine a few permutations of SoC in the whole series.
I'm not sure if there could be other differences within a series; I've not examined them.

pausing for a while again

Re: DMA for SPI

Post by ag123 »

added stm32g4
https://github.com/ag88/stm32duino_spi_ ... DMA_G4XX.h
https://github.com/ag88/stm32duino_spi_ ... A_G4XX.cpp

and this is only based on the stm32g431cbux generic variant

the stm32g4xx DMA peripheral is again different from, say, F4xx: it has channels but no streams.
But what is interesting is that, unlike the prior F1 and F4 series, the DMA block is not directly bound to the peripherals themselves.
Instead, it accesses the peripheral data register directly, and the peripheral binding is done by means of a DMAMUX.

this results in an additional field, Init.Request, at initialization, which specifies the peripheral to bind to.

There are 2 DMA blocks on stm32g431cbux. More interestingly, the binding of each DMA block to the peripherals (e.g. SPI1, SPI2, SPI3, etc.) is apparently not fixed, unlike in the F1 and F4 series. Hence, I bound SPI1 to DMA1, and SPI2 and SPI3 to DMA2.
In a practical sense this could mean that 2 channels run in parallel at any one time.

Again these are not actually tested, just compiled and only in CMake. I'm not too sure if it would actually work as expected if built in Arduino IDE itself.

Re: DMA for SPI

Post by ag123 »

refactored SPIDMA.cpp to use direct register access for transfer(uint8_t data)
worked around hardware differences for SPI with a FIFO, which uses the TXP flag instead of TXE, and RXP instead of RXNE
https://github.com/ag88/stm32duino_spi_ ... A.cpp#L201

Re: DMA for SPI

Post by ag123 »

added stm32h7
https://github.com/ag88/stm32duino_spi_ ... DMA_H7XX.h
https://github.com/ag88/stm32duino_spi_ ... A_H7XX.cpp

again, this is based on using one variant as an example to write the code; untested code, just compiled.

stm32h7xx differs in both the SPI hardware and the DMA hardware.

For the SPI hardware, it has a FIFO, and the status flag to check for transmit changes from TXE (transmit buffer empty) to TXP, while receive changes from RXNE (receive buffer not empty) to RXP. Initially I hit a bummer there; that is addressed in the prior comment, again using #ifdefs.

I think SPI hardware with a FIFO is also present in the stm32Lxx series; I've not studied that yet.

DMA hardware on H7 is elaborate: it has both a high performance DMA with streams and a 'basic' DMA.
That 'basic' DMA is similar to that on g4xx, and both the high perf DMA and the 'basic' DMA use the DMAMUX, as on g4xx.
The difference is that the H7 'high perf' DMA consists of 8 parallel streams, rather than 1 channel per unit.

The assignments are made similar to g4xx: spi1 uses DMA1; spi2 and spi3 use DMA2.

Re: DMA for SPI

Post by ag123 »

added stm32h5
https://github.com/ag88/stm32duino_spi_ ... DMA_H5XX.h
https://github.com/ag88/stm32duino_spi_ ... A_H5XX.cpp

again, this is based on using one variant as an example to write the code; untested code, just compiled.

the stm32h5 SPI peripheral is like the one in stm32h7: it has a FIFO. I'd guess that, used in the right context, it could prevent packet loss and/or give higher performance. Those features are not really utilized here.

One thing is that the stm32 H5 DMA peripheral, which is called GPDMA ('general purpose' DMA?), is very different from (and far more elaborate than) those of the prior series like stm32f4, g4, h7 etc. It has 8 parallel channels in each GPDMA block, but there is something I couldn't figure out how to do correctly, even after reading the ref manual and various docs several times: GPDMA uses Linked List Items (LLI).

I think what it means is that each of those 8 channels can serve multiple uses, e.g. the same channel can be used for, say, SPI and I2C, and the hardware knows how to swap the configuration if the Linked List Items are configured correctly. I couldn't figure out how to configure that and only reviewed CubeMX examples for direct configurations, which are quite similar to the prior series F4, G4, H7 etc., i.e. each channel is bound for a single use instead of using LLI.

SPI1 is bound to GPDMA1 (ch0 rx, ch1 tx), and SPI2, SPI3 are bound to GPDMA2 (ch0 spi2_rx, ch1 spi2_tx, ch2 spi3_rx, ch3 spi3_tx).
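The fixed SPI-to-GPDMA assignment described above can be written down as a small lookup table. This is only a sketch that runs on a PC: the struct and function names (SpiDmaRoute, findRoute, routesAreExclusive) are hypothetical, and only the numbers come from the assignment in this post. No HAL calls are involved.

```cpp
#include <cassert>
#include <cstddef>

// One entry per SPI port: which GPDMA block and which channels it uses.
struct SpiDmaRoute {
  int spi;        // SPI port number (1..3)
  int gpdma;      // GPDMA block number (1 or 2)
  int rxChannel;  // channel used for receive
  int txChannel;  // channel used for transmit
};

constexpr SpiDmaRoute kRoutes[] = {
  {1, 1, 0, 1},  // SPI1 -> GPDMA1 ch0 rx, ch1 tx
  {2, 2, 0, 1},  // SPI2 -> GPDMA2 ch0 rx, ch1 tx
  {3, 2, 2, 3},  // SPI3 -> GPDMA2 ch2 rx, ch3 tx
};
constexpr std::size_t kRouteCount = sizeof(kRoutes) / sizeof(kRoutes[0]);

// Look up the route for an SPI port; returns nullptr if there is none.
const SpiDmaRoute *findRoute(int spi) {
  for (std::size_t i = 0; i < kRouteCount; i++) {
    if (kRoutes[i].spi == spi) return &kRoutes[i];
  }
  return nullptr;
}

// With one use per channel (no linked lists), no (block, channel) pair may
// appear twice. Returns true when the table respects that.
bool routesAreExclusive() {
  bool used[3][8] = {};  // [gpdma block][channel], 8 channels per block
  for (std::size_t i = 0; i < kRouteCount; i++) {
    const SpiDmaRoute &r = kRoutes[i];
    assert(r.gpdma >= 1 && r.gpdma <= 2);
    assert(r.rxChannel >= 0 && r.rxChannel < 8);
    assert(r.txChannel >= 0 && r.txChannel < 8);
    if (r.rxChannel == r.txChannel) return false;
    if (used[r.gpdma][r.rxChannel] || used[r.gpdma][r.txChannel]) return false;
    used[r.gpdma][r.rxChannel] = true;
    used[r.gpdma][r.txChannel] = true;
  }
  return true;
}
```

Keeping the assignment in one table makes it easy to check that two SPI ports never end up sharing a channel, which matters as long as each channel is bound for a single use.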

I'm not really sure if it would work; it is just compiled, and it seems there are no syntax errors etc.

Interestingly, stm32H5 has 2D DMA, but it doesn't seem quite like the Chrom-ART DMA 2D in stm32H7 and various high end SoCs.
Chrom-ART DMA 2D apparently has a blender in it in addition, so it can do alpha channel blending in hardware. Pretty much a graphics accelerator.
Still, the 2D DMA may have applications even without a blender.

Re: DMA for SPI

Post by ag123 »

I did a refactor for SPIDMA to make 2 different implementations of single buffer transfer

In the Arduino SPI API, buffer transfer is provided for only a single buffer
https://docs.arduino.cc/language-refere ... /transfer/
SPI transfer is based on a simultaneous send and receive: the received data is returned in receivedVal (or receivedVal16). In case of buffer transfers the received data is stored in the buffer in-place (the old data is replaced with the data received).

Code:

SPI.transfer(buffer, size)

I'm, however, not sure if DMA transfers can use the same buffer in place for both transmit and receive. It is likely feasible for master transfers, but less likely if an SPI slave is implemented, as the clock is then driven by the external master (on SCK, with NSS as select) rather than by the current MCU.

I've made 2 implementations for this:
SingleBufTransferCopy
This implementation uses a temporary buffer on the stack to receive data using DMA, then copies that into the single buffer
https://github.com/ag88/stm32duino_spi_ ... sferCopy.h
https://github.com/ag88/stm32duino_spi_ ... py.cpp#L20
and
SingleBufTransferInplace
This implementation uses the same buffer for DMA transmit and receive. It does not check if data may be overwritten
https://github.com/ag88/stm32duino_spi_ ... rInplace.h
https://github.com/ag88/stm32duino_spi_ ... ce.cpp#L19

The implementation is selected by using the appropriate #include in SPIDMA.h
https://github.com/ag88/stm32duino_spi_ ... IDMA.h#L21
The global instance of the SingleBufferTransfer implementation is then passed to SPIDMA during initialization / construction.
https://github.com/ag88/stm32duino_spi_ ... IDMA.h#L27

Currently, the SingleBufTransferCopy implementation is used as the default.
This is probably 'safer', but it allocates a buffer on the stack (using alloca()) to receive data over DMA,
then copies that back into the single buffer before returning.
There is likely a performance hit and more memory use due to this.
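To make the difference between the two strategies concrete, here is a rough sketch that runs on a PC. It is not the code in the repo: mockTransmitReceive() is a made-up stand-in for a full-duplex DMA transfer (on target that would be something like HAL_SPI_TransmitReceive_DMA plus a wait for completion), and transferCopy() / transferInplace() are hypothetical names. The mock reads each byte before it stores the reply, so it only shows the buffer handling and does not settle whether real DMA hardware is safe in place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a full-duplex transfer. The fake "device" answers each byte
// with its bitwise complement.
void mockTransmitReceive(const uint8_t *tx, uint8_t *rx, std::size_t count) {
  for (std::size_t i = 0; i < count; i++) {
    uint8_t out = tx[i];                 // byte is read (shifted out) first
    rx[i] = static_cast<uint8_t>(~out);  // then the received byte is stored
  }
}

// Copy strategy: receive into a temporary buffer, then copy back.
void transferCopy(uint8_t *buf, std::size_t count) {
  assert(buf != nullptr || count == 0);
  std::vector<uint8_t> tmp(count);  // heap here; the post uses alloca()
  mockTransmitReceive(buf, tmp.data(), count);
  std::copy(tmp.begin(), tmp.end(), buf);
}

// In-place strategy: the same buffer is used for transmit and receive.
void transferInplace(uint8_t *buf, std::size_t count) {
  assert(buf != nullptr || count == 0);
  mockTransmitReceive(buf, buf, count);
}
```

With this mock both functions leave the same bytes in the buffer; the copy version simply needs a second buffer of the same size and one extra pass over the data.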

If anyone knows whether DMA single buffer transfers can be done in place, i.e. using the same buffer for transmit and receive (with HAL), do drop a comment / note.
That would be *a lot* faster and use less memory than copying it again before the call returns.

Re: DMA for SPI

Post by ag123 »

added stm32f3
https://github.com/ag88/stm32duino_spi_ ... DMA_F3XX.h
https://github.com/ag88/stm32duino_spi_ ... A_F3XX.cpp

again, this is based on using one variant as an example to write the code; untested code, just compiled.

STM32F3 DMA is closer to F1(03) than it is to F4xx.

On F303(B/C/D/E) there are 2 DMA controllers; F303(6/8) has only 1 DMA controller.
SPI1 and SPI2 are bound to DMA1, SPI3 to DMA2.

this is based on F303CC as an example; I'm not sure if anything would break across the SoCs in the series.

Re: DMA for SPI

Post by ag123 »

https://github.com/ag88/stm32duino_spi_ ... SPIClass.h
https://github.com/ag88/stm32duino_spi_ ... IClass.cpp
https://github.com/ag88/stm32duino_spi_ ... SPIBasic.h
https://github.com/ag88/stm32duino_spi_ ... IBasic.cpp

refactored SPIClass
- reorganised so that the API parts remain in SPIClass,
  and various methods, e.g. init(), transfer() etc., are implemented in
  the derived class; some of the methods are made virtual or abstract
- SPI without DMA is implemented in the SPIBasic class;
  it implements the init(), initSPI(), transfer(), getClkFreq() etc. methods
- various parts originally implemented in spi_com.c are implemented
  in the SPIBasic class instead; instance variables are used instead
  of external structures, e.g.
    SPI_HandleTypeDef spihandle; // for HAL
    SPI_TypeDef *spi_reg;        // SPI register base
    PinName pin_miso;
    PinName pin_mosi;
    PinName pin_sclk;
    PinName pin_ssel;
  are defined as instance variables in SPIClass itself.
  They are made protected so that derived classes can access them

the build completes, but this is still untested code.
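A stripped-down sketch of the split described above, so the idea can be tried on a PC. It is not the real class layout: the HAL handle, register pointer and pin members are left out, the `started` flag and `initCalls` counter are made up for illustration, and SPIBasic::transfer() here is a mock that returns the inverted byte instead of touching SPI registers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified, hypothetical shape of the base class: the public API lives
// here, the hardware-specific parts are virtual and left to derived classes.
class SPIClass {
public:
  virtual ~SPIClass() = default;

  void begin() {
    init();  // implemented by the derived class
    started = true;
  }

  // Single-byte transfer is abstract.
  virtual uint8_t transfer(uint8_t data) = 0;

  // Buffer transfer has a default built on the single-byte transfer; a DMA
  // class could override it.
  virtual void transfer(uint8_t *buf, std::size_t count) {
    assert(buf != nullptr || count == 0);
    for (std::size_t i = 0; i < count; i++) {
      buf[i] = transfer(buf[i]);
    }
  }

protected:
  virtual void init() = 0;
  bool started = false;  // protected so derived classes can access it
};

// Non-DMA implementation. A real one would poll the SPI registers; this mock
// just returns the inverted byte.
class SPIBasic : public SPIClass {
public:
  using SPIClass::transfer;  // keep the buffer overload visible

  uint8_t transfer(uint8_t data) override {
    assert(started);
    return static_cast<uint8_t>(data ^ 0xFF);
  }

  int initCalls = 0;  // illustration only: counts init() calls

protected:
  void init() override { initCalls++; }
};
```

A DMA variant would be another class derived from SPIClass that overrides the buffer transfer, while sketches keep calling the same begin() and transfer() API.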