Speed Up your IO !!

ManX84
Posts: 20
Joined: Tue Oct 17, 2023 3:30 pm

Post by ManX84 »

Hello all,

Just a small example to illustrate how to speed up your IO.
The STM32 Arduino core uses the HAL interface to read and write IO. HAL is known to be very slow... but how much slower?

Here is a test, made on an STM32F103C8:

Code: Select all

	//Here a classical arduino write -> use STM32 HAL interface -> F = 396 KHz
	do
	{
		digitalWrite(PC13, LOW);
		digitalWrite(PC13, HIGH);
	} while (1); 

	// Here a direct access to STM32 GPIO -> F = 3.22 MHz !!!!
	do
	{
		GPIOC->ODR |= (1 << 13);
		GPIOC->ODR &= ~(1 << 13);
	}while (1);
Look at that! 396 kHz with the standard Arduino procedure (using the STM32 HAL interface), what a pity! :shock:
But 3.22 MHz (yes!) with direct GPIO access! :mrgreen:

So come on guys, speed up your IO! 8-)
;-)
Exposing your opinion is good, exposing your code is better! :mrgreen:
ManX84
Posts: 20
Joined: Tue Oct 17, 2023 3:30 pm

Re: Speed Up your IO !!

Post by ManX84 »

Need it faster?

Code: Select all

	// Here use a digitalWriteFast !
	do
	{
		digitalWriteFast(PC_13, 1);
		digitalWriteFast(PC_13, 0);
	} while (1);
F = 7 MHz! :mrgreen:
GonzoG
Posts: 403
Joined: Wed Jan 15, 2020 11:30 am
Answers: 26
Location: Prudnik, Poland

Re: Speed Up your IO !!

Post by GonzoG »

ManX84 wrote: Mon Oct 23, 2023 4:04 pm Need it faster?

Code: Select all

	// Here use a digitalWriteFast !
	do
	{
		digitalWriteFast(PC_13, 1);
		digitalWriteFast(PC_13, 0);
	} while (1);
F = 7 MHz! :mrgreen:
I got 19 MHz with digitalReadFast and 33 MHz with digitalWriteFast, but by typing out 1000 lines of code.
The while loop itself needs a few cycles.
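The loop overhead mentioned above can also be avoided without typing the lines by hand, by letting the preprocessor repeat the statements. The following is only a sketch that runs on a PC against a mock register: `pin_write`, `REP10`, `REP100`, `TOGGLE` and `toggle_burst` are hypothetical names, and on real hardware the body of `TOGGLE` would presumably be the two `digitalWriteFast` calls.

```c
#include <stdint.h>

/* Hypothetical stand-in for the GPIO output register, so the sketch runs on a PC. */
static volatile uint32_t mock_odr = 0;
static uint32_t write_count = 0;

static inline void pin_write(uint32_t level)
{
	mock_odr = level ? (1u << 13) : 0u;
	write_count++;
}

/* Repeat a statement 10 or 100 times at compile time:
   no loop counter and no branch between the repeated statements. */
#define REP10(x)  x x x x x x x x x x
#define REP100(x) REP10(REP10(x))

/* One high/low pair. */
#define TOGGLE() pin_write(1); pin_write(0);

void toggle_burst(void)
{
	REP100(TOGGLE())	/* 100 pairs = 200 writes of straight-line code */
}

uint32_t writes_done(void)
{
	return write_count;
}
```

Calling `toggle_burst()` from inside the `while (1)` loop would spread the cost of the loop branch over 100 pulses instead of one, at the price of a larger binary.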
ManX84
Posts: 20
Joined: Tue Oct 17, 2023 3:30 pm

Re: Speed Up your IO !!

Post by ManX84 »

@GonzoG Please share your code! (I do not claim to have the fastest... I am just starting to play.)

Next: what is the STM32duino system clock setting? Well, I have an STM32F103C8, and you can find the setting here:

"C:\...\AppData\Local\arduino15\packages\STMicroelectronics\hardware\stm32\2.6.0\variants\STM32F1xx\F103C8T_F103CB(T-U)\generic_clock.c"

If you configure STM32CubeMX with these values, you find:
[Image: setting clock stm32.jpg, STM32CubeMX clock configuration screenshot]
So only 48 MHz! Why not the "stock max" of 72 MHz?

I generated the clock setup code to get the full 72 MHz clock frequency (my board has an 8 MHz crystal):

Code: Select all


void SystemClock_Config(void)
{
	RCC_OscInitTypeDef RCC_OscInitStruct = { 0 };
	RCC_ClkInitTypeDef RCC_ClkInitStruct = { 0 };

	/** Initializes the RCC Oscillators according to the specified parameters
	* in the RCC_OscInitTypeDef structure.
	*/
	RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_HSE;
	RCC_OscInitStruct.HSEState = RCC_HSE_ON;
	RCC_OscInitStruct.HSEPredivValue = RCC_HSE_PREDIV_DIV1;
	RCC_OscInitStruct.HSIState = RCC_HSI_ON;
	RCC_OscInitStruct.PLL.PLLState = RCC_PLL_ON;
	RCC_OscInitStruct.PLL.PLLSource = RCC_PLLSOURCE_HSE;
	RCC_OscInitStruct.PLL.PLLMUL = RCC_PLL_MUL9;	// 8 MHz HSE x 9 = 72 MHz
	if (HAL_RCC_OscConfig(&RCC_OscInitStruct) != HAL_OK)
	{
		Error_Handler();
	}

	/** Initializes the CPU, AHB and APB buses clocks
	*/
	RCC_ClkInitStruct.ClockType = RCC_CLOCKTYPE_HCLK | RCC_CLOCKTYPE_SYSCLK
		| RCC_CLOCKTYPE_PCLK1 | RCC_CLOCKTYPE_PCLK2;
	RCC_ClkInitStruct.SYSCLKSource = RCC_SYSCLKSOURCE_PLLCLK;
	RCC_ClkInitStruct.AHBCLKDivider = RCC_SYSCLK_DIV1;
	RCC_ClkInitStruct.APB1CLKDivider = RCC_HCLK_DIV2;
	RCC_ClkInitStruct.APB2CLKDivider = RCC_HCLK_DIV2;

	if (HAL_RCC_ClockConfig(&RCC_ClkInitStruct, FLASH_LATENCY_2) != HAL_OK)
	{
		Error_Handler();
	}
}

/**
  * @brief GPIO Initialization Function
  * @param None
  * @retval None
  */
static void MX_GPIO_Init(void)
{
	/* USER CODE BEGIN MX_GPIO_Init_1 */
	/* USER CODE END MX_GPIO_Init_1 */

	  /* GPIO Ports Clock Enable */
	__HAL_RCC_GPIOD_CLK_ENABLE();

	/* USER CODE BEGIN MX_GPIO_Init_2 */
	/* USER CODE END MX_GPIO_Init_2 */
}

void setup()
{
	//clock frequency settings (72MHz)
	SystemClock_Config();
	MX_GPIO_Init();
}
Add this to your code (only for STM32F103C8)
Have Fun!
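To sanity-check a PLL setting before flashing it, the clock tree arithmetic can be worked out on a PC. This is a minimal sketch, not HAL code: `clock_tree_t`, `compute_clocks` and `within_f103_spec` are illustrative names, and the limits used are the STM32F103 datasheet maxima (72 MHz SYSCLK, 36 MHz APB1, 72 MHz APB2). With an 8 MHz crystal, a multiplier of 9 gives 72 MHz, while a multiplier of 15 would give 120 MHz.

```c
#include <stdint.h>

/* Resulting bus frequencies, in Hz. */
typedef struct {
	uint32_t sysclk_hz;
	uint32_t apb1_hz;
	uint32_t apb2_hz;
} clock_tree_t;

/* SYSCLK = HSE / prediv * pllmul; the buses are divided down from HCLK. */
clock_tree_t compute_clocks(uint32_t hse_hz, uint32_t prediv, uint32_t pllmul,
                            uint32_t ahb_div, uint32_t apb1_div, uint32_t apb2_div)
{
	clock_tree_t c;
	uint32_t hclk;

	c.sysclk_hz = (hse_hz / prediv) * pllmul;
	hclk = c.sysclk_hz / ahb_div;
	c.apb1_hz = hclk / apb1_div;
	c.apb2_hz = hclk / apb2_div;
	return c;
}

/* Returns 1 when every clock stays within the STM32F103 rated maximum. */
int within_f103_spec(const clock_tree_t *c)
{
	return c->sysclk_hz <= 72000000u
	    && c->apb1_hz <= 36000000u
	    && c->apb2_hz <= 72000000u;
}
```

For example, `compute_clocks(8000000u, 1u, 9u, 1u, 2u, 2u)` yields 72 MHz SYSCLK with 36 MHz on both APB buses, which passes the check; the same call with a multiplier of 15 does not.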
ManX84
Posts: 20
Joined: Tue Oct 17, 2023 3:30 pm

Re: Speed Up your IO !!

Post by ManX84 »

I played around a little more and "overclocked" to 128 MHz... it works at my room temperature (I live in Siberia! :mrgreen: )
dannyf
Posts: 446
Joined: Sat Jul 04, 2020 7:46 pm

Re: Speed Up your IO !!

Post by dannyf »

ManX84 wrote: but how much slower?
1. The issue here isn't so much the HAL as the Arduino implementation. I have written quite extensively about this, but in general the Arduino implementation is 20x to 50x slower than direct register access.
2. Your particular implementation isn't optimal: you incur significant loop overhead.

It is best to measure in terms of CPU ticks: that way, you can compare efficiency at different frequencies.
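As a rough illustration of measuring in ticks, an output frequency can be converted to CPU cycles per output period. This sketch assumes the earlier measurements were taken at the stock 48 MHz clock, which the thread does not state explicitly; `cycles_per_period` is a hypothetical helper name.

```c
#include <stdint.h>

/* CPU cycles spent per full output period (one high write plus one low
   write, including loop overhead), rounded to the nearest cycle. */
uint32_t cycles_per_period(uint32_t cpu_hz, uint32_t output_hz)
{
	return (cpu_hz + output_hz / 2u) / output_hz;
}
```

Under that 48 MHz assumption, 396 kHz works out to about 121 cycles per period, 3.22 MHz to about 15, and 7 MHz to about 7. Those figures stay comparable if the core clock is later changed.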
ManX84
Posts: 20
Joined: Tue Oct 17, 2023 3:30 pm

Re: Speed Up your IO !!

Post by ManX84 »

@dannyf
You are surely right, but the objective was not to write optimal code.

Your position is that of an "advanced" user or "specialist", but think about all the beginners and hobbyists.
These small posts just illustrate how slow the original STM32 Arduino GPIO writing is and how easily we can get better performance.
People usually come here with a problem, like me most of the time. This time, to give back to the community, I came with "easy" solutions to get more performance from the "stock" Arduino STM32 core.

Finally, there is not much information about Arduino on STM32 on the web. I am adding my little stone to the building, and I hope it will be useful for all beginners and hobbyists. I have no other pretensions.

But you are the second to claim that my code is not optimal. That is clear, so why not "repay" the community for what it gave you by sharing your code and techniques with us?

Exposing your opinion is good, exposing your code is better!
ag123
Posts: 1653
Joined: Thu Dec 19, 2019 5:30 am
Answers: 24

Re: Speed Up your IO !!

Post by ag123 »

You can try playing with bit banding
https://developer.arm.com/documentation ... it-banding
but it works mainly on Cortex-M3 (e.g. stm32f103) and M4 (stm32f4xx).
Cortex-M0 and M7 (e.g. stm32h7xx) don't have it, I think.
Bit banding is more convenient, as you can simply write to a memory location and it changes a single bit / pin,
and it may actually be faster than setting registers.
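For reference, the "memory location" for a given bit follows from the Cortex-M3 bit-band mapping: each bit of the first 1 MB of peripheral space gets its own 32-bit word in an alias region. A small sketch of that address calculation, runnable on a PC; `bitband_alias` and the two constant names are illustrative, not CMSIS names.

```c
#include <assert.h>
#include <stdint.h>

#define PERIPH_BASE_ADDR    0x40000000u	/* start of the peripheral bit-band region */
#define PERIPH_BB_BASE_ADDR 0x42000000u	/* start of its alias region */

/* Alias word address for one bit of a peripheral register:
   alias = alias_base + byte_offset * 32 + bit * 4 */
uint32_t bitband_alias(uint32_t reg_addr, uint32_t bit)
{
	assert(bit < 32u);
	return PERIPH_BB_BASE_ADDR
	     + ((reg_addr - PERIPH_BASE_ADDR) * 32u)
	     + (bit * 4u);
}
```

On an STM32F103, GPIOC->ODR sits at 0x4001100C, so bit 13 (the PC13 pin used earlier in the thread) maps to alias address 0x422201B4. Writing 1 or 0 to that word sets or clears only that pin.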
dannyf
Posts: 446
Joined: Sat Jul 04, 2020 7:46 pm

Re: Speed Up your IO !!

Post by dannyf »

I remember using bit banding on Luminary LM3S and TI LM4F chips.

Based on my benchmarking back then, it wasn't faster than BRR and BSRR.

The better approach, I think, is the implementation of SET, CLR and INV registers on PIC32.

Bit banding is optional for CM0, I think. I have never seen a CM0 chip with bit banding. Will try later.
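For readers who have not met BSRR and BRR: on the STM32F1 they are write-only registers that set or clear pins in a single store, whereas `ODR |= mask` needs a load, an OR and a store. The following is a software model of that behaviour for experimenting on a PC, not driver code; `gpio_model_t`, `write_bsrr` and `write_brr` are hypothetical names.

```c
#include <stdint.h>

/* Software model of the 16-bit output latch of one GPIO port. */
typedef struct {
	uint16_t odr;
} gpio_model_t;

/* BSRR: bits 0..15 set the matching pin, bits 16..31 reset it.
   If both are requested for the same pin, set has priority. */
void write_bsrr(gpio_model_t *g, uint32_t value)
{
	uint16_t set = (uint16_t)(value & 0xFFFFu);
	uint16_t reset = (uint16_t)(value >> 16);

	g->odr = (uint16_t)((g->odr & (uint16_t)~reset) | set);
}

/* BRR: bits 0..15 reset the matching pin; other pins are untouched. */
void write_brr(gpio_model_t *g, uint32_t value)
{
	g->odr = (uint16_t)(g->odr & ~(value & 0xFFFFu));
}
```

Because only the pins named in the written value change, no read of the current port state is needed, which also makes the operation safe against an interrupt modifying another pin of the same port in between.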
dannyf
Posts: 446
Joined: Sat Jul 04, 2020 7:46 pm

Re: Speed Up your IO !!

Post by dannyf »

well, some numbers: Keil MDK, -Odefault, STM32F103

first, using BSRR/BRR: 6.7K ticks/1K run

Code: Select all

//fast routines through BRR/BSRR registers
//(FIO_FLP and FIO_GET rely on IO_GET from the port/gpio macros posted further down)
#define FIO_SET(port, pins)					((port)->BSRR = (pins))
#define FIO_CLR(port, pins)					((port)->BRR = (pins))
#define FIO_FLP(port, pins)					(IO_GET((port)->ODR, pins)?FIO_CLR(port, pins):FIO_SET(port, pins))
#define FIO_GET(port, pins)					IO_GET((port)->IDR, pins)

//execution:
		//for (tmp=0; tmp<1000/5; tmp++) {FIO_SET(GPIOB, 1<<7); FIO_CLR(GPIOB, 1<<7); FIO_SET(GPIOB, 1<<7); FIO_CLR(GPIOB, 1<<7); FIO_SET(GPIOB, 1<<7); FIO_CLR(GPIOB, 1<<7); FIO_SET(GPIOB, 1<<7); FIO_CLR(GPIOB, 1<<7); FIO_SET(GPIOB, 1<<7); FIO_CLR(GPIOB, 1<<7); }	//6.7K/1k
now, bit banding: 15.7K ticks /1K run

Code: Select all

//bit-banding on peripherals
#define PERI_BASE			0x40000000
#define PERI_BB_BASE		(PERI_BASE + 0x02000000)
#define BIO2BB(addr, bit)	(*(volatile uint32_t *) (PERI_BB_BASE | (((addr) - PERI_BASE) << 5) | ((bit) << 2)))
#define BIO_GET(port, bit)	(BIO2BB((uint32_t) &(port->IDR), bit))
#define BIO_SET(port, bit)	(BIO2BB((uint32_t) &(port->ODR), bit) = 1)
#define BIO_CLR(port, bit)	(BIO2BB((uint32_t) &(port->ODR), bit) = 0)
#define BIO_FLP(port, bit)	(BIO_GET(port, bit)?BIO_CLR(port, bit):BIO_SET(port, bit))

//execution (BIO_* take a bit number such as 7, not a mask such as 1<<7):
		//for (tmp=0; tmp<1000/5; tmp++) {BIO_SET(GPIOB, 7); BIO_CLR(GPIOB, 7); BIO_SET(GPIOB, 7); BIO_CLR(GPIOB, 7); BIO_SET(GPIOB, 7); BIO_CLR(GPIOB, 7); BIO_SET(GPIOB, 7); BIO_CLR(GPIOB, 7); BIO_SET(GPIOB, 7); BIO_CLR(GPIOB, 7); }	//15.7K/1k
for comparison, the regular ops via ODR: 12.1K ticks /1K run

Code: Select all

//port/gpio oriented macros for PIC
#define IO_SET(port, pins)					port |= (pins)				//set bits on port
#define IO_CLR(port, pins)					port &=~(pins)				//clear bits on port
#define IO_FLP(port, pins)					port ^= (pins)				//flip bits on port
#define IO_GET(port, pins)					((port) & (pins))			//return bits on port
//gpio port based
#define GIO_SET(port, pins)					IO_SET(port->ODR, (pins))				//set bits on port
#define GIO_CLR(port, pins)					IO_CLR(port->ODR, (pins))				//clear bits on port
#define GIO_FLP(port, pins)					IO_FLP(port->ODR, (pins))				//flip bits on port
#define GIO_GET(port, pins)					IO_GET(port->IDR, (pins))			//return bits on port
//execution:

		//for (tmp=0; tmp<1000/5; tmp++) {GIO_FLP(GPIOB, 1<<7);GIO_FLP(GPIOB, 1<<7);GIO_FLP(GPIOB, 1<<7);GIO_FLP(GPIOB, 1<<7);GIO_FLP(GPIOB, 1<<7);}					//flip led, 12.1k/1000 ticks
So bit banding is the slowest of the three, and BSRR/BRR the fastest.