1. The issue here isn't so much HAL as the Arduino implementation. I have written quite extensively about this, but in general, the Arduino implementation is 20x to 50x slower than direct register access.
2. Your particular implementation isn't optimal ...
//unlock the domain
//on some chips, it requires a set of magic numbers
#define bkpUnlock() do {PWR->CR1 |= PWR_CR1_DBP;} while (!(PWR->CR1 & PWR_CR1_DBP)) //1->enable write access, 0->disable write access
#define bkpLock() do {PWR->CR1 &=~PWR_CR1_DBP ...
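To show how the unlock / write / lock sequence fits together, here is a minimal host-side sketch. It is not the original code: `PWR` is replaced by a mock struct so it compiles off-target, the `PWR_CR1_DBP` bit position is an assumption (on real hardware take both from the vendor device header), `bkp_reg_mock` is a hypothetical stand-in for a backup register, and `bkp_lock()` is only an assumed mirror image of the unlock loop, since the macro above is cut off.

```c
#include <assert.h>
#include <stdint.h>

/* Mock of the PWR peripheral so the sketch builds on a PC.
   On target, PWR and PWR_CR1_DBP come from the device header instead. */
typedef struct { volatile uint32_t CR1; } PWR_Mock;
static PWR_Mock pwr_mock;
#define PWR (&pwr_mock)
#define PWR_CR1_DBP (1u << 8)   /* assumed bit position, check your header */

/* Hypothetical stand-in for one backup-domain register. */
static volatile uint32_t bkp_reg_mock;

/* Set DBP and keep retrying until the bit reads back as set. */
static inline void bkp_unlock(void) {
    do { PWR->CR1 |= PWR_CR1_DBP; } while (!(PWR->CR1 & PWR_CR1_DBP));
}

/* Assumed counterpart: clear DBP and wait until it reads back as cleared. */
static inline void bkp_lock(void) {
    do { PWR->CR1 &= ~PWR_CR1_DBP; } while (PWR->CR1 & PWR_CR1_DBP);
}

/* Unlock, write one value, lock again, and return what was stored. */
static uint32_t bkp_write_demo(uint32_t value) {
    bkp_unlock();
    assert((PWR->CR1 & PWR_CR1_DBP) != 0);  /* write access must be enabled */
    bkp_reg_mock = value;
    bkp_lock();
    return bkp_reg_mock;
}
```

Calling `bkp_write_demo(0x1234u)` from your own code leaves the mock register holding the value and DBP cleared again. On a real chip the read-back loops matter because the bit may not take effect on the first write.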