For '3rd party' boot loaders, i.e. ones that are not there in a bare stm32,
some of them are discussed in this sub-forum and elsewhere, e.g.
viewforum.php?f=58
https://github.com/Serasidis/STM32_HID_Bootloader
Some of these are personal developments which the authors contributed as open source projects.
I did not use them, so I'm not able to comment more about them.
I'm quite happy to live with the trouble of toggling 'boot0' manually, as on the board I used
https://stm32-base.org/boards/STM32F401 ... -Pill-V1.2
boot0 is simply a toggle button which I can easily reach, and typically the board is just on my desk.
So I just toggle it manually and then flash away.
I used dfu-util as I'm on Linux
http://dfu-util.sourceforge.net/
But I'd think STM32CubeProgrammer works just as well, and it is the means directly supported in the 'official' STM core. I used dfu-util because the other is kind of bulkier, while dfu-util is a little 'utility' in a single small file (< 1 MB), so I can place it 'anywhere convenient'. This may not be true on Windows.
STM32CubeProgrammer also has features to update the firmware in ST-Link dongles. Oh, and ST-Link (v2) dongles are pretty much a 'must have' for doing anything 'deeper' with stm32.
https://www.adafruit.com/product/2548
https://octopart.com/st-link%2Fv2-stmic ... 57793?r=sp
The ST-Link can be used to flash firmware too, and this doesn't need the boot0 button. But it still needs wires to be connected.
I'm not using PlatformIO, hence I can't comment on it. But do note that the 'official' repository is here
https://github.com/stm32duino/Arduino_Core_STM32
Other platforms that use or derive from this, e.g. PlatformIO, may have adapted it to add features of their own. So they are not necessarily the same as the 'official' source, which is mainly tested on the Arduino IDE 1.8.x.
I'm also looking out for that 'soft boot0'; I think the stm32f4xx and above probably have such a feature. This is literally the "short cut": instead of writing a boot loader, I'm thinking of having my sketch 'set soft boot0' and then call HAL_NVIC_SystemReset() to go to the built-in on-chip boot loader.
It would save a lot of hassle.
But in the meantime, I'm using that internal on-chip (USB DFU) boot loader, which needs toggling of the boot0 pin.
It works adequately well.
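On the 'soft boot0' idea: this is not a confirmed 'soft boot0' feature, just a minimal sketch of a workaround that is often described, under stated assumptions. The sketch leaves a marker in memory that survives a reset, resets, and very early in startup the marker is checked to decide whether to go to the on-chip boot loader. The names request_dfu, dfu_requested and BOOT_MAGIC are hypothetical, the marker location (e.g. an RTC backup register or a no-init RAM word) is an assumption, and the system memory address should be checked against AN2606 for the actual chip. Only the flag logic is shown; the reset and the jump are hardware specific and are left as comments.

```c
#include <stdint.h>
#include <stdbool.h>

#define BOOT_MAGIC     0xB007B007u  /* arbitrary, hypothetical marker value */
#define SYSTEM_MEMORY  0x1FFF0000u  /* ROM boot loader base on F4 parts; verify for your chip */

/* Called from the sketch: leave a marker that survives the reset.
   'flag' is assumed to point at memory that is NOT cleared on reset. */
void request_dfu(volatile uint32_t *flag)
{
    *flag = BOOT_MAGIC;
    /* on hardware, the next step would be: HAL_NVIC_SystemReset(); */
}

/* Called very early after reset. Returns true once per request.
   On hardware, a true result would be followed by loading the stack
   pointer from SYSTEM_MEMORY and branching to the address stored at
   SYSTEM_MEMORY + 4 (that part is not shown here). */
bool dfu_requested(volatile uint32_t *flag)
{
    if (*flag != BOOT_MAGIC)
        return false;
    *flag = 0u;  /* clear it so the next reset boots the sketch normally */
    return true;
}
```

Clearing the marker before jumping matters: otherwise every later reset would land in the boot loader again.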
In terms of writing to the internal flash, look in the ref manual for the chapters on flash memory.
I actually used it successfully as storage for one of my own apps, a kind of logger.
Once you figure it out, I'd think writing a 'boot loader' is simply a matter of finding a way to receive the 'data' and writing it to the "right" places.
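One detail that the flash chapters make clear is that erasing is done per sector, and on the F4 parts the sectors are not all the same size. A minimal sketch of mapping an address to its sector, assuming the STM32F401 layout (4 x 16 KB, 1 x 64 KB, then 128 KB sectors) and a 256 KB part; the function name and the size constant are hypothetical, so check the ref manual for the actual chip:

```c
#include <stdint.h>

#define FLASH_BASE_ADDR   0x08000000u
#define FLASH_SIZE_BYTES  (256u * 1024u)  /* assumption: 256 KB part, e.g. STM32F401CC */

/* Returns the number of the flash sector that contains addr,
   or -1 if addr is outside the flash. */
int flash_sector_of(uint32_t addr)
{
    if (addr < FLASH_BASE_ADDR || addr - FLASH_BASE_ADDR >= FLASH_SIZE_BYTES)
        return -1;

    uint32_t off = addr - FLASH_BASE_ADDR;
    if (off < 0x10000u)
        return (int)(off / 0x4000u);              /* sectors 0..3, 16 KB each */
    if (off < 0x20000u)
        return 4;                                 /* sector 4, 64 KB */
    return 5 + (int)((off - 0x20000u) / 0x20000u); /* sector 5 onwards, 128 KB each */
}
```

On real hardware the erase and write themselves would go through the HAL flash routines (HAL_FLASH_Unlock, HAL_FLASHEx_Erase, HAL_FLASH_Program, HAL_FLASH_Lock). For a logger, the large sector size means a whole sector has to be set aside, as a single erase wipes all of it.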
Putting the 'boot loader' in 'high' memory is a typical strategy, as the normal start of flash is 0x08000000.
So to be a 'boot loader' it needs to sit outside the block that it is erasing and writing over, hence the 'high memory' design.
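The 'sit outside the block' rule can be enforced with a simple range check before every erase or write. A minimal sketch, assuming a hypothetical layout where the application occupies flash from 0x08000000 and the boot loader starts at 0x08020000 (the top 128 KB sector of a 256 KB part); both addresses and the function name are made up for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

#define APP_START   0x08000000u  /* start of flash, where the application goes */
#define BOOT_START  0x08020000u  /* assumption: the boot loader begins here */

/* True if [addr, addr + len) lies wholly inside the application area,
   so writing there can never touch the boot loader itself. */
bool write_is_safe(uint32_t addr, uint32_t len)
{
    if (len == 0u || addr < APP_START || addr >= BOOT_START)
        return false;
    return len <= BOOT_START - addr;  /* end check without integer overflow */
}
```

A boot loader would refuse any received block for which this returns false, rather than trusting the addresses in the incoming data.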
I think if you bother to dig around github etc, there are probably some implementations, though they may not be for stm32duino.
But all of this is still a 'hassle', so try to find the 'short cut', e.g. that 'soft boot0': dig through the ref manual, and if you find it, do post an update here as well.

That would save all the trouble, as the USB DFU boot loader is basically built-in on chip.
But I noted that quite commonly, after flashing, I'd need to do a reset, i.e. NRST.
I'd think it is partly because I used the generic open source dfu-util utility, which may not have code that uses specific stm32 features.