- use "pwmTim->setCaptureCompare(..." as the first line of code in the interrupt callback. No matter how long your calculations are, interrupt latency is usually the same, so you'll have steady timing. Plus, once the counter-compare value is written, it is applied at the next timer overflow. From my experiments it takes, at worst, around 3.5uS to enter the interrupt callback and about the same time to exit it. More code = more latency.
- know your numbers:
The counter-compare register is an unsigned 16bit number, between 0 and overflow (also 16bit unsigned).
For 20uS, overflow is 72*20=1440 cpu (timer) ticks, i.e. 72 ticks per uS at a 72MHz timer clock. (if in TICK_FORMAT)
- use integers, not floats, for samples, so you can use fast bit-shifting. Multiplication/division is much slower, and the result ends up as an unsigned 16bit integer for the counter-compare value anyway (more like 11bit, in your case)
- use 1024 samples; since that is a power of two, a simple wrap of the index (presumably a bitmask such as "& 1023") will do, no need for "if/else". It will be even faster if the array is in RAM, not in flash memory.
- in the main loop, set the builtin LED on; in the interrupt callback, make the first line of code set the LED off. Use your eyes to judge the brightness of the LED. Once it is mostly dimmed, you have reached the limit. (Plus, you'll gain 0.7uS when you delete the digitalWrite(LED_BUILTIN,LOW) line once testing is finished.)
- before the timers for interrupts are set up, call your interrupt callback 1000 times (in setup) and measure how long it takes to execute (use Arduino's micros() )
- imho, two functions in one interrupt callback are faster than two interrupt callbacks with one function each.
- or calculate a lot of samples in the main loop, and only handle the head/tail of that buffer in the interrupt
- take all this advice with great reserve, I've been wrong before
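
To make a few of the points above concrete (the 1440-tick overflow, integer samples with bit-shifts, the 1024-entry table with a masked index, and writing the compare value first), here is a rough sketch in plain C++ that also compiles on a PC. It is not your actual code: writeCompare(), fakeCompareRegister, sampleTable, fillRamp(), volumeShift and timerCallback() are all made-up names, and writeCompare() only stands in for the real pwmTim->setCaptureCompare(...) call. The 72MHz clock and 20uS period are the numbers assumed above.

```cpp
#include <stdint.h>

// Assumed numbers from the post: 72 timer ticks per uS, 20 uS PWM period.
constexpr uint32_t kTicksPerUs    = 72;
constexpr uint32_t kPeriodUs      = 20;
constexpr uint32_t kOverflowTicks = kTicksPerUs * kPeriodUs;   // 1440
static_assert(kOverflowTicks == 1440, "72 * 20 timer ticks");
static_assert(kOverflowTicks <= 0xFFFF, "must fit a 16bit register");

// 1024 is a power of two, so the index can wrap with a mask.
constexpr uint32_t kTableSize = 1024;
constexpr uint32_t kIndexMask = kTableSize - 1;

// Non-const array, so on the target it lives in RAM rather than flash.
uint16_t sampleTable[kTableSize];

uint32_t sampleIndex = 0;   // current position in the table
uint16_t nextCompare = 0;   // value prepared for the next interrupt
uint8_t  volumeShift = 0;   // hypothetical attenuation: 1 = half, 2 = quarter

// Hypothetical stand-in for the hardware write on the real board.
volatile uint16_t fakeCompareRegister = 0;
inline void writeCompare(uint16_t ticks) { fakeCompareRegister = ticks; }

// Done once, outside the interrupt: fill the table with a ramp that is
// already expressed in timer ticks (0 .. kOverflowTicks - 1).
void fillRamp() {
    for (uint32_t i = 0; i < kTableSize; ++i) {
        sampleTable[i] = static_cast<uint16_t>((i * kOverflowTicks) >> 10);
    }
}

// Interrupt callback: write the prepared value first, so the write happens
// at a fixed delay after the interrupt, then prepare the next value.
void timerCallback() {
    writeCompare(nextCompare);
    sampleIndex = (sampleIndex + 1) & kIndexMask;   // wrap without if/else
    nextCompare = static_cast<uint16_t>(sampleTable[sampleIndex] >> volumeShift);
}
```

The only multiplication is in fillRamp(), which runs once before the timer starts; the callback itself is one write, one add, one mask, one table read and one shift. On the real board you would replace writeCompare() with the actual setCaptureCompare call and attach timerCallback() to the timer.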

Good luck