SoftPWM via DMA and no CPU cycles

Post your cool example code here.
universam10
Posts: 19
Joined: Sun Jan 03, 2016 8:35 am
Location: Germany

SoftPWM via DMA and no CPU cycles

Post by universam10 » Fri Jul 07, 2017 11:31 am

Hi, this is more of a brain challenge I wanted to share, if it would be possible to do Software PWM with DMA thus requiring no CPU load, than a robust implementation so far.
Here is a proof of concept that it actually works, and the cool thing is indeed it takes (almost) no CPU cycles when the duty cycles dont get changed. :P The interrupt takes some cycles honestly, but almost none if no updates happen. So it doesnt matter if I do pwm on one or 16 pins.

I'm using double buffering of 32bit for all 16pins of a port, so the buffer required is 2*resolution*4 bytes.

I've tested on a F1 all 16 pins of PortC successfully up to a pulse of 4us, which is a division of period frequency and resolution. So for instance for 8bit resolution 980 Hz would work.

I'm not sure if it makes too much sense or not, eager to hear what you think. It could be extended to a real library of course and support other Ports. Also the question is where the limits are regarding the frequency.

BTW., I wonder is it Software PWM any more since its done via hardware DMA ... :mrgreen:

Code: Select all

#include <Arduino.h>
#include <libmaple/dma.h>
#include <dma_private.h>

#define RESOLUTION 255 // PWM resolution
#define FREQUENCY 500  // PWM frequency

#if 1000000 / RESOLUTION / FREQUENCY < 4
#error did not work for me
#endif

class DMASoftPWM
{
  public:
    DMASoftPWM();
    void begin(gpio_dev *port);
    void setPinMode(uint8_t pin, bool enable);
    void writePWM(uint8_t pin, uint16_t val);
    uint32_t buffer[RESOLUTION * 2];

  private:
    static DMASoftPWM *anchor;
    static void marshall() { anchor->DMAEvent(); }
    inline void fillBuffer(uint16_t ptr);
    uint16_t pinVal[16];
    uint16_t pinmask;
    uint8_t refresh;
    void DMAEvent();
    dma_tube_config tube_config;
};
DMASoftPWM::DMASoftPWM()
{
    anchor = this;
}

void DMASoftPWM::begin(gpio_dev *port)
{
    refresh = 2;

    dma_init(DMA1);
    tube_config.tube_src = buffer;
    tube_config.tube_src_size = DMA_SIZE_32BITS;
    tube_config.tube_dst = (uint32_t *)&GPIOC->regs->BSRR; // Load pointer to porta clear/set
    tube_config.tube_dst_size = DMA_SIZE_32BITS;
    tube_config.tube_nr_xfers = RESOLUTION * 2;
    tube_config.tube_flags = DMA_CFG_SRC_INC | DMA_CFG_CIRC | DMA_CFG_CMPLT_IE | DMA_CFG_HALF_CMPLT_IE; // Source pointer increment,circular mode
    tube_config.target_data = 0;
    tube_config.tube_req_src = DMA_REQ_SRC_TIM2_CH3; // DMA request source.
    dma_set_priority(DMA1, DMA_CH1, DMA_PRIORITY_VERY_HIGH);
    dma_tube_cfg(DMA1, DMA_CH1, &tube_config); // Attach the tube to channel 1 (timer2 ch3)
    dma_attach_interrupt(DMA1, DMA_CH1, DMASoftPWM::marshall);
    dma_enable(DMA1, DMA_CH1);

    //TIMER setup
    Timer2.pause();
    Timer2.setPeriod(10000000UL / FREQUENCY / RESOLUTION);
    Timer2.setChannel3Mode(TIMER_OUTPUT_COMPARE);
    Timer2.setCompare(TIMER_CH3, 1);
    Timer2.refresh();
    TIMER2_BASE->DIER = TIMER_DIER_CC3DE;
    Timer2.resume();
}

void DMASoftPWM::fillBuffer(uint16_t ptr)
{
    for (uint16_t step = 1; step <= RESOLUTION; step++)
    {
        buffer[ptr] = pinmask << 16;

        for (uint8_t p = 0; p < 16; p++)
        {
            if (pinmask & (BIT(p)) && pinVal[p] >= step)
                buffer[ptr] |= BIT(p);
        }
        ptr++;
    }
    refresh--;
}
void DMASoftPWM::DMAEvent()
{
    dma_irq_cause event = dma_get_irq_cause(DMA1, DMA_CH1);

    if (refresh == 0) // no update so just keep the mem
        return;

    switch (event)
    {
    case DMA_TRANSFER_COMPLETE: // now setting the upper half
        fillBuffer(RESOLUTION);
        break;
    case DMA_TRANSFER_HALF_COMPLETE: //now setting the lower half
        fillBuffer((uint16_t)0);
        break;
    case DMA_TRANSFER_ERROR:
        ASSERT(0);
        break;
    case DMA_TRANSFER_DME_ERROR:
        ASSERT(0);
        break;
    case DMA_TRANSFER_FIFO_ERROR:
        ASSERT(0);
        break;
    }
}

void DMASoftPWM::setPinMode(uint8_t pin, bool enable)
{
    pinMode(pin, OUTPUT);

    if (enable)
        pinmask |= digitalPinToBitMask(pin);
    else
        pinmask &= ~digitalPinToBitMask(pin);
}

void DMASoftPWM::writePWM(uint8_t pin, uint16_t val)
{
    pinVal[pin] = val;
    refresh = 2;
}
DMASoftPWM *DMASoftPWM::anchor = NULL;
DMASoftPWM softPWMPortC;

void setup()
{
    Serial.begin(115200);
    Serial.println("starting usb serial");

    softPWMPortC.begin(GPIOC);
    softPWMPortC.setPinMode(PC13, true);
}

void loop()
{
    if (Serial.available())
    {
        int pin = Serial.parseInt();
        int val = Serial.parseInt();
        while (Serial.available())
            Serial.read();
        softPWMPortC.writePWM(pin, val);
        Serial.print(pin);
        Serial.print(':');
        Serial.println(val);

#ifdef DEBUGBUFFER
        delay(100);
        for (int u = 0; u < RESOLUTION * 2; u++)
            Serial.println(softPWMPortC.buffer[u], BIN);
#endif
    }

    static uint32_t sweep;
    static uint16_t t = 0;
    if (millis() - sweep > 1000 / RESOLUTION)
    {
        sweep = millis();
        t = ++t % RESOLUTION;
        softPWMPortC.writePWM(13, t);
    }
}
edit: 4us not ns :oops:
Last edited by universam10 on Fri Jul 07, 2017 11:59 am, edited 3 times in total.

User avatar
Pito
Posts: 1522
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: SoftPWM via DMA and no CPU cycles

Post by Pito » Fri Jul 07, 2017 11:37 am

I've tested on a F1 all 16 pins of PortC successfully up to a pulse of 4ns, which is a division of period frequency and resolution. So for instance for 8bit resolution 3900 Hz would work.
There is no way to get 4ns pulse with F1..
Pukao Hats Cleaning Services Ltd.

universam10
Posts: 19
Joined: Sun Jan 03, 2016 8:35 am
Location: Germany

Re: SoftPWM via DMA and no CPU cycles

Post by universam10 » Fri Jul 07, 2017 11:46 am

sorry, typo I meant 4us, and I meant 980Hz :roll:

universam10
Posts: 19
Joined: Sun Jan 03, 2016 8:35 am
Location: Germany

Re: SoftPWM via DMA and no CPU cycles

Post by universam10 » Fri Jul 07, 2017 12:29 pm

Looks like Timer2.setPeriod() does a bit weird stuff if it comes below 4.

With

Code: Select all

   Timer2.setPrescaleFactor(F_CPU / RESOLUTION / FREQUENCY);
    Timer2.setOverflow(1);
the pulse works down to 1.7us on the F1 which is 2300Hz at 8bit resolution.

victor_pv
Posts: 1643
Joined: Mon Apr 27, 2015 12:12 pm

Re: SoftPWM via DMA and no CPU cycles

Post by victor_pv » Fri Jul 07, 2017 2:13 pm

Why do you use 2 buffers with the capacity of resolution?
Shouldn't 1 be enough since you are using circular mode?
My guess is that's so you can update the PWM duty cycle in one while the DMA is sending the other, but you could also update the duty cycle in the one being sent, and avoid getting artifacts if you are updating the values that the DMA is sending, but if you fill it from the top down for updates I think you shouldn't get artifacts and at most during 1 cycle the duty cycle may be between the original and the updated one.

It's a nice idea. I had been thinking on something similar but just to send pulses, without a specific duty cycle.
I have also used dma to do real hardware PWM in a timer and works great, which each value in the buffer representing the duty cycle for 1 PWM pulse. That's for audio so each pulse needs a different one. But for something that needs a certain frequency generated in multiple pins with different duty cycles I think your idea is great.

universam10
Posts: 19
Joined: Sun Jan 03, 2016 8:35 am
Location: Germany

Re: SoftPWM via DMA and no CPU cycles

Post by universam10 » Fri Jul 07, 2017 2:33 pm

victor_pv wrote:
Fri Jul 07, 2017 2:13 pm
Why do you use 2 buffers with the capacity of resolution?
Shouldn't 1 be enough since you are using circular mode?
My guess is that's so you can update the PWM duty cycle in one while the DMA is sending the other, but you could also update the duty cycle in the one being sent, and avoid getting artifacts if you are updating the values that the DMA is sending, but if you fill it from the top down for updates I think you shouldn't get artifacts and at most during 1 cycle the duty cycle may be between the original and the updated one.
Oh very interesting, I was plainly assuming that I will get into serious issues if I get into a race condition on concurrent access of the same memory. Actually, I have no idea what might happen, do you?
If thats no issue, could you explain a bit more why filling from top is better here?

If my meassurement is accurate, the fill process takes about 50us, so lets say less than 20 steps. Not sure how long the jump to isr takes, but that means the buffer fillup will probably overtake quite soon, giving that the above is a valid situation.

Ollie
Posts: 185
Joined: Thu Feb 25, 2016 7:27 pm

Re: SoftPWM via DMA and no CPU cycles

Post by Ollie » Fri Jul 07, 2017 3:02 pm

This is very relevant technology. In practice, it is the only way to implement the Dshot600 and Dshot1200 digital communication with the Electronic Speed Controllers for the BLDC motors used in fast copters. The classic PWM is analog and quite slow - it used to be 20 ms, but it is still limited by the servo signal definition of 1 - 2 ms. In addition of being fast and accurate, the Dshot has error detection feature that makes it very robust.

Cheers, Ollie

victor_pv
Posts: 1643
Joined: Mon Apr 27, 2015 12:12 pm

Re: SoftPWM via DMA and no CPU cycles

Post by victor_pv » Fri Jul 07, 2017 3:44 pm

universam10 wrote:
Fri Jul 07, 2017 2:33 pm
victor_pv wrote:
Fri Jul 07, 2017 2:13 pm
Why do you use 2 buffers with the capacity of resolution?
Shouldn't 1 be enough since you are using circular mode?
My guess is that's so you can update the PWM duty cycle in one while the DMA is sending the other, but you could also update the duty cycle in the one being sent, and avoid getting artifacts if you are updating the values that the DMA is sending, but if you fill it from the top down for updates I think you shouldn't get artifacts and at most during 1 cycle the duty cycle may be between the original and the updated one.
Oh very interesting, I was plainly assuming that I will get into serious issues if I get into a race condition on concurrent access of the same memory. Actually, I have no idea what might happen, do you?
If thats no issue, could you explain a bit more why filling from top is better here?

If my meassurement is accurate, the fill process takes about 50us, so lets say less than 20 steps. Not sure how long the jump to isr takes, but that means the buffer fillup will probably overtake quite soon, giving that the above is a valid situation.
The bus will arbitrate access. Since neither the DMA at this frequencies neither the CPU writing to a buffer will be able to deplete the bandwidth of the RAM, there should be no issue. If it did happen that the DMA and the CPU try to access the memory at the exact same time, the bus will split access at 50% for each. So an access request from one of them may be hold for a cycle or so to complete the other's transaction. Racemaniac wrote a separate thread where he pushed the limits of the CPU and DMA, doing multiple transfers at the same time. He could clog it using 2 SPI ports at full speed + Mem2mem DMA access + the cpu doing something else. I think in his results the mem2mem access (which goes as fast as the ram can go) and 1 SPI port at full speed woudl still run fine without affecting the CPU much, but anyway check that thread for the details. The result is that unless you were running the DMA at full 72Mhz you are not likely to have any problem at all.

About filling from top to bottom, I was thinking on a situation in which the period is changing and the DMA can flip the same bit twice in the same period. But as I was writing and example, I realized filling from top to bottom could have the same effect only in a different situation, but still cause the same signal to flip twice in the same period. So it would not be a solution.

Double buffer would avoid that. The other option is use a single buffer, but update half of it at a time. But that's just the same as you do, only with each buffer taking half the period. If there is enough ram, there is no advantage on each buffer having half the period.

victor_pv
Posts: 1643
Joined: Mon Apr 27, 2015 12:12 pm

Re: SoftPWM via DMA and no CPU cycles

Post by victor_pv » Fri Jul 07, 2017 3:47 pm

Ollie wrote:
Fri Jul 07, 2017 3:02 pm
The classic PWM is analog and quite slow - it used to be 20 ms, but it is still limited by the servo signal definition of 1 - 2 ms.
What do you mean that the classic PWM is analog and slow? are you referring to STM32F1 or something else?

victor_pv
Posts: 1643
Joined: Mon Apr 27, 2015 12:12 pm

Re: SoftPWM via DMA and no CPU cycles

Post by victor_pv » Fri Jul 07, 2017 4:02 pm

universam10 wrote:
Fri Jul 07, 2017 12:29 pm
Looks like Timer2.setPeriod() does a bit weird stuff if it comes below 4.

With

Code: Select all

   Timer2.setPrescaleFactor(F_CPU / RESOLUTION / FREQUENCY);
    Timer2.setOverflow(1);
the pulse works down to 1.7us on the F1 which is 2300Hz at 8bit resolution.
Is the problem related to the timer, or is it perhaps due to fast rate of ISR and the time it takes to fill the buffer?

Post Reply