A faster ringbuffer implementation

Post your cool example code here.
C_D
Posts: 62
Joined: Mon May 11, 2015 3:27 am
Location: New Zealand

Re: A faster ringbuffer implementation

Post by C_D » Wed Oct 04, 2017 11:54 pm

Rick Kimball wrote:
Wed Oct 04, 2017 3:54 am
You just need to provide the POP_T data type and make it the same as the T data type.

I used this "feature" to my advantage using uint8_t as the data type but returning a int16_t with an empty value of -1.
Aha, I knew there had to be a reason for it, I just couldn't think of any reason why the return type would be different to the stored type. That makes more sense now!
Rick Kimball wrote:
Wed Oct 04, 2017 3:54 am
BTW: You should probably call values.clear() in your constructor so it will work as a local variable on the stack.
Is that just ensuring that the structure is initialised to 0 when it is created?

Also, why is the capacity one less than the actual size of the buffer? It really slows down my moving mean code if the number of actual elements is not a power of two because you have to do actual division not just a bit shift.

User avatar
Rick Kimball
Posts: 1037
Joined: Tue Apr 28, 2015 1:26 am
Location: Eastern NC, US
Contact:

Re: A faster ringbuffer implementation

Post by Rick Kimball » Thu Oct 05, 2017 3:45 pm

C_D wrote:
Wed Oct 04, 2017 11:54 pm
Is that just ensuring that the structure is initialised to 0 when it is created?
Yes. However this is only a problem if it instanced on the stack. If you make it a global variable the .bss section gets zeroed out at startup.
C_D wrote:
Wed Oct 04, 2017 11:54 pm
Also, why is the capacity one less than the actual size of the buffer?
I leave one slot open so I can detect when the element buffer is full without having to keep a count. There are different ways to implement a circular buffer. I used the approach that uses 2 indices and an element buffer. This minimizes code size and locking issues.

This code make a few assumptions:

1. There will be a single producer and a single consumer. Typically the producer is an interrupt routine getting data asynchronously and a main thread acting as the consumer. I initially created it for dealing with UART data.

2. I used a datatype of uint16_t for the head and tail indices. That limits the size of the buffer elements to (1<<16)-1 (65535). This isn't a problem on the chips I'm targetting. The stm32f103c8 only has 20k of ram total. However, it also allows me to do an atomic read of the both indices without locking as I stuff the two uint16_t values into a uint32_t union. The producer is the only one who is going to set the head index. The consumer is the only one who is going to set the tail index. This makes the code pretty safe if you ignore the case where it overflows. I leave that as an exercise to the reader. They will know best about how they are producing and consuming. They can decide to block or ignore on an overflow condition. I don't put that in the code.

3. I assume a buffer element size that is a power of 2 so I can use a mask to deal with buffer index wrap around instead of using a modulo operator. See my earlier post that shows how arm-none-eabi-gcc reduces the computation of the number of available elements to a single thumb2 instruction.
C_D wrote:
Wed Oct 04, 2017 11:54 pm
It really slows down my moving mean code if the number of actual elements is not a power of two because you have to do actual division not just a bit shift.
You are already checking to see if the buffer is full to decide if you need to throw out an element. Instead, you could always use a size of 16 for the ringbuffer and then check for an available() count of 8 instead of the the buffer.capacity(). That would allow you to use a shift divide. However, you still have to deal with the initial conditions when you have less than 8 samples using a normal divide instruction.
-rick

C_D
Posts: 62
Joined: Mon May 11, 2015 3:27 am
Location: New Zealand

Re: A faster ringbuffer implementation

Post by C_D » Thu Oct 05, 2017 8:39 pm

Rick Kimball wrote:
Thu Oct 05, 2017 3:45 pm
You are already checking to see if the buffer is full to decide if you need to throw out an element. Instead, you could always use a size of 16 for the ringbuffer and then check for an available() count of 8 instead of the the buffer.capacity(). That would allow you to use a shift divide. However, you still have to deal with the initial conditions when you have less than 8 samples using a normal divide instruction.
That's actually a not a bad idea, trading off speed for a small amount of wasted space. For my current application I am sampling relatively slowly so the integer division isn't a problem, but that's probably the best approach if I do need to speed things up further down the track, thanks 8-)

Post Reply