    Ring Buffer design

    C++ Development
    • Orvillain

      So yeah, what I've implemented in my ring delay are the following:

      Read Nearest: Uses 1 sample. Worst quality in terms of aliasing and modulation smoothness, but the lowest CPU usage.
      Read Linear: Uses 2 samples. Audible HF loss and poor aliasing performance, but the modulation is a level above nearest. CPU usage is still low.
      Read Cubic: Uses 4 samples. Somewhat noticeable HF roll-off, fair aliasing performance, and surprisingly smooth modulation. Medium CPU usage.
      Read Lagrange3: Uses 4 samples. Better HF retention than cubic, but still rolls off. Aliasing rejection is okay, but it does let some junk through. Smooth modulation. Medium CPU usage.
      Read Sinc: Has 16 taps, which means it takes 16 input samples around the read point and does 16 FMAs to produce one output sample. The passband is very flat, with no HF loss. Aliasing performance is excellent. Modulation is excellent. CPU is relatively high, but here on my machine it never spiked above 1%.
      Read Sinc Blended: Also takes 16 input samples, but because of the blend it performs 32 FMAs to produce one output sample. The passband is very flat, with no HF loss. Aliasing performance is excellent. Modulation is excellent. CPU is relatively high, but here on my machine it never spiked above 2%.

      It will be interesting when I get this stuff onto an embedded platform like the DaisySeed (ARM Cortex based) and then I can see how it all works out CPU-wise.

      Musician - Instrument Designer - Sonic Architect - Creative Product Owner
      Crafting sound at every level. From strings to signal paths, samples to systems.

      • Orvillain @griffinboy

        @griffinboy said in Ring Buffer design:

        @Orvillain

        That's exactly it.
        And the cool thing about using an FIR interpolator like this with multiple taps is that you can SIMD it.
        Processing 4 taps in one SIMD instruction cuts CPU significantly.

        HISE has XSIMD included.
        You don't need to include it yourself; you can just use it in a C++ node.

        Oh, but this technique potentially adds latency. So it's not good for uses where compensating for latency in different places is too fiddly.
        But it's good for delay effects and for reading samples from a buffer at a new speed.

        Indeed! Indeed! I haven't got to any of this yet. I'm just using basic C++ loops without any SIMD or vectorization.
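
        For reference, a plain scalar loop for a 16-tap sinc read might look something like the sketch below. This is an assumption-laden illustration, not the code from the thread: readSinc16 is a hypothetical name, the Hann window is one arbitrary choice, and a real implementation would normally precompute the coefficients in a table instead of calling sin/cos per tap.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: 16-tap Hann-windowed sinc read from a ring buffer.
// Assumes buf.size() is a power of two and at least 16, and 0 <= frac < 1.
inline float readSinc16(const std::vector<float>& buf, std::size_t index, float frac)
{
    constexpr std::size_t kTaps = 16;
    constexpr float kPi = 3.14159265358979323846f;
    const std::size_t mask = buf.size() - 1;

    float acc = 0.0f;
    for (std::size_t t = 0; t < kTaps; ++t)
    {
        // Tap offsets run from -7 to +8 samples around the integer read index.
        const int k = static_cast<int>(t) - 7;
        const float x = static_cast<float>(k) - frac;  // distance from the read point

        const float sinc = (std::fabs(x) < 1.0e-6f) ? 1.0f
                                                    : std::sin(kPi * x) / (kPi * x);
        const float window = 0.5f + 0.5f * std::cos(kPi * x / 8.0f);  // Hann over +/- 8

        // One multiply-accumulate per tap; index wrapped with a mask.
        const std::size_t i = (index + buf.size() - 7 + t) & mask;
        acc += buf[i] * sinc * window;
    }
    return acc;
}
```

        The loop body is a straight multiply-accumulate over 16 neighbouring samples, which is the shape that lends itself to processing 4 taps per SIMD instruction later on.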

        I'm looking at this purely for chorus or modulation effects, delays, reverbs, and sample playback.

        The reason I've implemented all of these is that I want to experiment with 80s, 90s, 2000s, and modern delay and reverb approaches, and I want to embrace the "shittiness" of certain approaches, as long as they add to or enhance the character.

        Pretty archetypal example for me. I've got a £100 Boss RV-5 pedal here, and a £700 Meris MercuryX pedal. Both outstanding reverbs. The RV-5 is clearly less clever under the hood. You can hear the delays in the FDN every now and then, and it doesn't sound super high fidelity. But bloody hell is it cool!!! Way better than later Boss reverbs IMHO.

        • griffinboy @Orvillain

          @Orvillain

          I've never used sinc / FIR interpolators in a reverb.
          I'm not sure how high the benefit would be.
          For delays and synthesis it's great though when you need low aliasing and low ripple.

          • Orvillain @griffinboy

            @griffinboy said in Ring Buffer design:

            @Orvillain

            I've never used sinc / FIR interpolators in a reverb.
            I'm not sure how high the benefit would be.
            For delays and synthesis it's great though when you need low aliasing and low ripple.

            A typical approach to decorrelating reverbs is to mildly modulate delay times in the FDN. But you can also do it on the input diffuser or the early reflections APF network too. So anything that affects delay time is ultimately going to make some kind of difference.
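
            To make the "mildly modulate delay times" idea concrete, here is a minimal sketch of a sine LFO applied to a delay length. The function name and parameters are hypothetical; it only computes the (fractional) delay in samples, which would then be fed to whichever interpolated read is in use.

```cpp
#include <cmath>

// Hypothetical helper: delay length in samples at sample index n,
// wobbling around baseSamples by +/- depthSamples at rateHz.
inline float modulatedDelaySamples(float baseSamples, float depthSamples,
                                   float rateHz, float sampleRate, long n)
{
    constexpr float kTwoPi = 6.28318530717958647692f;
    const float phase = kTwoPi * rateHz * static_cast<float>(n) / sampleRate;
    return baseSamples + depthSamples * std::sin(phase);
}
```

            Because the result is almost never an integer, the interpolator used for the fractional read directly shapes how this modulation sounds, which is the point being made above.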

            It'd be interesting to compare them all. Maybe that's a new thread in a few weeks or so!

            • Orvillain

              @griffinboy

              So I was reading about mirrored ring buffers. For a normal ring delay, we have a circular buffer of size N: a write pointer advances and wraps back around when it reaches the end. A read pointer follows at an offset from the write pointer, and that offset sets the delay amount. This creates the 1-tap delay effect.

              Fractional reads mean that the window of samples required for interpolation sometimes crosses the end of the buffer. Because that window is not contiguous in memory, every index in it has to be wrapped (with a modulo or a mask) for the fractional read. This costs CPU on every read operation.
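
              The basic ring delay described above can be sketched roughly as follows. SimpleRingDelay is a hypothetical name for illustration, and it uses a plain modulo so that it works for any buffer size.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal ring delay: one write pointer, one read at a fixed offset.
class SimpleRingDelay
{
public:
    explicit SimpleRingDelay(std::size_t size) : buffer(size, 0.0f) {}

    // Writes one sample, then returns the sample written delaySamples ago.
    // Requires delaySamples < buffer size.
    float process(float input, std::size_t delaySamples)
    {
        buffer[writeIndex] = input;

        // Read pointer trails the write pointer; adding size keeps it non-negative.
        const std::size_t readIndex =
            (writeIndex + buffer.size() - delaySamples) % buffer.size();
        const float output = buffer[readIndex];

        // Advance and wrap the write pointer.
        writeIndex = (writeIndex + 1) % buffer.size();
        return output;
    }

private:
    std::vector<float> buffer;
    std::size_t writeIndex = 0;
};
```

              With an 8-sample buffer and a delay of 2, feeding 1, 2, 3, 4 yields 0, 0, 1, 2.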

              The mirror trick:

              Instead of storing only [0 ... N-1], we allocate a buffer twice as large. Every time we write a sample at index w, we also write it at w + N. This results in a mirrored copy of the entire signal.

              Even though the buffer is twice the size, the real delay line is still only N samples long. That means any read window of up to N samples can be read as one straight block, so the interpolator doesn't need to check the boundaries or even do an & mask operation for every tap.

              I've not implemented it yet, but I think I grok the theory behind it. It costs extra memory, but it can be much faster for multi-tap fractional interpolation, because you avoid modulo and branching in the inner loop.
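
              A minimal sketch of the mirror trick, under the assumption that the mirrored write happens on every sample. MirroredRing and its members are hypothetical names, not the planned RingDelay API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical mirrored ring buffer: logical length N, physical storage 2N.
class MirroredRing
{
public:
    explicit MirroredRing(std::size_t n) : size(n), data(2 * n, 0.0f) {}

    void write(float x)
    {
        data[w] = x;          // primary copy
        data[w + size] = x;   // mirrored copy, N samples further on
        w = (w + 1) % size;   // only the write pointer ever wraps
    }

    // Pointer to a contiguous block starting at logical index start (0 <= start < N).
    // Up to N samples can be read from it with no wrapping at all.
    const float* window(std::size_t start) const { return data.data() + start; }

    // Index of the oldest sample (the next slot to be overwritten).
    std::size_t writeIndex() const { return w; }

private:
    std::size_t size;
    std::vector<float> data;
    std::size_t w = 0;
};
```

              After writing 1, 2, 3, 4, 5 into a 4-sample instance, window(writeIndex()) points at 2, 3, 4, 5 in chronological order, even though those samples straddle the logical end of the buffer.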

              I'm going to implement it as a constructor boolean I think, so that my RingDelay class is flexible.

              • Christoph Hart @Orvillain

                @Orvillain said in Ring Buffer design:

                because you avoid modulo and branching in the inner loop.

                If your buffer size is a power of two (which it always can be, just round it up), then modulo boils down to a bitwise AND operation, even at the lowest optimization settings. That is far cheaper than the performance cost of a buffer twice the size, so I'm not 100% sure the mirror trick is worth the hassle.
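
                A small sketch of the "round it up and mask" approach. Both helper names are hypothetical, and the rounding loop ignores overflow for absurdly large requests.

```cpp
#include <cstddef>

// Hypothetical helper: smallest power of two >= n (returns 1 for n == 0).
inline std::size_t nextPowerOfTwo(std::size_t n)
{
    std::size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Wraps an index into a buffer whose size is a power of two.
inline std::size_t wrapIndex(std::size_t i, std::size_t powerOfTwoSize)
{
    return i & (powerOfTwoSize - 1);
}
```

                For example, a request for 48000 samples of delay would be rounded up to a 65536-sample buffer, and every index is then wrapped with a single AND.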

                • Orvillain @Christoph Hart

                  @Christoph-Hart Ahh interesting!

                  Even on embedded systems like ARM Cortex (I'm using the DaisySeed chip) ??

                  Also yes, I enforce power of 2 buffer at all times.

                  • griffinboy @Christoph Hart

                    @Christoph-Hart
                    @Orvillain

                    Yep, if your buffer is a power of two you can use bitmasking.

                    The only case where I've used other strategies is when my interpolator is very wide. My FIR convolution interpolator does 64 reads every sample, so even bitmask wraps add up.
                    In such situations I just use strategies to hoist the wraps out of the loop. I don't do any checking at all inside the loop. I note where the edges are, and if we are nowhere near them, we run a separate vectorized loop that avoids any checks.
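
                    One way the hoisting could look, as a hedged sketch (firRead is a hypothetical name and this is not griffinboy's actual code): decide once per read whether the tap window touches the end of the buffer, and only fall back to masked indexing when it does.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: dot product of `taps` coefficients with a window of a
// power-of-two ring buffer starting at `start`. The wrap check is done once,
// outside the inner loop.
inline float firRead(const std::vector<float>& buf, std::size_t start,
                     const float* coeffs, std::size_t taps)
{
    const std::size_t mask = buf.size() - 1;  // assumes power-of-two size
    start &= mask;
    float acc = 0.0f;

    if (start + taps <= buf.size())
    {
        // Fast path: the window is contiguous, so the loop has no wraps or
        // branches and is a straightforward candidate for vectorization.
        const float* p = buf.data() + start;
        for (std::size_t k = 0; k < taps; ++k)
            acc += p[k] * coeffs[k];
    }
    else
    {
        // Slow path: the window crosses the end of the buffer, so mask every index.
        for (std::size_t k = 0; k < taps; ++k)
            acc += buf[(start + k) & mask] * coeffs[k];
    }
    return acc;
}
```

                    With a wide interpolator and a much larger buffer, nearly every read takes the fast path, so the masked loop only runs for the few positions near the edge.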

                    • Christoph Hart @Orvillain

                      @Orvillain that's such a basic optimization technique that I would expect any compiler made after 1975 will incorporate this.

                      If you don't trust that, you can easily do this yourself:

                      buffer[i % 4096]
                      buffer[i & 4095]
                      

                      These are equivalent statements for non-negative integers; one costs 88 CPU cycles, and the other is practically free (~1 CPU cycle).
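
                      A quick way to convince yourself of the equivalence, and of where it stops holding. The function names are hypothetical; the signed case is included because % and & differ for negative values in C++.

```cpp
#include <cstdint>

// Unsigned: modulo by 4096 and masking with 4095 always agree.
inline std::uint32_t wrapModulo(std::uint32_t i) { return i % 4096u; }
inline std::uint32_t wrapMask(std::uint32_t i)   { return i & 4095u; }

// Signed: % truncates toward zero, so a negative index stays negative,
// while & on a two's complement value lands back inside [0, 4095].
inline std::int32_t wrapModuloSigned(std::int32_t i) { return i % 4096; }
inline std::int32_t wrapMaskSigned(std::int32_t i)   { return i & 4095; }
```

                      So if a read index can ever go negative (for example, write position minus delay), the mask version is also the one that gives a usable buffer index.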

                      If you want a good read, this is the quasi-standard reference for comparing CPU instructions. You need to dig around a bit to find what you're after, and again, you're basically doing by hand what the compiler already does for you, so it's purely educational:

                      https://www.agner.org/optimize/instruction_tables.pdf

                      On x86, the assembly instruction for signed division (and modulo) is IDIV, which clocks at around 88 CPU cycles (the exact figure depends on the CPU and operand size). AND takes a single CPU cycle.

                      • Orvillain @Christoph Hart

                        @Christoph-Hart Thank you!
