c++ function optimization using vectorization or SIMD???
-
I've got a bunch of functions, but here is one example:
```cpp
void computeSincKernel(float* kernel, double frac, double delta)
{
    // Locate the two nearest delta rows in the table and the blend weight between them.
    float deltaNorm = (delta - minDelta) / (maxDelta - minDelta);
    float deltaPos  = deltaNorm * (deltaResolution - 1);
    int   d0 = static_cast<int>(deltaPos);
    int   d1 = std::min(d0 + 1, deltaResolution - 1);
    float dt = deltaPos - d0;

    // Same for the fractional-offset axis.
    float fracPos = frac * sincResolution;
    int   f0 = static_cast<int>(fracPos);
    int   f1 = std::min(f0 + 1, sincResolution - 1);
    float ft = fracPos - f0;

    // Bilinear blend of the four neighbouring precomputed kernels.
    for (int i = 0; i < sincSize; ++i)
    {
        float w00 = sincTable[d0][f0][i];
        float w01 = sincTable[d0][f1][i];
        float w10 = sincTable[d1][f0][i];
        float w11 = sincTable[d1][f1][i];

        kernel[i] = (1.0f - dt) * ((1.0f - ft) * w00 + ft * w01)
                  + dt * ((1.0f - ft) * w10 + ft * w11);
    }
}
```
This runs for every sample, so it's quite heavy. How would you optimize it? I'm getting about 18% CPU for 15 voices on my M1 Pro.
My sinc table is precomputed on sample load. The anti-aliasing is very good, but the CPU usage is a bit too high, and this function is the main suspect.
-
Sinc is usually not used for realtime.
It's super brute force as an antialiasing method.
It's more popular nowadays to store waveforms in the frequency domain using an FFT, and to silence bins above Nyquist before the inverse FFT. Either that, or use filters to make mipmaps: multiple copies of your waveform at different pitches, with the antialiasing filters baked into the copies, and play back the appropriate pre-antialiased copy for the pitch. Optionally do this at 2x oversampling and use additional interpolation to remove the aliasing from any extra processing that happens in realtime.
Yes, you can use SIMD. You can use the JUCE SIMD classes (look up the JUCE docs, there is a tutorial on their page for how to use them). There is also an old version of xsimd that comes with HISE: search the forum for xsimd and you'll see my posts about it, where Christoph explains how to build a version of HISE that lets you use xsimd in custom nodes (it requires very little setup, just a single line of code before building HISE).
Bear in mind that you can't really use SIMD across voices in HISE, because HISE does a lot of voice management automatically outside of the C++ node (for example, each voice is already hooked up to call its own process() function one after the other).
But you can certainly use SIMD inside of a single voice, like vectorizing a convolution antialiasing filter.
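To give you an idea, here's a sketch of your inner loop vectorized with xsimd. This uses the modern batch API (the old version bundled with HISE spells the type xsimd::batch<float, N> instead), and it assumes you hoist the four table-row pointers out of the loop, e.g. the rows at &sincTable[d0][f0][0] and so on, with each row contiguous in memory:
```cpp
#include <xsimd/xsimd.hpp>

// Sketch: bilinear blend of four precomputed kernel rows, several floats
// at a time. w00..w11 point at the contiguous table rows picked by the
// scalar index code, which stays exactly as it was.
void blendKernelSimd(float* kernel,
                     const float* w00, const float* w01,
                     const float* w10, const float* w11,
                     float dt, float ft, int sincSize)
{
    using batch = xsimd::batch<float>;
    const batch vdt(dt), vft(ft), one(1.0f);
    const int width = (int) batch::size;

    int i = 0;
    for (; i + width <= sincSize; i += width)
    {
        const batch a = batch::load_unaligned(w00 + i);
        const batch b = batch::load_unaligned(w01 + i);
        const batch c = batch::load_unaligned(w10 + i);
        const batch d = batch::load_unaligned(w11 + i);

        const batch top    = (one - vft) * a + vft * b;  // blend along the frac axis
        const batch bottom = (one - vft) * c + vft * d;
        ((one - vdt) * top + vdt * bottom).store_unaligned(kernel + i);
    }

    for (; i < sincSize; ++i)  // scalar tail for leftover samples
        kernel[i] = (1.0f - dt) * ((1.0f - ft) * w00[i] + ft * w01[i])
                  + dt * ((1.0f - ft) * w10[i] + ft * w11[i]);
}
```
If sincSize is a multiple of the register width (4 floats on your M1's NEON), you can drop the tail loop, and padding the tables to aligned boundaries lets you switch to the aligned loads.
-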
@griffinboy said in c++ function optimization using vectorization or SIMD???:
It's more popular nowadays to store waveforms in the frequency domain using an FFT, and to silence bins above Nyquist before the inverse FFT. Either that, or use filters to make mipmaps: multiple copies of your waveform at different pitches, with the antialiasing filters baked into the copies, and play back the appropriate pre-antialiased copy for the pitch. Optionally do this at 2x oversampling and use additional interpolation to remove the aliasing from any extra processing that happens in realtime.
Cheers dude! I was aware of this, but I wanted to see how far I could get with sinc. Turns out, quite far! I've got 22% CPU usage for about 30 voices now, which isn't super optimal, but it was a fun project.
That paper you linked me a while back - https://www.mp3-tech.org/programmer/docs/resampler.pdf - was what got me interested.
I think I understand the process you mean though, for the mipmapping approach. Something like:
- Oversample the original audio x2 (juce::dsp::Oversampling can handle this)
- Set up a root note
- For mip-maps below the root note - lowpass and downsample (dsp::FilterDesign::designFIRLowpassWindowMethod then keep every 2nd sample)
- For mip-maps above the root note - upsample and then lowpass (use the same oversampling approach here for the upsampling and then the same kind of FIR filter???)
- Store each level, and then move on to the playback engine
I think that'd be the approach??
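To make sure I've got step 3 right, here's a rough plain-C++ sketch of the filter-then-decimate idea I have in mind (designLowpass is just a stand-in for whatever dsp::FilterDesign::designFIRLowpassWindowMethod would give me; cutoff is a fraction of the sample rate):
```cpp
#include <cmath>
#include <vector>

// Stand-in for the JUCE FIR design call: a plain windowed-sinc (Hann)
// lowpass. cutoff is a fraction of the sample rate, numTaps should be odd.
static std::vector<float> designLowpass(double cutoff, int numTaps)
{
    constexpr double pi = 3.14159265358979323846;
    std::vector<float> h((size_t) numTaps);
    const int mid = numTaps / 2;
    for (int n = 0; n < numTaps; ++n)
    {
        const int k = n - mid;
        const double sinc = (k == 0) ? 2.0 * cutoff
                                     : std::sin(2.0 * pi * cutoff * k) / (pi * k);
        const double hann = 0.5 - 0.5 * std::cos(2.0 * pi * n / (numTaps - 1));
        h[(size_t) n] = (float) (sinc * hann);
    }
    return h;
}

// One mipmap level down: lowpass first, then keep every 2nd sample.
static std::vector<float> decimateByTwo(const std::vector<float>& in,
                                        const std::vector<float>& h)
{
    std::vector<float> out(in.size() / 2);
    const long half = (long) h.size() / 2;
    for (size_t m = 0; m < out.size(); ++m)
    {
        double acc = 0.0;
        for (size_t t = 0; t < h.size(); ++t)
        {
            const long idx = (long) (m * 2) + (long) t - half;
            if (idx >= 0 && idx < (long) in.size())
                acc += (double) in[(size_t) idx] * (double) h[t];
        }
        out[m] = (float) acc;
    }
    return out;
}
```
For a decimate-by-2 level I'd put the cutoff safely below 0.25 of the original rate, e.g. designLowpass(0.23, 127).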
Playback-engine-wise, I'd still need an interpolation method to play back notes in between the mipmap levels, I would guess. Can Hermite cover this, or do I need to go polyphase still?
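For reference, the Hermite I mean is the standard 4-point, 3rd-order interpolator:
```cpp
// Standard 4-point, 3rd-order Hermite (Catmull-Rom): ym1, y0, y1, y2 are the
// samples around the read position, frac in [0, 1) is the position between
// y0 and y1.
inline float hermite4(float frac, float ym1, float y0, float y1, float y2)
{
    const float c1 = 0.5f * (y1 - ym1);
    const float c2 = ym1 - 2.5f * y0 + 2.0f * y1 - 0.5f * y2;
    const float c3 = 0.5f * (y2 - ym1) + 1.5f * (y0 - y1);
    return ((c3 * frac + c2) * frac + c1) * frac + y0;
}
```
The linked resampler paper compares exactly these trade-offs, so I suspect the answer depends on how far I pitch up between levels.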
-
@Orvillain The wavetable synthesiser uses the mip-map approach to get rid of upper harmonics with pretty good results, so for me that's the best approach. With wavetables it's especially intriguing, because you can just ditch the upper bins of the FFT data (which is a brickwall filter, but since the wavetable cycle size is exactly the FFT length, you don't get any side lobes or whatever it is called lol).
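As a sketch of the bin-ditching part (assuming the complex FFT of one cycle is already in a buffer; maxHarmonic would be the highest harmonic that stays below Nyquist at the target pitch):
```cpp
#include <complex>
#include <vector>

// Silence all bins above maxHarmonic in the full complex spectrum of one
// wavetable cycle. The input is real, so each positive-frequency bin has
// a mirrored partner that must be zeroed too.
void brickwallSpectrum(std::vector<std::complex<float>>& spectrum, int maxHarmonic)
{
    const int n = (int) spectrum.size();
    for (int k = maxHarmonic + 1; k <= n / 2; ++k)
    {
        spectrum[(size_t) k] = 0.0f;            // positive-frequency bin
        if (k != n / 2)
            spectrum[(size_t) (n - k)] = 0.0f;  // mirrored negative-frequency bin
    }
}
```
After that, the inverse FFT gives you the bandlimited cycle for the mipmap level.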
For mip-maps below the root note - lowpass and downsample (dsp::FilterDesign::designFIRLowpassWindowMethod then keep every 2nd sample)
Ignore mip maps below the root note - downsampling can be done in realtime without any artifacts. The goal is to remove the aliasing that comes from playing back samples that produce frequencies above Nyquist.
For mip-maps above the root note - upsample and then lowpass (use the same oversampling approach here for the upsampling and then the same kind of FIR filter???)
You need to lowpass before you upsample, or the upsampling algorithm will create the aliasing artifacts.
-
I actually use a FIR filter in my downsampling and for my sample interpolation!
At the end of the resampling paper you remembered, there is a list of coefficients for different convolution filters that have fantastic specs.
I recently optimized the approach taken in that paper and made a wavetable synth engine.
I used SIMD for the convolvers. The engine uses 0.16% CPU per voice, with aliasing under -70dB.
Christoph's response to this post has the right info: you only really need to worry about strong antialiasing when you are pitching up a stored sample.
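The convolver hot loop is just a dot product, so the SIMD version is short. A sketch with xsimd (reduce_add is the horizontal sum in recent versions; the old bundled one calls it hadd), assuming the tap count is a multiple of the batch width:
```cpp
#include <xsimd/xsimd.hpp>

// One output sample of a FIR convolver: a straight dot product over the
// last numTaps inputs (stored contiguously in 'state', taps in 'taps').
float firSampleSimd(const float* state, const float* taps, int numTaps)
{
    using batch = xsimd::batch<float>;
    batch acc(0.0f);
    for (int t = 0; t < numTaps; t += (int) batch::size)
        acc += batch::load_unaligned(state + t) * batch::load_unaligned(taps + t);
    return xsimd::reduce_add(acc);  // horizontal sum across the lanes
}
```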
If you're interested in efficiency, look into fixed-point math and start using it, along with buffers that have a power-of-two length. This lets you use bitmasks and bitshifts, which make various things cheaper, such as wrapping your sample loop without ifs or a regular modulo (expensive). Fixed point is also generally cheaper than float: use a 64-bit number to track where you are in the sample, with the upper 32 bits for the integer part and the lower 32 for the fractional part.
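A minimal sketch of that layout (names are mine, and it assumes your loop length is a power of two):
```cpp
#include <cstdint>

// 32.32 fixed point: upper 32 bits = integer sample index, lower 32 bits =
// fractional position. With a power-of-two loop length the wrap is a single
// bitmask instead of an if or a modulo.
struct FixedPointPhase
{
    uint64_t phase = 0;   // 32.32 read position
    uint64_t step  = 0;   // 32.32 increment per output sample (pitch ratio)
    uint32_t mask  = 0;   // loopLength - 1 (loopLength must be a power of two)

    void setRatio(double ratio)       { step = (uint64_t) (ratio * 4294967296.0); } // ratio * 2^32
    void setLoopLength(uint32_t len)  { mask = len - 1; }

    // Writes the wrapped integer index and returns the interpolation fraction.
    float advance(uint32_t& index)
    {
        index = (uint32_t) (phase >> 32) & mask;  // integer part, wrapped by bitmask
        const float frac = (float) (uint32_t) phase * 2.3283064365386963e-10f; // low 32 bits * 2^-32
        phase += step;
        return frac;
    }
};
```
phase >> 32 is the integer sample index, the low 32 bits scaled by 2^-32 give the interpolation fraction, and the & mask replaces the loop-wrap branch entirely.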
Ask ChatGPT about all of this if it's new to you - it can give you an easy-to-understand lecture.
-
Thank you guys! Lot of stuff to look into here! Appreciate the help. Will report back!