DFD Performance + Efficiency

zircon

Hi, happy to be here and see development of what looks to be a very promising tool.

I think that achieving high performance, stability, and efficiency is the #1 challenge. Mach5 is an example of a plugin with an incredible engine and tons of features, far more than Kontakt. But it has failed to achieve traction because, all things being equal, it suffers from higher CPU usage and worse efficiency when streaming samples.

A good test would be to take a pool of samples in the 1-5 MB range with a DFD buffer of ~100 KB and play very high-polyphony parts. Then compare the performance to Kontakt using the same samples. No scripting or effects, just pure streaming performance.

Next, test large numbers of zones and groups. Kontakt can handle tens of thousands of loaded zones and 1000+ groups. This sampler needs to match or beat that performance and not choke on such large amounts of data. This may require approaching the low-level design of the instrument differently, so I wanted to bring it up now while development is still at an early stage.
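To put rough numbers on the two tests above, here is a minimal sketch of how much RAM the preload buffers alone would take. It assumes one ~100 KB buffer is kept in memory per loaded zone, which is an assumption for illustration and not a statement about how Kontakt or HISE actually allocate memory. The function name and the zone counts are made up for the example.

```python
def preload_memory_mb(zone_count: int, preload_kb: float = 100.0) -> float:
    """RAM held by preload buffers, ASSUMING one buffer per loaded zone."""
    return zone_count * preload_kb / 1024.0


# Example zone counts (hypothetical), from a small to a very large instrument.
for zones in (1_000, 10_000, 50_000):
    print(f"{zones:>6} zones -> {preload_memory_mb(zones):8.1f} MB preloaded")
```

Under that assumption the preload size, not just the streaming throughput, becomes a design constraint once the zone count reaches the tens of thousands.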

Christoph Hart

Hi Zircon,

welcome to the party

I totally understand what you mean (and I had a discussion with Elan about the same subject). I don't know Mach5 but KONTAKT is and will be the standard that the performance has to be compared with.

Measuring performance & improving the speed is one of the hardest things to do - there are many variables that have to be taken into account - OS version, 64bit vs. 32bit, CPU type (AMD or Intel), and especially for the streaming performance, hard disk drive.

In its current state, HISE is slower than KONTAKT (and I don't think this will change in the near future). KONTAKT is maybe the most-used sound engine in the world, and NI has an armada of skilled C++ coders who have been improving the codebase for over ten years.

However, I designed HISE and the underlying engine with performance in mind, so every now and then I profile the code and eliminate hot spots. The streaming engine in particular is quite capable (a ballpark figure on my system would be 80%-90% of KONTAKT's performance). Basically, it is only slightly more CPU-demanding than the sine wave generator, if at all...

Also by being open source I hope that the framework will catch up faster regarding stability and performance (there are many more experienced coders out there who might contribute to it).

But there is one big advantage for HISE compared to KONTAKT: when KONTAKT was designed, SSDs didn't exist, so its streaming engine had to cope with deadly slow spinning hard drives and their horrible latency.
With the current prices and availability of SSDs, you can assume that everybody who wants to use sample libraries professionally will play them from an SSD, so a streaming system written mostly with SSD specs in mind can take more advantage of this technology.

Also, HISE only uses "Linear Interpolation" for its resampling. Until somebody has an actual audio example where you can hear a difference between the resampling algorithms for sample-library-based instruments, I think using the simplest one gives the best bang for the buck (performance vs. audio quality).
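For readers unfamiliar with the technique, this is a minimal sketch of linear-interpolation resampling in plain Python. It is not HISE's code; the function name and the `ratio` parameter are made up for illustration. Each output sample is a weighted average of the two input samples around a fractional read position.

```python
def resample_linear(samples, ratio):
    """Read `samples` at `ratio` times the original speed.

    ratio > 1 pitches up (fewer output samples), ratio < 1 pitches down.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    out = []
    pos = 0.0                      # fractional read position
    last = len(samples) - 1
    while pos < last:
        i = int(pos)               # index of the sample to the left
        frac = pos - i             # distance from it, in [0, 1)
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += ratio
    return out


print(resample_linear([0.0, 1.0, 2.0, 3.0], 0.5))
```

The inner loop costs two multiplications and one addition per output sample, which is why it is the cheapest option among common interpolation schemes.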

Have you made some tests with HISE?

elanhickler

working on it…

elanhickler

Ok, so talking to Greg @ OrangeTreeSamples we came up with one scenario for a guitar sample library and we are wondering if this approach would be inefficient. Here is the setup.

1 Master/instrument container
15 Articulation sub containers
15 * 6 string samplers + 2 misc/fx samplers = 92 sampler modules
92 samplers * 4 groups = 368 groups (total) spread between 92 samplers
Christoph, if you will, please answer the questions below:

Guitar
	Sustain
		String 1: Upstroke x2, Downstroke x2 for all strings (using groups)
		String 2
		String 3
		String 4
		String 5
		String 6
	Palm Mute Half
		String 1
		String 2
		String 3
		String 4
		String 5
		String 6
	Palm Mute Full
		String 1
		String 2
		String 3
		String 4
		String 5
		String 6
	Mute
		String 1
		String 2
		String 3
		String 4
		String 5
		String 6
	Squeal
		String 1
		String 2
		String 3
		String 4
		String 5
		String 6
	Tapping
		String 1
		String 2
		String 3
		String 4
		String 5
		String 6
	Squeal Release
		String 1
		String 2
		String 3
		String 4
		String 5
		String 6
	Palm Mute Release
		String 1
		String 2
		String 3
		String 4
		String 5
		String 6
	Sustain Release
		String 1
		String 2
		String 3
		String 4
		String 5
		String 6
	Hammer-on
		String 1
		String 2
		String 3
		String 4
		String 5
		String 6
	Pull-off
		String 1
		String 2
		String 3
		String 4
		String 5
		String 6
	Slide-up
		String 1
		String 2
		String 3
		String 4
		String 5
		String 6
	Slide-down
		String 1
		String 2
		String 3
		String 4
		String 5
		String 6
	Natural Harmonics
		String 1
		String 2
		String 3
		String 4
		String 5
		String 6
	Resonance
		String 1
		String 2
		String 3
		String 4
		String 5
		String 6
	FX
	Pick Noise

1. Is 92 sampler modules going to be inefficient for HISE?
2. Can you use a master midi script to control which samplers receive midi?

Christoph Hart

@cddm41xw:

1. Is 92 sampler modules going to be inefficient for HISE?

We'll have to see (it would be an interesting benchmark). The design seems reasonable, but I haven't stressed HISE with this many samplers before (I had about 150 modules for the clarinet with 10-15% CPU on my i5 MacBook Pro, but that included Modulators and ScriptProcessors). But they won't all play at once, and if they are sitting around doing nothing most of the time, they won't waste any performance…

It depends on how many modulators and envelopes you need in each sampler. But you absolutely need to reduce the voice count here, because 92 samplers with 64 voice buffers each would mean 368 MB of memory just for the intermediate buffers. For this use case you actually want monophonic behaviour anyway (because every string is monophonic, right?), so I would suggest reducing the voice count to 4-5 per sampler (to give voices some room to ring off gracefully when retriggered). 4 voices per sampler would mean 23 MB of intermediate buffers, and this seems reasonable.
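The arithmetic behind those two figures can be checked with a short sketch. The per-voice buffer size is not stated in the post; it is derived here from the 368 MB figure (368 MB / (92 × 64) = 0.0625 MB, i.e. 64 KB per voice), so treat it as an inferred value. The function name is made up for the example.

```python
SAMPLERS = 15 * 6 + 2                      # 92 sampler modules, as in the setup above
MB_PER_VOICE = 368 / (SAMPLERS * 64)       # inferred from the 368 MB figure: 0.0625 MB


def intermediate_buffer_mb(samplers, voices_per_sampler, mb_per_voice=MB_PER_VOICE):
    """Memory used by the per-voice intermediate buffers of all samplers."""
    return samplers * voices_per_sampler * mb_per_voice


for voices in (64, 5, 4):
    print(f"{voices:>2} voices per sampler -> {intermediate_buffer_mb(SAMPLERS, voices):6.2f} MB")
```

Memory scales linearly with the voice count, so going from 64 to 4 voices per sampler cuts the buffers by a factor of 16.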

@cddm41xw:

2. Can you use a master midi script to control which samplers receive midi?

Yes of course. I would implement it in a root container script that changes the MIDI channel of every incoming MIDI event depending on which string you want to play (with Message.setChannel(), using channels 1-6).

In each sampler you will then only need a MIDI channel filter, which is a hardcoded MIDI module (I think 92 ScriptProcessors would be overkill).
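The routing idea can be modelled outside HISE in a few lines. The sketch below is plain Python and not HiseScript: `NoteEvent`, `route_to_string` and `StringSampler` are hypothetical names, and how the target string is chosen is left to the caller. It only shows the division of labour: one central step tags the event with a channel, and each sampler applies a cheap channel comparison.

```python
from dataclasses import dataclass


@dataclass
class NoteEvent:
    note: int
    velocity: int
    channel: int = 1


def route_to_string(event: NoteEvent, string_number: int) -> NoteEvent:
    """Root-container step: tag the event with the channel of the target string."""
    if not 1 <= string_number <= 6:
        raise ValueError("string_number must be between 1 and 6")
    event.channel = string_number
    return event


class StringSampler:
    """Stand-in for a sampler with a MIDI channel filter in front of it."""

    def __init__(self, name: str, channel: int):
        self.name = name
        self.channel = channel

    def accepts(self, event: NoteEvent) -> bool:
        return event.channel == self.channel


samplers = [StringSampler(f"Sustain String {n}", n) for n in range(1, 7)]
event = route_to_string(NoteEvent(note=52, velocity=100), string_number=3)
print([s.name for s in samplers if s.accepts(event)])
```

Only the sampler whose filter matches the channel reacts; all the others reject the event with a single comparison, which is why one script at the root is enough.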

elanhickler

HISE patch: http://pastebin.com/aJSwn1Qn
reaper project: http://www.elanhickler.com/transfer/HISE/saw_ens.rpp (single instance of HISE, multiple samplers)
reaper project: http://www.elanhickler.com/transfer/HIS ... _multi.rpp (multiple instances of HISE, single sampler)
samples: http://www.elanhickler.com/transfer/HISE/saw_ens.rar
kontakt: http://www.elanhickler.com/transfer/HISE/saw_ens.nki

This is my performance test for Kontakt and HISE, and I have found that Kontakt outperforms HISE only because Kontakt seems to have built-in multiprocessor handling. If I set Kontakt's multiprocessor support to OFF, then HISE and Kontakt seem to behave the same. HOWEVER, if I use multiple instances of HISE rather than one instance, that basically recreates Kontakt's multiprocessor handling. Now, I haven't yet taken a detailed reading of CPU usage to figure out which one is slightly more efficient, but at this point it's looking like it doesn't really matter in the big scheme of things. The fact is HISE and Kontakt can both handle thousands of voices without killing my CPU. If you're not running an entire orchestra in one HISE instance, I think HISE will do very well for the most demanding commercial products. To dismiss HISE just because it doesn't have Kontakt's multiprocessor features would be a mistake. How many users actually load an entire orchestra in one instance of Kontakt? Pretty sure most people use a few instances at least.

Reasons not to dismiss HISE:

1. HISE is free for users, unlimited customer base for your commercial products.
2. HISE, as far as I can tell, has far greater sample library developer support than Kontakt, both in terms of moddability and Christoph's willingness to add features.
3. I could see HISE being more efficient than Kontakt in certain circumstances because you aren't dealing with hacky implementations like you do in Kontakt.
4. HISE is a no-brainer if you want to release free/cheap/semi-big projects/solo instruments. REALLY big projects? That's yet to be decided, but that's the only thing yet to be decided.
5. HISE will be able to make 100% use of phaselocking technology. The expression you will be able to achieve with solo instruments will… IMO... change the face of digital orchestral composition. It will be a new paradigm in sampling.

Christoph Hart

Happy to see the results (right now I can't download the samples to check it on my system).

About the multithreading stuff: the audio callback gives you a buffer and wants it back immediately or as soon as possible. There is no time to give the task to other threads which can be executed on other CPUs without introducing at least one buffer of latency - this is what I know, maybe I am wrong on this.
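To give a sense of scale for "one buffer of latency", the sketch below converts a buffer size into milliseconds. The sample rate and buffer sizes are common example values and not measurements from this thread.

```python
def buffer_latency_ms(buffer_size: int, sample_rate: int) -> float:
    """Duration of one audio buffer in milliseconds."""
    return 1000.0 * buffer_size / sample_rate


# Example host settings (hypothetical): 44.1 kHz with typical buffer sizes.
for size in (64, 128, 256, 512, 1024):
    print(f"{size:>4} samples @ 44100 Hz -> {buffer_latency_ms(size, 44100):5.2f} ms")
```

Handing work to another thread and collecting it one callback later would add this amount on top of the latency the host already has.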

But as Elan said, having a super-duper multithreaded engine is only important if you hit performance bottlenecks with one instrument. This is rarely the case, because the air only gets thin if you use orchestral templates with large numbers of instruments. Today's computers should handle one sample library just fine, and I know many composers who are willing to design their template for the best possible performance.

Christoph Hart

Alright, I downloaded the patch and checked the performance. Hammering on my keyboard until I reached 400 voices yielded about 40% CPU (MacBook Pro 2011, i5).

Since this seemed rather high, I started profiling and optimizing the streaming engine (by applying some SSE optimizations) and reduced the CPU load to 25% (a reduction of nearly 40%). The profiler now shows the hotspots in copy and gain multiplication instructions, which can't be optimized further, so I think this should be fast enough for now.

MIDIculous

Would HISE work for other audio formats such as FLAC or CAF?

Christoph Hart

Probably yes (I'll have to change a few lines of code), but even then with decreased performance, because there is no memory-mapped file reading for those formats in JUCE, and implementing it myself is very complicated and not on my priority list.
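As background on why memory mapping matters here, this is a minimal sketch of memory-mapped reading using Python's standard `mmap` module. It is not the JUCE API. It assumes a headerless file of 16-bit samples in native byte order, and `read_frames_mmap` is a made-up name. With uncompressed data, a sample index maps directly to a byte offset, so any region can be read without loading or decoding the rest of the file.

```python
import array
import mmap
import os
import tempfile


def read_frames_mmap(path, first_sample, count, bytes_per_sample=2):
    """Return `count` 16-bit samples starting at `first_sample` via a memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            start = first_sample * bytes_per_sample
            chunk = mapped[start:start + count * bytes_per_sample]
    out = array.array("h")         # signed 16-bit, native byte order
    out.frombytes(chunk)
    return out.tolist()


# Demo: write 1000 raw samples to a temporary file, then read four from the middle.
samples = array.array("h", range(1000))
with tempfile.NamedTemporaryFile(delete=False, suffix=".raw") as tmp:
    tmp.write(samples.tobytes())
    path = tmp.name
try:
    print(read_frames_mmap(path, 500, 4))
finally:
    os.remove(path)
```

A compressed format such as FLAC has no fixed relation between sample index and byte offset, so the data has to be decoded before it can be used, which is one reason the same shortcut is harder to offer there.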
