Monolith / HLAC should have normalization built in
-
Since HLAC only supports 16-bit, and since you're using monolithic files (the audio isn't accessible anyway), you might as well go one step further: before converting from 24 or 32 bit down to 16 bit, normalize every sample individually. Then in the sample map you can undo that normalization with the volume control.
I can do this with SALT, and will do it if you don't add this feature, but I figured it should be built into HISE somehow.
-
Yes I thought about that too - this would definitely save some hassles in the future...
-
Actually this might not be such a good idea. I want to (and do, in Kontakt) define audio levels across maps/zones, so something might be quiet on purpose. Of course, if you mean examining everything in a map and normalising based on the loudest element, then yeah, sure, fine. But if you mean normalising each zone: bad, bad, bad.
-
I was thinking of going a level deeper and build the normalization directly into HLAC. Consider this example:
24bit: 0000 0000 0011 1111 1111 1111
This would be a signal with a peak of -60dB (10 zeroes at the beginning * 6dB). At 24bit you have a S/N ratio of 14 * 6 = 84dB. However, if you convert this signal to 16bit, you discard the last 8 bits and end up with a S/N of just 6 * 6 = 36dB, which is the same sound you'd get if you put a bit crusher on the sample and set it to 6bit.
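To make the arithmetic concrete, here's a tiny sketch of that 6dB-per-bit estimate (just the back-of-the-envelope calculation, not HLAC code):

```cpp
#include <cstdint>
#include <cstdio>

// Rule of thumb used above: every bit is worth roughly 6dB.
// The leading zero bits of a 24bit sample give the peak level,
// the remaining significant bits give the usable S/N ratio.
static void analyse24(uint32_t sample24)
{
    int leadingZeros = 0;
    for (int bit = 23; bit >= 0 && ((sample24 >> bit) & 1) == 0; --bit)
        ++leadingZeros;

    std::printf("peak: %d dB, S/N: %d dB\n",
                -6 * leadingZeros, 6 * (24 - leadingZeros));
}

int main()
{
    analyse24(0b0000'0000'0011'1111'1111'1111); // the example above: -60dB peak, 84dB S/N
}
```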
Now what I am going to do is to "normalise" the signal in 6dB steps (which means to just move the bits to the left) before the conversion. In this example, it will move it like this:
before normalisation (24bit): 0000 0000 0011 1111 1111 1111
after normalisation (24bit):  0011 1111 1111 1111 0000 0000
It just uses the bits that are available (note how it only shifted by 8 bits, not by 10). Now when we convert the sample to 16bit, literally no information is lost (this will be the case for all samples with a peak below -48dB).
This way you end up with the full 96dB dynamic range of 16bit independent of the volume of the sample and your library can be marketed using the following claim: True 96dB Dynamiczzz powered by HLAC :)
The normalisation value (a simple 8bit unsigned char, in our case 8) will be stored in the metadata header of the HLAC file and will be inversely applied during decompression (which comes with no performance overhead because it's just a simple bit shift). This leaves the Volume parameter in the sample map unaffected and requires no manual normalisation of the input signal (of course you can still use the normalisation feature in HISE)...
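Roughly, the idea would look like this. A minimal C++ sketch of the scheme described above, not the actual HLAC implementation (block layout, sign handling and dithering are glossed over):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Sketch: shift the samples as far left as the 8 discarded bits allow,
// truncate to 16bit, remember the shift in one header byte and undo it
// on decompression (just bit shifts, so no real performance overhead).
struct NormalisedBlock
{
    uint8_t normalisationShift;     // would live in the HLAC metadata header
    std::vector<int16_t> samples;   // the 16bit payload
};

static NormalisedBlock encode24To16(const std::vector<int32_t>& samples24)
{
    // Find the peak magnitude (24bit samples stored in int32 containers).
    int32_t peak = 1;
    for (auto s : samples24)
        peak = std::max(peak, std::abs(s));

    // Use at most the 8 bits that the 24->16 truncation would throw away.
    uint8_t shift = 0;
    while (shift < 8 && (peak << (shift + 1)) < (1 << 23))
        ++shift;

    NormalisedBlock block { shift, {} };
    block.samples.reserve(samples24.size());

    for (auto s : samples24)
        block.samples.push_back(static_cast<int16_t>((s << shift) >> 8));

    return block;
}

static std::vector<int32_t> decode16To24(const NormalisedBlock& block)
{
    std::vector<int32_t> samples24;
    samples24.reserve(block.samples.size());

    // Inverse operation: expand back to 24bit and shift down again.
    for (auto s : block.samples)
        samples24.push_back((static_cast<int32_t>(s) << 8) >> block.normalisationShift);

    return samples24;
}
```

(Whether this would happen per file or per compression block is an implementation detail; the principle is just these two bit shifts.)
-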
Because of the built-in normalisation, is it better to use 24bit samples to start with?
-
Actually I haven't implemented this yet, so right now the last 8 bits get truncated (almost all libraries I am currently working on are normalising the samples so I don't have a strong incentive to implement this in the near future).
You can leave your samples at 24bit as long as you want, they will get automatically converted to 16bit when you export the sample map as monolith.
-
What kind of normalisation are the libraries you're working on using: is each sample normalised independently to 0dB, or are they all normalised to a common value? And does it make a difference whether I use HISE's normalisation feature or normalise the samples before I import them?
-
Yeah, all 0dB (or a bit less). I actually find it far more consistent to have them all at 0dB and then add dynamics via VelocityModulators (or other modulators). And you definitely need to normalize the samples before importing; the HISE normalisation feature is just a non-destructive gain value.
-
I spent yesterday doing lots of experiments with normalisation. I'd just convinced myself that normalisation wasn't worthwhile. Today I did a bunch more tests and with your comments I've completely changed my mind :)
But is there an issue with noise when normalising samples of different dynamics to 0dB, since quieter samples will have their volume increased more than louder ones?
And what role does normalisation play when converting from 24bit to 16bit in HISE?
-
Well, if you normalize them, you need to attenuate them using modulators and then the noise floor goes back to normal, so this shouldn't be an issue.
Re 24bit vs. 16bit: if you normalize before converting, you make sure that most of the signal's dynamic range is preserved. Take a look at this example:
0000 0011 1111 1111 1111 1111
This is supposed to be a signal with 18 significant bits, so at 24bit it has a peak value of -36dB (the formula is (24 - 18) * -6) and a SNR of 18 * 6 = 108dB. If you convert it to 16bit, you'll get this:
0000 0011 1111 1111
It just truncates the last 8 bits. Of course a good dithering algorithm does more than just that, but this is easier to understand. Now we still have a peak value of -36dB (the six leading zeros), but the SNR was reduced to 10 * 6 = 60dB. This might be too low, and we can start hearing quantisation noise if the dithering was bad and we run it through a compressor or other dynamic processors.
But if we normalize the signal before converting, we'll get this signal:
1111 1111 1111 1111 11XX XXXX
It just shifts the bits to the left (actually it's a bit more complicated, but again, it's easier to understand). The X is new material, and if the algorithm is not totally bogus it is most likely zero :) Now if we convert this signal to 16bit, we'll get this:
1111 1111 1111 1111
As you can see, we just lose two bits. The SNR is 16 * 6 = 96dB (instead of 60dB). This is the benefit of normalizing before converting the bit depth.
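If you want to check those numbers yourself, here is a small sketch (again using the simplified 6dB-per-bit model, not actual HISE code) that compares plain truncation with normalise-then-truncate for this example:

```cpp
#include <cstdint>
#include <cstdio>

// Compare plain 24->16 truncation with normalise-then-truncate,
// using the simplified "6dB per bit" model from the discussion above.
static int significantBits(uint32_t v)
{
    int bits = 0;
    while (v != 0) { ++bits; v >>= 1; }   // position of the highest set bit
    return bits;
}

int main()
{
    const uint32_t source24 = 0b0000'0011'1111'1111'1111'1111; // 18 significant bits, peak -36dB

    const uint32_t truncated  = source24 >> 8;        // plain truncation: drop the last 8 bits
    const uint32_t normalised = (source24 << 6) >> 8; // shift to full scale first, then truncate

    std::printf("truncated : SNR %d dB\n", 6 * significantBits(truncated));  // 10 bits -> 60dB
    std::printf("normalised: SNR %d dB\n", 6 * significantBits(normalised)); // 16 bits -> 96dB
}
```

It prints the same 60dB vs. 96dB figures as above.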
-
Thanks for the explanation, that makes it much easier to understand. So I shall normalise to 0 and drop the gain back down in HISE.
-
Resurrecting this topic because normalisation is something I keep coming back to.
Here's my current problem: I have a sample recorded on three mics. When normalised, the close mic gets boosted by +38dB while the other two mics are only boosted by about +30dB (these are ppp dynamic samples).
So if I bring my samples into HISE and merge the multi-mics, I could drop the volume of each mic accordingly using a simple gain mod; that's not a problem for this simple example. However, that is only one sample of one dynamic. When I look at a higher dynamic, the close mic is boosted by +20dB, the Decca by +3dB, and the hall mics by +9dB. So dropping each mic by a single overall amount no longer works. Even if I could control the volume of each mic for each zone, it would take hours to get the volumes where they should be. :(
What about if I could put the volume boost into the sample metadata so HISE could read it and restore each sample to its original volume?