HLAC Compression for CH1 files - not working out so well...

Lindon

OK get the feeling the project I'm working on is going to really tax HISE....

So I have (one of many) sample map with 27 round robin groups....

There's a total of 4,239 wav files in there, mapped across these groups...

They are all looped vocal samples...

and on disk (as wav files) they add up to 2.54 GB

So I've built this sample map in two different ways:

First I loaded up each of the round robin groups with the required sounds - now some of these groups are duplicates of others - and they are duplicated so they match up with RR groups in other sample maps...their siblings if you will.

Once built, I exported these as HLAC and got two ch files as a result:

testmap.ch1 -- 2,036,988 KB
testmap.ch2 -- 1,405,831 KB

That's 3,442,819 KB, or about 3.28 GB

So for a start this looks odd - they are BIGGER than the wav file set they are supposed to be compressing....

OK, maybe the HLAC compression isn't doing the sensible thing and compressing a given wav file only once...

So I built the map again only this time removing all the duplicate groups, then back to HLAC and ch making and I get:

reducedmap.ch1 -- 2,036,966 KB
reducedmap.ch2 -- 182,746 KB

OK, so that's different - now I have 2,219,712 KB in total, or about 2.1 GB - it's not much compression either, less than 20%

So two questions:

What's happening with that first scenario? Is it really not compressing each wav file just the once?

Is it my audio data (the human voice) that's just "hard" for HLAC?

In all cases, by the way, it took an AGE for HISE to compress these files (about 30 mins each), so it's doing a LOT of something...

Kontakt can get every single bit of the audio, which is about 40% more than I'm using here, down to "only" 1.8 GB in ncw format...
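
As a quick sanity check on the sizes quoted in this post, the arithmetic can be sketched in Python. This assumes the KB figures are units of 1024 bytes and that the 2.54 GB wav total uses the same convention; the function and variable names are made up for illustration.

```python
KB_PER_GB = 1024 * 1024  # assumption: the listed KB values are units of 1024 bytes

def total_gb(*sizes_kb):
    """Sum file sizes given in KB and convert the total to GB."""
    return sum(sizes_kb) / KB_PER_GB

wav_gb = 2.54  # size of the source wav set as quoted in the post

full_gb = total_gb(2_036_988, 1_405_831)    # testmap.ch1 + testmap.ch2
reduced_gb = total_gb(2_036_966, 182_746)   # reducedmap.ch1 + reducedmap.ch2

for name, gb in [("testmap", full_gb), ("reducedmap", reduced_gb)]:
    ratio = gb / wav_gb
    print(f"{name}: {gb:.2f} GB, {ratio:.0%} of the wav size")
```

A ratio above 100% means the exported monolith is larger than the source set, which is the symptom described for the first map.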

Christoph Hart

Hmm, there is a Duplicate flag in the samplemap XML that marks samples that are referenced more than once, and the HLAC encoder should skip these. Is this flag set in your samplemap?

Is the source material 24bit and have you enabled TrueDynamics?
The HLAC codec (and every other lossless audio codec) works best on decaying material because it can reduce the bit depth required to store the signal. For samples which are normalised and sustaining, the compression ratio is the worst, but it should at least yield a compression ratio of 70% (so your 2.54GB files should go to something like 1.8GB).
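
To illustrate the point about decaying material, here is a rough Python sketch. It is not the HLAC algorithm, only a toy model that assumes a codec stores each block of samples with the smallest bit width that fits the block's peak. The block size, the test tone and all function names are assumptions made for the example.

```python
import math

def bits_needed(block):
    """Smallest signed bit width that can hold every sample in the block."""
    peak = max(abs(s) for s in block)
    return 1 if peak == 0 else peak.bit_length() + 1  # +1 for the sign bit

def average_bits(samples, block_size=1024):
    """Average bits per sample if each block stores only the bits it needs."""
    blocks = [samples[i:i + block_size] for i in range(0, len(samples), block_size)]
    return sum(bits_needed(b) * len(b) for b in blocks) / len(samples)

SR = 44100
FULL_SCALE = 2 ** 23 - 1  # peak value of a 24-bit signal

def tone(n, decay):
    """220 Hz sine at full scale, optionally with an exponential decay envelope."""
    return [
        int(FULL_SCALE * math.exp(-decay * i / SR) * math.sin(2 * math.pi * 220 * i / SR))
        for i in range(n)
    ]

sustained = tone(SR * 2, decay=0.0)  # normalised and sustaining
decaying = tone(SR * 2, decay=4.0)   # fades by roughly 70 dB over two seconds

print("sustained:", round(average_bits(sustained), 1), "bits per sample")
print("decaying: ", round(average_bits(decaying), 1), "bits per sample")
```

In this toy model the sustained tone needs the full bit depth in every block, while the decaying tone needs fewer bits as it fades, which is the effect described above for looped, normalised material.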

Lindon

@Christoph-Hart said in HLAC Compression for CH1 files - not working out so well...:

Hmm, there is a Duplicate flag in the samplemap XML that marks samples that are referenced more than once, and the HLAC encoder should skip these. Is this flag set in your samplemap?

Is the source material 24bit and have you enabled TrueDynamics?

The HLAC codec (and every other lossless audio codec) works best on decaying material because it can reduce the bit depth required to store the signal. For samples which are normalised and sustaining, the compression ratio is the worst, but it should at least yield a compression ratio of 70% (so your 2.54GB files should go to something like 1.8GB).

yes the duplicate flag is set:

<sample ID="30" LoVel="6" HiVel="15" FileName="{PROJECT_FOLDER}CT Ee_32.WAV"
          LoKey="84" HiKey="84" SampleStart="0" SampleEnd="132650" LoopEnabled="1"
          LoopStart="19820" LoopEnd="65674" Pitch="0" Root="84" Volume="-6"
          Pan="0" RRGroup="5" Duplicate="1" MonolithOffset="541708288"

..and yes 24-bit audio

File Name Blue Vowel Ar_08.wav
File Size 659 kB
File Type WAV
File Type Extension wav
Mime Type audio/x-wav
Originator Pro Tools
Originator Reference 8QTvSdnSqKoaaaGk
Date Time Original 2014:12:11 19:26:33
Time Reference 1959370
Bwf Version 0
Encoding Microsoft PCM
Num Channels 1
Sample Rate 44100
Avg Bytes Per Sec 132300
Bits Per Sample 24
Cue Points (Binary data 28 bytes)
Duration 5.10 s
Category audio

I've also tried to use "True Dynamics" with very little change...

Lindon

@Lindon -interestingly - yet more sfz problems...

Loading an sfz now captures the loops but sets Duplicate="1" in all cases...

<sample ID="1" LoVel="46" HiVel="50" FileName="{PROJECT_FOLDER}CT Boot Boo FIXED_01.WAV"
          LoKey="53" HiKey="53" SampleStart="0" SampleEnd="158448" LoopEnabled="1"
          LoopStart="23385" LoopEnd="140829" Pitch="0" Root="53" Volume="-6"
          Pan="0" RRGroup="1" Duplicate="1"/>

There are no duplicates in this sfz....

Lindon

@Lindon OK, more feedback: I went through and laboriously hand-edited the sample map to make sure all the entries were correct. It took me 3 hours, and there were all sorts of strange duplications and anomalies. I still say this whole sample map making needs a serious workover. I then ran the compression again and now get a (very nice) 1,732,780 KB, or about 1.65 GB, so a massive improvement. Now all I have to do is go back and do the same to the remaining sample maps, possibly days of work...

David Healey

@Lindon Which part are you hand editing? The loop points?

Lindon

@d-healey - no, they are coming over fine now. Duplicate="X" is often wrong, and there are silly amounts of duplicated entries: exactly the same metadata for a single file, duplicated elsewhere (nowhere near it) in the sample map, but with its Duplicate="1" set...
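
One way to find such repeated entries without reading the whole file by eye is a small Python script. This is only a sketch: it assumes the sample map is plain XML with `<sample>` elements like the ones quoted earlier in the thread, and the root element name and file names in the example are made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def find_repeated_entries(xml_text):
    """Group <sample> elements whose mapping attributes are identical."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for sample in root.iter("sample"):
        # Ignore bookkeeping attributes so that true copies compare equal.
        key = tuple(sorted(
            (k, v) for k, v in sample.attrib.items()
            if k not in ("ID", "Duplicate", "MonolithOffset")
        ))
        groups[key].append(sample.get("ID"))
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical sample map fragment; the root element name is an assumption.
EXAMPLE = """<samplemap>
  <sample ID="1" FileName="{PROJECT_FOLDER}a.wav" LoKey="53" HiKey="53" RRGroup="1"/>
  <sample ID="2" FileName="{PROJECT_FOLDER}b.wav" LoKey="54" HiKey="54" RRGroup="1"/>
  <sample ID="3" FileName="{PROJECT_FOLDER}a.wav" LoKey="53" HiKey="53" RRGroup="1" Duplicate="1"/>
</samplemap>"""

print(find_repeated_entries(EXAMPLE))
```

Each inner list holds the IDs of entries that share the same file and mapping, so they can be reviewed and removed by hand.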

David Healey

@Lindon Could you send me over one of the files so I can experiment to see if I can find a way to clean it up more quickly?

Lindon

@d-healey well I just wrote over them!!! Let me see if I can find an original "broken" one...

Christoph Hart

@Lindon said in HLAC Compression for CH1 files - not working out so well...:

possibly days of work...

Nah, we'll make a fix for that one. What if there is a simple function that runs over the samplemaps and fixes the duplicate ID flag? This flag is not being used super prominently, so there might be many occasions where it gets corrupted (eg. if you delete a duplicate sample making the other one non-duplicate, etc).
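
Until such a function exists in HISE, the idea can be sketched outside of it in Python. This is not HISE code and makes several assumptions: that the first reference to a file should carry no flag, that every later reference to the same FileName should carry Duplicate="1", and that removing the attribute is equivalent to clearing it. Run it on a copy of the sample map only.

```python
import xml.etree.ElementTree as ET

def fix_duplicate_flags(root):
    """Flag only repeat references to a file; return how many entries changed."""
    seen = set()
    changed = 0
    for sample in root.iter("sample"):
        name = sample.get("FileName")
        should_flag = name in seen  # assumption: the first reference stays unflagged
        seen.add(name)
        if should_flag != (sample.get("Duplicate") == "1"):
            changed += 1
        if should_flag:
            sample.set("Duplicate", "1")
        else:
            sample.attrib.pop("Duplicate", None)
    return changed

# Hypothetical fragment with wrong flags; the root element name is an assumption.
root = ET.fromstring(
    '<samplemap>'
    '<sample ID="0" FileName="a.wav" Duplicate="1"/>'
    '<sample ID="1" FileName="b.wav" Duplicate="1"/>'
    '<sample ID="2" FileName="a.wav"/>'
    '</samplemap>'
)
print(fix_duplicate_flags(root), "entries corrected")
print(ET.tostring(root, encoding="unicode"))
```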

Lindon

@Christoph-Hart said in HLAC Compression for CH1 files - not working out so well...:

@Lindon said in HLAC Compression for CH1 files - not working out so well...:

possibly days of work...

Nah, we'll make a fix for that one. What if there is a simple function that runs over the samplemaps and fixes the duplicate ID flag? This flag is not being used super prominently, so there might be many occasions where it gets corrupted (eg. if you delete a duplicate sample making the other one non-duplicate, etc).

I'd trade this for the ability to select an rr group and have HISE actually import dropped audio into it, not into group 1.

Christoph Hart

Have you tried using the scripting API calls to actually do the mapping? If you have a ton of samples to map, this is usually the way to go as you can apply Regex rules to extract almost every property of a properly named sample file.

Haters might argue this is the reason why the GUI based mapping is so sloppy - I never did it again after writing the scripting API for setting sample properties...
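
The regex idea itself can be shown with a short Python sketch. It is not the HISE scripting API, and the naming scheme `<name>_<root>_<lovel>-<hivel>_rr<group>.wav` is hypothetical, chosen only to show how one pattern can pull several mapping properties out of a file name.

```python
import re

# Hypothetical naming scheme: <name>_<root>_<lovel>-<hivel>_rr<group>.wav
PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<root>\d+)_(?P<lovel>\d+)-(?P<hivel>\d+)_rr(?P<rr>\d+)\.wav$",
    re.IGNORECASE,
)

def parse_sample_name(filename):
    """Return mapping properties parsed from the file name, or None if it does not fit."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    root = int(match["root"])
    return {
        "Name": match["name"],
        "Root": root,
        "LoKey": root,  # assumption: each sample covers only its root key
        "HiKey": root,
        "LoVel": int(match["lovel"]),
        "HiVel": int(match["hivel"]),
        "RRGroup": int(match["rr"]),
    }

print(parse_sample_name("CT Ee_84_6-15_rr5.WAV"))
print(parse_sample_name("CT Ee_32.WAV"))  # does not fit the scheme
```

Files that do not match return None, so they can be listed and renamed before mapping.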

Lindon

@Christoph-Hart never thought of doing it that way - which scripting API calls do you mean?

David Healey

https://forum.hise.audio/topic/64/fun-with-regex/2

Lindon

yeah, sorta. I can see how this would be useful if (as Christoph says in the thread) all the audio had comprehensive and inclusive file names (that included low note, high note, lo velo, hi velo, rr group etc.), but frankly I nearly never get audio in that sort of format. It's all "hand-named" using some folder structure, and oft times comes from some pre-existing Kontakt product, so via sfz at best. So I'm stuck with the drag-and-drop interface. I can usually work a way around it (again, usually with some Python scripting to build the xml), but it'd be nice if I could drag a set of audio into a sampler in a given rr group and have it map there correctly. It never ever does.
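
A minimal sketch of that kind of Python workaround, assuming a hypothetical layout where each round robin group has its own folder ending in the group number (for example "RR 03") and each file name ends in `_<note number>`. The attribute names are taken from the sample map entries quoted earlier in the thread; the root element name, the ID numbering and the folder layout are assumptions.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

def build_samplemap(relative_paths):
    """Build <sample> entries from paths like 'RR 03/Ee_84.wav'."""
    root = ET.Element("samplemap")  # assumption: root element name
    for rel in sorted(relative_paths):
        path = PurePosixPath(rel)
        group = re.search(r"(\d+)$", path.parent.name)  # "RR 03" -> "03"
        note = re.search(r"_(\d+)$", path.stem)         # "Ee_84" -> "84"
        if group is None or note is None:
            continue  # leave anything unrecognised for manual mapping
        key = str(int(note.group(1)))
        ET.SubElement(root, "sample", {
            "ID": str(len(root)),  # assumption: IDs simply count up from 0
            "FileName": "{PROJECT_FOLDER}" + rel,
            "LoKey": key,
            "HiKey": key,
            "Root": key,
            "RRGroup": str(int(group.group(1))),
        })
    return root

paths = ["RR 02/Ee_84.wav", "RR 01/Ee_84.wav", "notes.txt"]
print(ET.tostring(build_samplemap(paths), encoding="unicode"))
```

Taking a list of relative paths rather than walking the disk keeps the sketch easy to test; the list could come from `Path.rglob("*.wav")` on the real sample folder.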

Christoph Hart

If it's of any help, the folder name is also part of the sample's FileName property.

Another helpful tool is a batch file rename utility - usually the samples that I work with have "some" sort of useable info in the filename which might be massaged into something that the regex processor can cope with. This is the way to go to clean up inconsistencies before the import process (and your samples might thank you anyways for getting more meaningful names along the process).

David Healey

@Christoph-Hart said in HLAC Compression for CH1 files - not working out so well...:

Another helpful tool is a batch file rename utility -

This is my solution to badly named samples too. I use PyRenamer

Lindon

@d-healey @Christoph-Hart -- oh trust me this is a life saver for me (well MASSIVE time saver at least):

https://www.bulkrenameutility.co.uk/

Christoph Hart

yup, that's exactly what I'm using :)
