Neural Amp Modeler - WaveNet on SNEX
-
I've noticed a new project, Neural Amp Modeler. It's an amazingly realistic amp modeler based on machine learning (WaveNet).
https://github.com/sdatkinson/neural-amp-modeler
Since it's under the MIT License, could this be added to HISE? At least to open the modeled amps. This could be a revolution.
If not, does anyone know about using this WaveNet thing in SNEX?
-
@Steve-Mohican I had a working prototype using a Third Party Node & the RTNeural library. Simple transfer functions work (like a guitar amp) but I hit a wall trying to make a variational autoencoder so it's on the backburner for the time being. If I get it working properly I'll share all the code on the forum.
-
@iamlamprey said in Neural Amp Modeler - WaveNet on SNEX:
but I hit a wall trying to make a variational autoencoder so it's on the backburner for the time being.
I think this autoencoder is the thing that makes the sound ultra-realistic. A transfer function is similar to a static waveshaper, but this autoencoder thing is amazing. A real guitar amp behaves like this!
As you can see, the shape morphs automatically according to the input signal level. That's not how a standard waveshaper looks.
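Something like this toy sketch (just my illustration, nothing from the NAM codebase) shows the difference between a fixed curve and one that reacts to the input level:

```python
import numpy as np

def static_shaper(x, drive=4.0):
    # a static waveshaper: the same transfer curve no matter how loud the input is
    return np.tanh(drive * x)

def dynamic_shaper(x, drive=4.0, smoothing=0.99):
    # hypothetical toy model: the effective drive follows the input envelope,
    # so quiet passages stay cleaner and loud passages saturate harder
    y = np.zeros_like(x)
    env = 0.0
    for i, sample in enumerate(x):
        env = smoothing * env + (1.0 - smoothing) * abs(sample)
        y[i] = np.tanh(drive * (0.5 + env) * sample)
    return y

t = np.linspace(0, 1, 44100)
swell = np.sin(2 * np.pi * 110 * t) * np.linspace(0.1, 1.0, t.size)  # swelling sine
print(static_shaper(swell)[:4])
print(dynamic_shaper(swell)[:4])
```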
-
Yeah, Faust and the other recent additions are amazing, but looking to the future, I think machine learning integration is a must. AI-powered software is growing every day; at the very least we should be able to use machine learning.
Machine Learning please...
-
@Steve-Mohican Actually a lot of state-of-the-art (SOTA) neural amps just use multiple networks trained on different "snapshots" of the original, then crossfade between those functions at runtime.
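Just to illustrate the snapshot idea (my own toy code, not any particular product's implementation):

```python
import numpy as np

def snapshot_clean(x):
    return np.tanh(1.5 * x)   # stand-in for a network trained on the clean setting

def snapshot_driven(x):
    return np.tanh(8.0 * x)   # stand-in for a network trained on the cranked setting

def crossfaded_amp(x, morph):
    # morph = 0 -> clean snapshot, morph = 1 -> driven snapshot
    return (1.0 - morph) * snapshot_clean(x) + morph * snapshot_driven(x)

x = np.linspace(-1.0, 1.0, 5)
print(crossfaded_amp(x, 0.25))
```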
Keith Bloemer has some good articles on using time-series models to build guitar amps: https://keyth72.medium.com/
There's also Christian Steinmetz' work modelling things like the LA-2A:
https://www.youtube.com/watch?v=juH6mEArdU8
A generative model like a VAE is instead a large network trained on hundreds or thousands of examples, which it encodes and maps to a latent dimension it can explore at runtime
VAEs are more suited to the virtual instrument/sample library side of things: you can theoretically train a generative model on 10,000 Djembe one-shots, then infer any one of those (or anywhere in between) at runtime for seemingly infinite round-robin variation
RAVE 2 is the current state of the art for realtime generation, but it's quite complex and difficult to train, and if I recall correctly it can only take audio as an input for inference, not MIDI:
https://www.youtube.com/watch?v=dMZs04TzxUI
My plan was to make a VAE that takes the start of a sample (from the HISE sampler, for example) and turns it into a variation of itself. It trained successfully, but it was only inferring noise (time-series models confuse the heck out of me). Neural nets are certainly a candidate for the future of DSP and sampling, since a correctly implemented model would solve a lot of pain points for sample libraries (smaller file sizes, infinite round-robins, latent exploration for realism, swapping "instruments", etc.)
-
@iamlamprey Just amazing technologies. I hope neural networks come to Hise in the future.
-
I've been trying to tie up all of my partially finished projects so I can get going with WaveNet. Neural models are the future.... and the future is now.
-
@Dan-Korneff said in Neural Amp Modeler - WaveNet on SNEX:
I've been trying to tie up all of my partially finished projects so I can get going with WaveNet. Neural models are the future.... and the future is now.
Agreed. The time for neural models is now.
-
I know nothing about this stuff, can someone give me a brief summary of what you're all talking about? :)
-
@d-healey this simple, 25-hour-long video covers the basics
You get a basic outline in the first few minutes
-
@d-healey It's called Deep Learning for a reason
WaveNet: a feedforward network (stacks of dilated causal convolutions) that learns to estimate a function from training data; one of the simpler architectures to work with. The original paper is from 2016
AutoEncoder: Takes a complex representation of data (usually a mel spectrogram for audio), simplifies (encodes) it, then learns to reconstruct the original complex representation from that encoded information. Useful for data compression; more useful when you make it variational
Variational AutoEncoder (VAE): Uses a fancy reparameterization trick to take each complex representation and "map" it to a latent dimension; the network can then decode from any arbitrary location inside that latent space. Think of it like burying treasure: you bury one piece in the sandbox, one under the tree, one next to the car -- you then say "hey stupid neural network idiot, decode the stuff under the car" and it will (hopefully) give you exactly what was "mapped" to that location
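Here's a minimal sketch of just the reparameterization trick (my own toy code, assuming the encoder has already produced a mean and log-variance for each latent dimension):

```python
import numpy as np

def reparameterize(mu, log_var):
    # sample z = mu + sigma * epsilon, where epsilon ~ N(0, 1);
    # the randomness lives in epsilon, so gradients can flow through mu and sigma
    eps = np.random.randn(*mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu = np.array([0.2, -1.0])        # "where the treasure is buried"
log_var = np.array([-2.0, -2.0])  # how fuzzy that burial spot is
z = reparameterize(mu, log_var)
print(z)  # a point in latent space you can ask the decoder to decode
```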
Generative Adversarial Network (GAN): Uses one network (the generator) to "trick" another network (the discriminator/critic); each network gets better at doing its job (tricking / not being tricked) until the generator's output is indistinguishable from the real thing. Deepfakes are a type of GAN, iirc
Diffusion Model: Trains by taking a data sample and adding noise to it incrementally; the network learns to undo each noise step. At inference time you pass it random noise and it gradually decodes something from that. DALL-E and Stable Diffusion are diffusion models; they aren't necessarily ideal for audio since they're quite slow at inference time
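A rough sketch of just the forward (noising) half, with a made-up schedule value (my own illustration):

```python
import numpy as np

def noise_sample(x0, alpha_bar_t):
    # closed-form forward process: x_t = sqrt(a_bar)*x_0 + sqrt(1 - a_bar)*noise
    noise = np.random.randn(*x0.shape)
    xt = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise
    return xt, noise   # the network is trained to predict `noise` given `xt`

x0 = np.sin(np.linspace(0, 2 * np.pi, 16))   # a tiny "clean" signal
xt, target = noise_sample(x0, alpha_bar_t=0.3)
print(xt[:4], target[:4])
```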
DDSP: Differentiable Digital Signal Processing; they basically train a neural network to control audio effects (or a synthesizer). By letting the network take the wheel, it doesn't have to generate the raw audio data itself and instead just controls existing DSP tools
RAVE: Realtime Audio Variational autoEncoder, a fancy VAE that also includes a GAN stage during training. It's the current state of the art for realtime synthesis and is embeddable on micro devices and such. I have no idea where to begin implementing it in HISE as it's quite complex
Implementation: you basically need a fast inference library and some sort of time-series network. The former can be RTNeural (installing TensorFlow / Torch is kinda annoying compared to the simple RTNeural library), the latter can be an LSTM, RNN, GRU or a convnet. It also has to be trained (obviously)
All of these time-series models basically "predict" the next audio sample or buffer based on the previous one(s). Without temporal coherence it will just randomly predict each sample, which results in white noise (i.e. the problem with my VAE)
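Here's a minimal Keras sketch of that idea (my own assumptions about window size and layer sizes, not a finished amp model): an LSTM mapping the last 64 input samples to the next output sample. For a real amp you'd train it on (DI input, amp output) pairs recorded from the hardware:

```python
import numpy as np
from tensorflow import keras

WINDOW = 64  # how many past samples the network sees

model = keras.Sequential([
    keras.Input(shape=(WINDOW, 1)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Fake training data just to show the shapes: sliding windows of a sine wave
# as input, the sample that follows each window as the target.
signal = np.sin(np.linspace(0, 200 * np.pi, 20000)).astype(np.float32)
x = np.stack([signal[i:i + WINDOW] for i in range(len(signal) - WINDOW)])[..., None]
y = signal[WINDOW:][:, None]

model.fit(x, y, epochs=1, batch_size=256)
print(model.predict(x[:1]))   # predicted next sample for the first window
```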
There's also guys like the dadabots dudes who generate full music instead of individual instruments/effects, they have a 24-hour metal livestream that is being generated in realtime which is really cool
You can find all sorts of tutorials on how to build simple neural nets on YouTube using Python and a library like Keras. Be warned: you'll be looking at hand-drawn MNIST images for hours at a time
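The "hello world" from those tutorials usually looks something like this (a tiny dense classifier on the MNIST digits):

```python
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to 0..1

model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1)
print(model.evaluate(x_test, y_test))
```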
okay that's my last year and a half condensed into an ugly forum post