How hardware-modelling audio plugins are (usually) made

It starts with finding an audio effect you want to model, and then it's all about measurement. You send a variety of test signals into the device and examine the frequency spectrum of the output.
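As a rough illustration of that measurement step, here is a minimal sketch comparing the spectrum of a test signal with the spectrum of the device's output. It assumes the two have already been recorded as WAV files (the file names and the soundfile library are assumptions, not part of any particular workflow described here).

```python
# Minimal sketch: compare the spectrum of a test signal to the device's output.
# File names are hypothetical; assumes numpy and soundfile are available.
import numpy as np
import soundfile as sf

dry, sr = sf.read("sweep_input.wav")   # test signal sent into the device
wet, _ = sf.read("sweep_output.wav")   # what the device produced

n = min(len(dry), len(wet))
freqs = np.fft.rfftfreq(n, d=1.0 / sr)
dry_mag = np.abs(np.fft.rfft(dry[:n]))
wet_mag = np.abs(np.fft.rfft(wet[:n]))

# The output-to-input magnitude ratio approximates the frequency response
# (only meaningful at frequencies where the test signal has energy).
response_db = 20 * np.log10((wet_mag + 1e-12) / (dry_mag + 1e-12))
print(freqs[:10], response_db[:10])
```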

If you want to model a device with a lot of detail, you can use circuit modelling. This means writing code that simulates each component in the circuit that shapes the audio signal. Every tube, transistor, and capacitor is taken into consideration.
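To make "each component as code" concrete, here is a toy sketch of the idea: a static tanh waveshaper standing in for a tube's soft clipping and a one-pole high-pass standing in for a coupling capacitor. These simplified stand-ins are my own illustration, not how any real plugin is implemented.

```python
# Illustrative (not production) component-level modelling:
# each circuit element becomes a small piece of DSP code.
import numpy as np

def tube_stage(x, drive=2.0):
    """Static tanh waveshaper standing in for a triode's soft clipping."""
    return np.tanh(drive * x) / np.tanh(drive)

def coupling_cap_highpass(x, sr=48000, cutoff=20.0):
    """One-pole high-pass approximating a DC-blocking coupling capacitor."""
    rc = 1.0 / (2 * np.pi * cutoff)
    alpha = rc / (rc + 1.0 / sr)
    y = np.zeros_like(x)
    prev_x, prev_y = 0.0, 0.0
    for i, xi in enumerate(x):
        prev_y = alpha * (prev_y + xi - prev_x)
        prev_x = xi
        y[i] = prev_y
    return y

signal = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)
processed = coupling_cap_highpass(tube_stage(signal))
```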

That will get you really close, but it will never result in a perfect emulation. That's because a circuit model is built from idealised equations and approximations of each component, and real-world parts never behave quite that cleanly.

How machine learning can help

Instead of measuring and implementing the individual components of the device we are trying to model, we use machine learning to learn the effect the device has on the audio signal directly, without any hand-coded approximations.

So how do we do it?

We send all kinds of signals - instruments, vocals, test signals - through audio processing systems - a hardware device, a software process, or a combination of both - and record the output. These pairs of original and transformed signals are what we train our deep neural nets on in order to produce our models. The models are trained on audio alone, without any custom signal processing code, and are capable of processing user audio in real time.
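The post doesn't disclose the actual networks or training setup, but a minimal, purely illustrative sketch of training on paired dry/processed audio might look like this (PyTorch and the tiny dilated-convolution architecture here are my assumptions):

```python
# Minimal sketch (PyTorch assumed) of training a network on paired
# dry/processed audio. The tiny architecture is purely illustrative.
import torch
import torch.nn as nn

class TinyEffectNet(nn.Module):
    def __init__(self, channels=16, kernel=9, layers=4):
        super().__init__()
        convs, in_ch = [], 1
        for i in range(layers):
            dilation = 2 ** i
            convs.append(nn.Conv1d(in_ch, channels, kernel,
                                   padding=dilation * (kernel - 1) // 2,
                                   dilation=dilation))
            convs.append(nn.Tanh())
            in_ch = channels
        convs.append(nn.Conv1d(in_ch, 1, 1))
        self.net = nn.Sequential(*convs)

    def forward(self, x):  # x: (batch, 1, samples)
        return self.net(x)

model = TinyEffectNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# dry/wet would be aligned recordings of the device's input and output;
# random tensors stand in for them here.
dry = torch.randn(8, 1, 4096)
wet = torch.randn(8, 1, 4096)

for step in range(100):
    opt.zero_grad()
    pred = model(dry)
    loss = loss_fn(pred, wet)  # match the device's output sample by sample
    loss.backward()
    opt.step()
```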

Why does that matter?

DNNs excel at modelling nonlinear interactions with exceedingly complex input-output relationships. This type of interaction is present at almost every stage of audio capture and reproduction: tube preamps, signal compressors, noise reduction, bit depth conversion - these are all examples of complex nonlinear interactions that have important perceptual effects.

Ok, but isn't machine learning supposed to be really futuristic and capable of new things? What is different here?

What's different is that the processing is being applied entirely by a deep neural net, in real time. When you turn a knob in one of our products, you are not controlling a software parameter that was implemented in code - you are "conditioning the response of the model with a parameter". Changing the response of our models based on more complex conditioning is the key to unlocking the next generation of audio processing interactions.
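One common way to condition a network on a knob value is to feed the parameter in as an extra input alongside the audio, so the same weights produce different behaviour as the knob moves. The sketch below shows that idea; it is an assumption about the general technique, not the actual mechanism inside any particular product.

```python
# Hedged sketch of parameter conditioning: broadcast the knob value as an
# extra input channel next to the audio. Illustrative only.
import torch
import torch.nn as nn

class ConditionedEffectNet(nn.Module):
    def __init__(self, channels=16, kernel=9):
        super().__init__()
        # 2 input channels: the audio and the (broadcast) knob value
        self.net = nn.Sequential(
            nn.Conv1d(2, channels, kernel, padding=kernel // 2),
            nn.Tanh(),
            nn.Conv1d(channels, 1, 1),
        )

    def forward(self, audio, knob):
        # audio: (batch, 1, samples), knob: (batch,) in [0, 1]
        cond = knob.view(-1, 1, 1).expand(-1, 1, audio.size(-1))
        return self.net(torch.cat([audio, cond], dim=1))

model = ConditionedEffectNet()
audio = torch.randn(1, 1, 4096)
out_low = model(audio, torch.tensor([0.1]))   # knob near minimum
out_high = model(audio, torch.tensor([0.9]))  # same audio, knob near maximum
```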

So stay tuned!