What are AI audio effects?

There are currently several music production tools using “machine learning” (ML) and “artificial intelligence” (AI), but what exactly are we trying to accomplish at TONZ?
We use deep neural networks (DNNs), a type of ML, to model audio transformations directly from audio data. We take a “dry” audio signal and a “wet” audio signal that has been processed in some way, for example compressed. Our neural networks learn the difference between these signals, resulting in a model that has learned the transformation. We build the trained models into audio plugin products, which process frames of audio through the model without any further filtering or processing. This means customers can use our DNN models to transform their signals as if they had access to the studio-quality devices we model.
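The core idea, learning a transformation from dry/wet pairs, can be sketched in a few lines. This is a deliberately toy illustration, not our architecture: the "device" just applies a fixed gain, and a single trainable parameter stands in for a neural network's weights.

```python
import numpy as np

# Toy illustration: the "device" applies a fixed gain of 0.5.
rng = np.random.default_rng(0)
dry = rng.standard_normal(1024)   # dry input signal
wet = 0.5 * dry                   # wet signal: the device's output

# One trainable parameter standing in for a network's weights.
gain = 1.0
lr = 0.1
for _ in range(100):
    pred = gain * dry                         # model's guess at the wet signal
    grad = 2.0 * np.mean((pred - wet) * dry)  # d(MSE)/d(gain)
    gain -= lr * grad                         # gradient descent step

print(round(gain, 3))  # converges to 0.5, the device's true gain
```

A real model replaces the single gain with millions of weights and a nonlinear architecture, but the training signal is the same: minimize the difference between the model's output and the wet recording.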

Why use AI for audio plugins?

Do we really need a new way to model hardware audio devices? Well, yes! The way we use DNNs to learn the differences between audio signals directly is unlike any other use of ML in music production. We don’t use AI to make creative human decisions; we use it to make building audio processing tools easier and more accurate. The transformations imparted by real hardware devices are complex, and they're hard to get right when approximating with code. Musicians’ ears are rarely fooled, and current efforts to model hardware devices fall short. We're using ML to get better at building products: not just to clone hardware devices, but to provide musicians with the next generation of production and performance tools.
To find out more about how we build hardware-modelling music products, read on!

How to model a hardware device with DSP

It starts with finding an audio effect you want to model, and then it's all about measurement. You send different signals into the device, and look at the frequency spectrum of the output.
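A minimal sketch of that measurement step, with an assumed stand-in for the hardware: a tanh waveshaper plays the role of the device, and the FFT of its output reveals the harmonics it adds to a pure test tone.

```python
import numpy as np

fs = 48000                        # sample rate in Hz
t = np.arange(fs) / fs            # one second of audio
x = np.sin(2 * np.pi * 100 * t)   # 100 Hz test tone

# Stand-in "device": a tanh waveshaper, which adds odd harmonics.
y = np.tanh(2.0 * x)

spectrum = np.abs(np.fft.rfft(y)) / len(y)
freqs = np.fft.rfftfreq(len(y), 1 / fs)
peaks = freqs[np.argsort(spectrum)[-3:]]   # three strongest bins
print(sorted(peaks))  # [100.0, 300.0, 500.0]: fundamental plus odd harmonics
```

Sweeping the input frequency and level and comparing spectra like this is how you characterize what a device actually does to a signal.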
If you want to model a device with a lot of detail, you can use circuit modelling. This involves implementing each component of the circuit that imparts an audio processing transformation as code. Every tube, transistor, and capacitor will be taken into consideration.
That will get you really close, but it will never result in a perfect emulation, because there is only so much software can do to model real-world transformations.
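To make the circuit-modelling approach concrete, here is a heavily simplified sketch of what one modelled stage might look like in code. The function name, parameters, and the biased-tanh curve are all illustrative assumptions; a real circuit model solves the component equations of the actual schematic rather than using a single waveshaping formula.

```python
import numpy as np

def tube_stage(x, drive=2.0, bias=0.1):
    """Toy approximation of a single triode stage: a biased tanh
    waveshaper. A real circuit model would instead solve the equations
    of each component (plate resistor, grid capacitor, and so on)."""
    y = np.tanh(drive * x + bias) - np.tanh(bias)  # remove the DC offset
    return y / drive                               # rough level matching

x = np.linspace(-1.0, 1.0, 5)
print(tube_stage(x))
```

Even this toy version shows the asymmetric clipping that gives tube stages their character, and also why component-by-component code is an approximation: every simplification like this one moves the emulation a little further from the hardware.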

How to build AI audio plugins

Instead of measuring and implementing individual components of the device we are trying to model, we use machine learning to learn the effect that the processing device has on the audio signal, without any code approximations.
We send all types of signals (instruments, vocals, test signals) through audio processing systems (a hardware device, a software process, or a combination of both) and record the output. These transformed signals are what we train our deep neural nets on in order to produce our models. The models are trained on audio alone, without any custom signal processing code, and are capable of processing user audio in real time.
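The data-preparation step above can be sketched as slicing a paired dry/wet recording into aligned training frames. This helper and its frame sizes are hypothetical, shown only to make the idea concrete; a tanh stands in for the recorded processing.

```python
import numpy as np

def make_frames(dry, wet, frame_len=256, hop=128):
    """Slice an aligned dry/wet recording into paired training frames.
    Hypothetical helper: names and sizes are illustrative."""
    assert len(dry) == len(wet)
    starts = range(0, len(dry) - frame_len + 1, hop)
    X = np.stack([dry[s:s + frame_len] for s in starts])
    Y = np.stack([wet[s:s + frame_len] for s in starts])
    return X, Y

dry = np.random.default_rng(1).standard_normal(1024)
wet = np.tanh(dry)                 # stand-in for the recorded device output
X, Y = make_frames(dry, wet)
print(X.shape, Y.shape)            # (7, 256) (7, 256)
```

Each row of X is a dry frame and the matching row of Y is what the device did to it; the network's job is to map one to the other.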

What are the benefits of using AI?

DNNs excel at modelling nonlinear interactions with exceedingly complex input-output relationships. This type of interaction is present at almost every stage of audio capture and reproduction; tube preamps, signal compressors, noise reduction, and bit depth conversion are all examples of complex nonlinear interactions that have important perceptual effects.
Ok, but isn't machine learning supposed to be really futuristic and capable of new things? What is different here?
What's different is that the processing is applied entirely by a deep neural net, in real time. When you turn a knob in one of our products, you are not controlling a software parameter that was implemented in code - you are conditioning the response of the model with a parameter. Changing the response of our models based on more complex conditioning is the key to unlocking the next generation of audio processing interactions.
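What "conditioning the model with a parameter" can look like is sketched below. This is one common technique (FiLM-style feature-wise scaling and shifting), not necessarily ours, and all the weights are random stand-ins rather than a trained model.

```python
import numpy as np

def conditioned_layer(x, knob, W, b, scale_w, shift_w):
    """One hidden layer whose response is conditioned on a knob value:
    the knob produces a per-feature scale and shift (FiLM-style).
    All weights here are random stand-ins, not a trained model."""
    h = np.tanh(x @ W + b)           # ordinary hidden activations
    scale = 1.0 + knob * scale_w     # the knob modulates the features...
    shift = knob * shift_w           # ...rather than a DSP parameter
    return h * scale + shift

rng = np.random.default_rng(0)
W, b = rng.standard_normal((8, 16)), np.zeros(16)
scale_w, shift_w = rng.standard_normal(16), rng.standard_normal(16)

x = rng.standard_normal((1, 8))
quiet = conditioned_layer(x, 0.0, W, b, scale_w, shift_w)
loud = conditioned_layer(x, 1.0, W, b, scale_w, shift_w)
print(np.allclose(quiet, loud))  # False: the knob changed the model's response
```

The knob never touches a hand-written filter or gain stage; it steers how the learned features behave, which is what opens the door to richer conditioning than a single dial.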
So stay tuned!