Layers from Transient Modelling

Transients, for our purposes, are short, sharp events in sound. Some sounds have transients that are clearly correlated with the start of events, such as notes; others are textures formed of many contributing transients. Transient modelling has received a lot of attention both in audio coding circles, where the aim is to compress signals further by handling transients and the rest of the signal differently, and in audio restoration, where the aim could be to remove clicks and replace them with an estimate of what 'should' be there instead. When we decompose a sound into a transient layer and a residual layer, we are working on the basis that these two layers combine by simple mixing, so that we can always recover our original signal.
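To make the 'simple mixing' idea concrete, here is a minimal sketch in plain Python. The detector is a deliberately crude stand-in (a sample counts as transient if it sits far from the median of its neighbours) and is not the transient model discussed here; the function name `split_transients` and its parameters are hypothetical. The point is only that the transient layer is defined as whatever the residual leaves out, so adding the two layers gives back the original.

```python
import math
from statistics import median

def split_transients(signal, half_window=2, threshold=0.5):
    """Crude illustrative split (an assumption, not a real transient model):
    samples far from the local median are treated as transient."""
    residual = []
    for i, x in enumerate(signal):
        lo = max(0, i - half_window)
        hi = min(len(signal), i + half_window + 1)
        local = median(signal[lo:hi])
        # Replace an outlying sample with the local estimate, otherwise keep it.
        residual.append(local if abs(x - local) > threshold else x)
    # The transient layer is whatever the residual left out.
    transients = [x - r for x, r in zip(signal, residual)]
    return transients, residual

# A slow sine with two clicks added by hand.
signal = [0.3 * math.sin(2 * math.pi * n / 64) for n in range(128)]
signal[20] += 1.0
signal[90] -= 1.0

transients, residual = split_transients(signal)

# Simple mixing: summing the layers recovers the original signal.
recombined = [t + r for t, r in zip(transients, residual)]
```

In this toy case the transient layer is silent everywhere except at the two clicks, and the residual holds the sine with the clicks patched over by a local estimate.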

Transient extraction is more computationally involved than estimating a percussive layer, but it is much more precise in its separation: it provides much more isolated transients, whilst leaving 'everything else' in its residual layer. This gives us some different affordances. We could process our transient layer radically differently to the residual, for instance, without having to be wary of spectral artefacts. For example, we might apply quite different dynamic-range compression to our transients and our residual before recombining, giving us new ways of shaping the onsets of events, or the detail of textures. Or we might treat the transients as a store of grains that we recombine into new patterns and textures.
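As a rough sketch of processing the two layers differently before recombining, the Python below applies a static compressor with different settings to each layer and then mixes them. Everything here is illustrative: `compress` is a hypothetical helper with no attack or release smoothing, and the two short lists stand in for layers that a real transient extractor would supply.

```python
def compress(layer, threshold, ratio):
    """Static compressor sketch: the part of each sample's magnitude that
    exceeds `threshold` is divided by `ratio`. No attack/release smoothing."""
    out = []
    for x in layer:
        mag = abs(x)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(mag if x >= 0 else -mag)
    return out

# Toy layers (assumed values, standing in for real extractor output).
transients = [0.0, 0.9, 0.0, 0.0, -0.8, 0.0]
residual = [0.1, 0.12, 0.1, 0.08, 0.1, 0.11]

# Squash the transients hard, barely touch the residual, then mix.
soft_transients = compress(transients, threshold=0.2, ratio=4.0)
gentle_residual = compress(residual, threshold=0.5, ratio=1.5)
shaped = [t + r for t, r in zip(soft_transients, gentle_residual)]
```

Because each layer is processed in the time domain and the layers recombine by addition, the onsets are softened here while the material between them passes through unchanged.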

When combined with sinusoidal modelling, we have the basis of a 'sines-transients-residual' model, where our original sound is decomposed into layers of 'archetypes' (sines + transients) plus a residual, which for some sounds might be noisier material. If we apply the transient extraction first, we might find that the sinusoidal modelling is more successful, because it has less to cope with (although, for some sounds, the converse may be true!). Keeping these layers separate might also open up other forms of analysis. For example, it might be possible to derive useful information about temporal patterns from the transient layer (depending on the type of sound).
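One possible reading of 'temporal patterns' is the spacing between events. The Python sketch below marks an onset wherever the transient layer exceeds a threshold, then measures the intervals between onsets. The helper `onset_indices`, its parameters, the sample rate and the toy transient layer are all assumptions made for illustration.

```python
def onset_indices(transients, threshold=0.1, min_gap=4):
    """Return the indices where the transient layer exceeds `threshold`,
    ignoring hits closer than `min_gap` samples to the previous onset."""
    onsets = []
    for i, x in enumerate(transients):
        if abs(x) > threshold and (not onsets or i - onsets[-1] >= min_gap):
            onsets.append(i)
    return onsets

sample_rate = 1000  # assumed, in Hz, to keep the arithmetic easy to read

# Toy transient layer: silence with a few short bursts.
transients = [0.0] * 100
for i in (10, 11, 35, 60, 85):
    transients[i] = 0.8

onsets = onset_indices(transients)
# Inter-onset intervals in milliseconds.
intervals_ms = [1000 * (b - a) / sample_rate for a, b in zip(onsets, onsets[1:])]
```

Regular intervals like these would suggest a steady pulse, whereas a dense, irregular spread would be more typical of a texture; how meaningful the result is depends on the type of sound.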