Non-negative matrix factorisation
BufNMF uses non-negative matrix factorisation (NMF) to decompose the magnitudes of a spectrogram into a number of components specified by the user. Each component is represented by the combination of two elements: a spectral template (“basis”, plural: “bases”) and an amplitude envelope (“activation”). Activations and bases can be used in various ways, including to resynthesize the audio of a single decomposed component. NMF is a popular technique in signal processing research for things like source separation and transcription.
NMF could be thought as a vocoder, in which each bandpass filter is instead a finely tuned spectral template, for which its respective activation is the envelope.SuperCollider Example Max Example
For an example of audio decomposition using NMF, see the Decomposition with NMF Overview.
NMF is a form of unsupervised machine learning. It begins either from seeded activations and bases or in a stochastic state (where the bases and activations of the components are initially random) and works iteratively. Using stochastic gradient descent, NMF tries to find a combination of components that, when summed together, minimize the difference from the original spectrum. In this implementation, the components are reconstructed by masking the original spectrum, such that they will always sum to yield the original sound.
There is no single correct solution to this process. There are different ways of accounting for an spectrum in terms of some set of bases and activations. NMF tends to converge very quickly at first and then level out. Fewer iterations mean less processing, but also less predictable results.
BufNMF can return any or all of the following:
- The bases of the components in the form of a magnitude spectrum for each.
- The activations of the components in the form of an amplitude envelope for each.
- An audio reconstruction of each component.
Some additional options and flexibility can be found through combinations of the
If either or both of these arguments are set to 1, BufNMF expects to be supplied with the corresponding elements (activations or bases) to use as seeds for the decomposition, providing more guided results. It is possible to set both arguments equal to 1.
If either of these arguments are set to 2, BufNMF will not modify the supplied corresponding elements (activations or bases), instead using them as templates to match against. BufNMF will modify the other elements which will represent their best match to the supplied spectrogram, given the fixed elements provided. Note that having both basesMode and actMode set to 2 doesn’t make sense.
For more information on using
actMode, see the Seeding NMF Overview.
If supplying pre-formed data (for
basesMode 1 and 2), it’s up to the user to make sure that the supplied buffers are the right size: bases must be (fft size / 2) + 1 frames and components * input channels channels
activations must be (input frames / hopSize) + 1 frames and components * input channels channels. FFT settings can be set in the BufNMF object.
Learning the Parts of Objects by Non-Negative Matrix Factorization
Lee, Daniel D., and H. Sebastian Seung. 1999. ‘Learning the Parts of Objects by Non-Negative Matrix Factorization’. Nature 401 (6755): 788–91.https://doi.org/10.1038/44565
Non-Negative Matrix Factorization for Polyphonic Music Transcription
Smaragdis and Brownhttps://ieeexplore.ieee.org/document/1285860