Nonnegative Matrix Factorisation

Nonnegative Matrix Factorisation (NMF) works by trying to take apart a matrix (a two dimensional lump of data), in such a way that it can put it back together again. That's the matrix factorisation part. A crucial part of the trick it uses to do this is relying on the numbers in the matrix being non-negative. This means that it can assume that the way put the matrix back together again is simply by summing the parts it found, without there being any weird cancellation effects between them. Commonly, the matrices we use with NMF for audio are spectrograms, or some other representation of time and frequencies that can be inverted. This allows us to use NMF as a way of trying to find and separate out components in a signal.

NMF as a Vocoder

The way that NMF constructs the parts that it finds is a bit similar to a vocoder. With a vocoder, you have a bunch of band-pass filters, and the amount of signal that goes through each is controlled by an envelope for each filter. NMF produces an envelope along with a spectral shape for each source. We can, if we wish, use these to process other signals, making it a vocoder that discovers its own filter shapes.

A High Level Description

NMF works iteratively, which means that it starts by having a guess, using either random data as suggested components, or some data that we've supplied. It then looks at how far off it was from being able to reconstruct the original using these candidate parts, and adjusts all the candidate data by some amount in proportion to this measure of how wrong it was. It keeps going either for a fixed number of iterations, or until the difference from the original has fallen below some defined threshold.

In the demo below, we can see what happens as NMF iterates. On the left are the original spectrum, our target, and below this is the estimate as it comes together. On the right are the evolving spectra of the components that its found. We've asked for three in this case, but we can ask for any number. Use the slider to step through the iterations (0-49), and note that the difference between the starting point–where the components are just random numbers–and the first iteration is pretty abrupt. Thereafter, the changes are much more subtle, but if you jump through you'll notice how the component spectra become more refined, and the estimated total comes to resemble the original more closely.

Step 0

By using the slider to step through the iterations of a run of NMF on some audio, we can see how the process quite quickly converges on something quite similar to the target spectrum. In this demo we are looking for three components, and the top row shows the estimated spectra for each of these at each step. At the bottom we see the total estimated spectrum overlaid in red on top of the original spectrum in black and white. At step 0, the estimates are just noise, but they develop structure very quickly.

References

Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791. https://doi.org/10.1038/44565
Smaragdis, P., Fevotte, C., Mysore, G. J., Mohammadiha, N., & Hoffman, M. (2014). Static and Dynamic Source Separation Using Nonnegative Factorizations: A unified view. IEEE Signal Processing Magazine, 31(3), 66–75. https://doi.org/10.1109/MSP.2013.2297715