Seeding Bases and Activations with BufNMF
Initialize non-negative matrix factorization with seeded elements.
This overview builds on information found in the BufNMF Overview. Please visit that page first to learn more.
Non-negative matrix factorisation (NMF) can be “seeded,” meaning that it is given some information from which to begin the decomposition process. You can provide bases and/or activations that you think will point the algorithm toward the decomposition you’re hoping for. Without seeding, NMF activations and bases are both initialised with random values.
Providing BufNMF with bases as a seed begins the decomposition process from these spectral templates. This can be useful if you know what sort of spectral templates might be in your sound buffer, or what sort of spectral templates you’re hoping to find. You can use bases derived from a BufNMF analysis or bases that are constructed manually. Setting basesMode = 1 tells BufNMF to use the provided bases as a seed and modify them as part of the decomposition process. Setting basesMode = 2 tells it to use the provided bases as a seed but not to update them during the decomposition process. (basesMode = 0 tells it to use randomised initial values.)
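To make the three settings concrete, here is a minimal sketch of a generic multiplicative-update NMF in plain Python. It is not FluCoMa's implementation: the function toy_nmf, its arguments, the update rule, and the tiny 4-bin "spectrogram" are all assumptions made for illustration. The only thing it mirrors from the text above is the meaning of the three modes.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def toy_nmf(v, n_components, bases=None, bases_mode=0, n_iter=100, seed=0):
    """Toy NMF: v (bins x frames) ~= w (bins x components) * h (components x frames).

    bases_mode follows the description above (hypothetical re-implementation):
    0 = random bases, 1 = seeded bases that are updated, 2 = seeded bases held fixed.
    """
    rng = random.Random(seed)
    n_bins, n_frames = len(v), len(v[0])
    if bases_mode == 0:
        w = [[rng.random() + 0.01 for _ in range(n_components)] for _ in range(n_bins)]
    else:
        w = [row[:] for row in bases]  # copy so the caller's seed is left untouched
    # Activations are always random here, because only the bases are seeded.
    h = [[rng.random() + 0.01 for _ in range(n_frames)] for _ in range(n_components)]
    eps = 1e-9  # avoids division by zero
    for _ in range(n_iter):
        # Activation update: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[c][t] * num[c][t] / (den[c][t] + eps) for t in range(n_frames)]
             for c in range(n_components)]
        if bases_mode != 2:
            # Basis update, skipped when the seeded bases must stay fixed:
            # w <- w * (v h^T) / (w h h^T)
            ht = transpose(h)
            num, den = matmul(v, ht), matmul(w, matmul(h, ht))
            w = [[w[f][c] * num[f][c] / (den[f][c] + eps) for c in range(n_components)]
                 for f in range(n_bins)]
    return w, h

# Made-up data: a "low" sound in frames 0-2 and a "high" sound in frames 3-5.
v = [[1, 1, 1, 0, 0, 0],
     [1, 1, 1, 0, 0, 0],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1, 1]]

exact_seed = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
rough_seed = [[1.0, 0.1], [0.8, 0.1], [0.1, 1.0], [0.1, 0.9]]

w_fixed, h_fixed = toy_nmf(v, 2, bases=exact_seed, bases_mode=2)
w_adapt, h_adapt = toy_nmf(v, 2, bases=rough_seed, bases_mode=1)
print([round(x, 3) for x in h_fixed[0]])
print([round(x, 3) for x in h_adapt[0]])
```

In both seeded runs the first component follows the "low" sound, because the first seeded basis has its energy in the low bins. With bases_mode=2 the bases come back unchanged. With bases_mode=1 the rough seed is free to move toward the spectra actually present in the data.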
In the drum loop heard in the BufNMF Overview, there are three drum instruments: kick drum, snare drum, and hi-hat. In that overview, BufNMF is not seeded and decomposes the audio buffer into three channels, each containing mostly sounds from just one of those instruments. Because the process is not seeded, its bases begin filled with random values and therefore the ordering of these channels is random: each time the process runs, the “kick drum component” could end up in any of channels 1, 2, or 3, as could the other components.
By providing the seeded bases below, which very roughly resemble spectral templates for a kick drum (blue), snare drum (orange), and hi-hat (green), one can be reasonably confident that these components will end up in this order: kick drum component, snare drum component, hi-hat component. Basis 1 has energy only in the low frequency range, in the hope that the NMF process will transform it into a basis representing the kick drum. Basis 2 has energy only in a middle register, in the hope that it will become the basis representing the snare drum. Basis 3 has energy across a lot of the spectrum, mostly in the upper frequency range where the cymbals sound, in the hope that it will come to represent the hi-hat. (Because bases are FFT spectra, the right half of each basis represents only one octave of the spectrum. The terms “middle register” and “upper frequency” are used here based on human hearing, rather than FFT bin position.) Because we want BufNMF to use these bases as the starting point and transform them during the decomposition process to more closely resemble the spectra in the buffer, we’ll set basesMode = 1.
These bases have been created manually by calculating each position in the basis to be 0, 1, or some number in between. Download the code below to see how these values were created.
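As a rough illustration of what "hand-made" bases could look like, the sketch below fills three lists with values between 0 and 1. It is not the downloadable code mentioned above: the helper band_basis, the sample rate, the FFT size, and every band edge are hypothetical choices, and it assumes each basis holds (FFT size / 2) + 1 bins.

```python
def band_basis(n_bins, bin_hz, lo_hz, hi_hz, floor=0.0):
    """Return one basis: 1.0 for bins inside [lo_hz, hi_hz), `floor` elsewhere."""
    return [1.0 if lo_hz <= k * bin_hz < hi_hz else floor for k in range(n_bins)]

sample_rate = 44100          # assumed
fft_size = 1024              # assumed
n_bins = fft_size // 2 + 1   # assumed basis length: one value per FFT bin
bin_hz = sample_rate / fft_size

# Hypothetical band edges, chosen only to echo the description above.
kick = band_basis(n_bins, bin_hz, 0, 200)                  # low range only
snare = band_basis(n_bins, bin_hz, 200, 2000)              # middle register only
hihat = band_basis(n_bins, bin_hz, 5000, sample_rate / 2 + 1, floor=0.2)  # broad, mostly high
bases = [kick, snare, hihat]

# The upper half of the bins spans a single octave: from sample_rate / 4 up to sample_rate / 2.
half_bin, top_bin = fft_size // 4, fft_size // 2
print(half_bin * bin_hz, top_bin * bin_hz)
```

The last line shows why the "middle register" sits so far to the left in a linear FFT plot: with these assumed settings, everything from bin 256 upward covers only the octave between 11025 Hz and 22050 Hz.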
Seeding the BufNMF analysis with these bases allows us to be fairly confident that the resynthesised components will be in the order we expect: (1) kick drum, (2) snare drum, and (3) hi-hat.
Rather than using the “hand-made” bases as above, one could also use bases derived from a BufNMF analysis. For example, if we want to decompose a piano melody and we know generally what octave and key it will have, we could use BufNMF to create bases for the 8 different notes in that scale. Here is a major scale starting on middle C created using sawtooth waves.
Performing BufNMF on this audio provides the following 8 activations and 8 bases:
One can see how the activations clearly delineate the different notes in time and how the harmonic structure of the sawtooth waves is represented in the bases. It’s interesting to notice how the activation for the C at the top of the scale (the purple activation) indicates some presence of the C at the bottom of the scale (the activation for which is green). This is because the harmonic structures of these two notes (one octave apart) are similar enough to make decomposing them difficult for BufNMF.
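The overlap between the two C notes can be checked with a little arithmetic. The sketch below assumes ideal sawtooth waves that contain every integer harmonic, a rounded middle C of 261.63 Hz, and an arbitrary 5000 Hz ceiling. The helper harmonics is hypothetical.

```python
low_c = 261.63        # Hz, middle C (rounded)
high_c = 2 * low_c    # the C one octave above
limit_hz = 5000.0     # hypothetical ceiling, only to keep the sets small

def harmonics(f0, limit_hz):
    """Frequencies of all harmonics of f0 up to limit_hz, as in an ideal sawtooth."""
    return {round(n * f0, 2) for n in range(1, int(limit_hz // f0) + 1)}

low = harmonics(low_c, limit_hz)
high = harmonics(high_c, limit_hz)
shared = low & high

print(len(low), len(high), len(shared))
print(high <= low)  # True when every harmonic of the upper C also belongs to the lower C
```

Every harmonic of the upper C coincides with an even harmonic of the lower C, so a basis learned for the lower note can already account for much of the upper note's spectrum.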
Using these bases as a seed to decompose the following melody gives BufNMF a strict idea of what to look for in the source buffer. For this example we’ll set basesMode = 2 so that BufNMF uses the provided bases as a seed, but won’t alter them in the process of decomposition. This is because we want to find out where in the melody these notes occur and how strongly, rather than having BufNMF take these bases as a starting point and transform them to be more like the spectral content in the melody.
Below is the resulting resynthesised buffer containing the 8 decomposed components (seeded by the 8 bases above) with their corresponding activations overlaid. Because the melody does not use all of the notes represented by the bases, some components have little to no amplitude or activation energy to them. The bases that represent notes not used in the melody will not be activated by the melody.
BufNMF has identified quite well which notes occur in this melody and when. One could use this information to determine which notes are most represented in an audio buffer, perhaps for determining key. Again, one can see that the similarity in harmonic structure of the two C notes an octave apart makes decomposing these parts of the buffer difficult for BufNMF. This process isn’t limited to pitch-based applications. If the seeded bases were created using different sounds, BufNMF could determine where and when those sounds occur in a buffer.
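One simple way to ask which notes are most represented is to sum each component's activation over time and rank the totals. The sketch below uses made-up activation values, not results from the melody above. It also assumes that the mapping from component to note name is already known, and that a 25% threshold is a reasonable cut-off for "present".

```python
note_names = ["C4", "D4", "E4", "F4", "G4", "A4", "B4", "C5"]

# Made-up activations: 8 components x 6 analysis frames.
activations = [
    [0.9, 0.0, 0.0, 0.0, 0.8, 0.1],  # C4
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # D4
    [0.0, 0.7, 0.0, 0.6, 0.0, 0.0],  # E4
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # F4
    [0.0, 0.0, 0.9, 0.0, 0.0, 0.0],  # G4
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # A4
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # B4
    [0.1, 0.0, 0.0, 0.0, 0.1, 0.0],  # C5: a little bleed from the C an octave below
]

# Total activation energy per note, then notes ordered from most to least active.
totals = {name: sum(act) for name, act in zip(note_names, activations)}
ranked = sorted(totals, key=totals.get, reverse=True)

# Hypothetical rule: a note counts as present if it reaches 25% of the strongest total.
threshold = 0.25 * max(totals.values())
present = [name for name in note_names if totals[name] >= threshold]

print(ranked[:3])
print(present)
```

The threshold keeps the small amount of octave bleed in the C5 component from being counted as a note in its own right.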
One can also seed a process using activations if it is already known where in time certain components occur. Take for example the BufNMF results of decomposing the sawtooth waves above. BufNMF is able to decompose the different notes in the scale quite well. However, because it is seeded randomly, we have no control over which note ends up being represented by which basis. By seeding the activations with energy in the order in which we want the bases to appear, we can be quite confident that they will end up this way. Below is a plot of the activations used to seed a BufNMF analysis. Note that it has 8 channels and energy that occurs in series according to the channel number. This corresponds to the order in which the sawtooth notes appear in the source buffer. Also, for seeding, this activation buffer must be [(number of frames in source / hop size) + 1] frames long and (number of components * number of channels) channels wide.
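The size rule above and the "energy in series" shape can be sketched in a few lines. This is only an illustration: the source length and hop size are hypothetical, the division is assumed to round down, and the helpers seed_activation_shape and staircase are invented names. The seed is stored here as one list per component rather than as an actual buffer.

```python
def seed_activation_shape(n_source_frames, hop_size, n_components, n_channels=1):
    """Return (frames, channels) for an activation seed, following the rule above.

    Integer (floor) division is an assumption.
    """
    return n_source_frames // hop_size + 1, n_components * n_channels

def staircase(n_frames, n_components):
    """One list per component: 1.0 during that component's slice of time, else 0.0."""
    return [[1.0 if t * n_components // n_frames == c else 0.0 for t in range(n_frames)]
            for c in range(n_components)]

# Hypothetical values: 8 seconds of mono audio at 44100 Hz, hop size of 512, 8 components.
n_frames, n_channels = seed_activation_shape(352800, 512, 8, 1)
seed = staircase(n_frames, 8)

print(n_frames, n_channels)
print([row.index(1.0) for row in seed])  # first active frame of each component
```

Each component is active during its own eighth of the buffer, in channel order, which matches a scale whose notes are played one after another at equal durations.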
Setting actMode = 1 tells BufNMF to use the provided activations as a seed and update them during the decomposition process. Setting actMode = 2 tells it to use the provided activations as a seed but not modify them during the decomposition process. For this example we’ll use actMode = 1 because we want BufNMF to adjust our seeded activations slightly so that they more accurately match where the notes occur in time. The resulting activations can be seen below (and below that the bases). When we now use these bases to seed an analysis (such as of the piano melody above), we can be confident which order the bases (and therefore the notes that they represent) are in. Also, notice that BufNMF has more clearly separated the two C notes (an octave apart) because the seeded activations suggested that they are two different components.
Here are the resynthesised components of the piano melody using the ordered bases above (with their corresponding activations overlaid).