Slicing by Novelty

Novelty in a signal provides a broad concept for thinking about how we might be able to say that this bit of a sound is different from that bit. It's useful for slicing when we want a more general basis for distinguishing between chunks than looking for the start of well-defined events with onsets, transients or changes in the envelope. It can be especially useful when we're interested in making longer slices than we might get with these typically more fine-grained methods.

There are many possible ways to come up with such a measure. The one that we use in the Fluid Decomposition Toolkit works by constructing a map of how different each chunk of a signal is from every other chunk. To do this, we transform the sound into the spectral domain using an STFT, meaning that each chunk can now be represented by the magnitudes in each bin. How similar each chunk is to another can then be estimated using a distance measure, and we end up with a grid that maps the difference between each point in time and every other point in time.
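To make the idea of this grid concrete, here is a minimal sketch in Python with NumPy. It is not the toolkit's implementation: the function names, the frame and hop sizes, the Hann window and the choice of Euclidean distance are all assumptions made for illustration.

```python
import numpy as np

def stft_magnitudes(signal, frame_size=1024, hop=512):
    """Return STFT magnitudes with shape (n_frames, n_bins)."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

def distance_matrix(mags):
    """Euclidean distance between every pair of frames (an assumed measure)."""
    diff = mags[:, None, :] - mags[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy signal: half a second at 440 Hz followed by half a second at 1760 Hz.
sr = 8000
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.sin(2 * np.pi * 440 * t),
                         np.sin(2 * np.pi * 1760 * t)])

mags = stft_magnitudes(signal, frame_size=256, hop=128)
dist = distance_matrix(mags)  # dist[i, j]: how different frame i is from frame j
```

In the resulting grid, frames drawn from the same tone are close to one another, while frames on either side of the change are far apart, which is the block structure that the rest of the method relies on. Note that this naive version holds every pairwise difference in memory at once, so it is only suitable for short examples.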

What we're interested in is how much 'novelty' appears to be present from one moment to the next. We can find this out from our map by adding together all the differences in a window around a given moment, which gives us a 'novelty curve'. Then, we can estimate likely places to make slices by looking for peaks in this curve. If we're interested in longer slices, one thing we can do is to make the time window that we sum together larger (this is typically called the kernel size in this type of algorithm). Additionally, we can apply smoothing to the novelty curve to suppress smaller / shorter peaks and focus instead on larger / longer ones.
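The steps described above can be sketched as follows, again in Python with NumPy. This follows the description literally (a plain sum over a square window, a moving-average smoother and a simple thresholded peak picker) and should be read as a simplification under those assumptions, not as the toolkit's actual code. The names `novelty_curve`, `smooth`, `pick_peaks`, `kernel_size`, `smooth_size` and `threshold` are hypothetical, and the toy distance grid is built by hand so that the example stands alone.

```python
import numpy as np

def novelty_curve(dist, kernel_size):
    """Sum all the distances in a square window centred on each frame."""
    n = dist.shape[0]
    half = kernel_size // 2
    curve = np.zeros(n)
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        curve[t] = dist[lo:hi, lo:hi].sum()
    return curve

def smooth(curve, smooth_size):
    """Moving average, which flattens small / short peaks."""
    kernel = np.ones(smooth_size) / smooth_size
    return np.convolve(curve, kernel, mode="same")

def pick_peaks(curve, threshold):
    """Indices of local maxima that rise above the threshold."""
    return [t for t in range(1, len(curve) - 1)
            if curve[t] > threshold
            and curve[t] >= curve[t - 1]
            and curve[t] > curve[t + 1]]

# Toy data: 40 frames of one 'texture' followed by 40 frames of another,
# each frame reduced to a single feature value for simplicity.
features = np.concatenate([np.zeros(40), np.ones(40)])
dist = np.abs(features[:, None] - features[None, :])

curve = smooth(novelty_curve(dist, kernel_size=9), smooth_size=3)
peaks = pick_peaks(curve, threshold=0.5 * curve.max())
```

Here the curve is zero wherever the window covers only one kind of material and rises as the window straddles the change, so the single peak lands at the boundary near frame 40. Raising `kernel_size` or `smooth_size` in this sketch biases the result towards fewer, longer slices.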

This kind of approach is very flexible, because it allows us to tune the lengths of the slices that we're interested in, and to remain relatively agnostic about the exact properties of the signal that denote novelty (meaning that it may be able to pick up on multiple aspects of what we hear). As such, this kind of technique is quite common in what's called 'structural segmentation' in the Music Information Retrieval field, where the focus is on slicing a relatively long passage, like a complete song or movement, into a number of broad sections.