Learn features from data

SKMeans is an object for learning features from data. It is very similar to KMeans, with two distinctions.

  1. Instead of taking the Euclidean distance to the center of each cluster, as KMeans does, SKMeans takes the angular distance between normalised vectors in high dimension. This is measured with cosine similarity.

  2. Once the clusters are found, the distances are encoded with an ‘alpha’ function, promoting a sparser representation in which smaller similarities are penalised.
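
The two distinctions above can be sketched in a few lines of NumPy. This is only an illustration, not the SKMeans implementation: the function names are hypothetical, and the 'alpha' function is assumed to be the soft threshold max(0, similarity - alpha) described by Coates and Ng.

```python
import numpy as np

def normalise(v):
    """Scale each row to unit length (all-zero rows are left unchanged)."""
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.where(norms == 0, 1.0, norms)

def cosine_similarities(points, centers):
    """Cosine similarity between every point and every cluster center."""
    return normalise(points) @ normalise(centers).T

def encode(similarities, alpha):
    """Assumed 'alpha' function: similarities below alpha become zero."""
    return np.maximum(0.0, similarities - alpha)

points = np.array([[1.0, 0.0], [10.0, 1.0], [0.0, 3.0]])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])

sims = cosine_similarities(points, centers)
labels = sims.argmax(axis=1)          # nearest center by angle: [0, 0, 1]
features = encode(sims, alpha=0.25)   # sparse: weak similarities are zeroed
print(labels)
print(features)
```

Note that the second point is ten times longer than the first but lands in the same cluster, because only its direction matters once the vectors are normalised.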

In practice, SKMeans is mostly used to learn features in a higher-dimensional space than the original data, on the assumption that this helps untangle nearby clusters.
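
As a rough sketch of that use, the following learns more clusters than the data has dimensions, so each 2-dimensional point becomes an 8-dimensional feature vector. The function `fit_spherical_kmeans` is hypothetical and uses a common hard-assignment update, which is not necessarily what SKMeans does internally. The threshold of 0.5 is an arbitrary choice for illustration.

```python
import numpy as np

def fit_spherical_kmeans(data, n_clusters, n_iter=20, seed=0):
    """Hard-assignment spherical k-means; returns unit-length centers."""
    rng = np.random.default_rng(seed)
    unit = data / np.linalg.norm(data, axis=1, keepdims=True)
    # Start from randomly chosen (already normalised) data points.
    start = rng.choice(len(unit), size=n_clusters, replace=False)
    centers = unit[start].copy()
    for _ in range(n_iter):
        # Assign each point to the center with the highest cosine similarity.
        labels = (unit @ centers.T).argmax(axis=1)
        for j in range(n_clusters):
            members = unit[labels == j]
            if len(members) == 0:
                continue  # keep the previous center for an empty cluster
            total = members.sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:
                centers[j] = total / norm  # project back onto the unit sphere
    return centers

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 2))                  # 2-dimensional input
centers = fit_spherical_kmeans(data, n_clusters=8)

unit = data / np.linalg.norm(data, axis=1, keepdims=True)
features = np.maximum(0.0, unit @ centers.T - 0.5)  # assumed alpha encoding
print(data.shape, features.shape)  # (200, 2) (200, 8)
```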

Spherical KMeans @Machine Learning Catalogue

A terse definition of the algorithm.

Classic vs Spherical KMeans

A quite thorough explanation of the difference between "classic" and spherical KMeans.

Coates and Ng - Learning Feature Representations with K-means

The original paper describing the implementation of encoded activations in feature learning.
Last modified: Thu Jun 16 15 by James Bradbury