Learn features from data

SKMeans is an object for learning features from data. It is very similar to KMeans, with two distinctions.

  1. Instead of taking the Euclidean distance from each point to each cluster center like KMeans does, SKMeans normalises the vectors and compares them by the angle between them. This measure is known as cosine similarity.

  2. Once the clusters are found, the similarities to the cluster centers are encoded with an ‘alpha’ function, which promotes sparser representations by penalising smaller similarities.
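
The two distinctions above can be sketched in a few lines of NumPy. This is an illustration only, not the SKMeans implementation: the function names are hypothetical, and the text does not define the ‘alpha’ function, so the sketch assumes a soft threshold max(0, similarity - alpha), which is one common choice for this kind of encoding.

```python
import numpy as np

def normalise_rows(X, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against all-zero rows)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

def cosine_similarities(X, centers):
    """Cosine similarity of every sample (row of X) to every center."""
    return normalise_rows(X) @ normalise_rows(centers).T

def alpha_encode(similarities, alpha=0.5):
    """ASSUMED form of the 'alpha' function: soft threshold max(0, s - alpha).

    Similarities below alpha become exactly zero, so the code is sparse.
    """
    return np.maximum(0.0, similarities - alpha)

# Toy data: three samples, two cluster centers along the axes.
X = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])

S = cosine_similarities(X, centers)   # shape (n_samples, n_centers)
labels = S.argmax(axis=1)             # most similar center per sample
F = alpha_encode(S, alpha=0.5)        # sparse feature encoding
```

Because the rows are normalised first, only the direction of a sample matters: `[2, 0]` and `[1, 0]` have a cosine similarity of 1 even though their Euclidean distance is not zero.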

In practice, SKMeans is mostly used to learn features in a higher-dimensional space than the original data, on the assumption that this helps untangle nearby clusters.
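
As a rough sketch of this use, the snippet below learns more centers than there are input dimensions, so the encoded features live in a higher-dimensional space than the data. It is a minimal spherical k-means loop written for illustration, not the SKMeans object itself: `fit_spherical_kmeans`, its parameters, the random toy data, and the soft-threshold encoding with `alpha = 0.5` are all assumptions.

```python
import numpy as np

def unit_rows(X, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), eps)

def fit_spherical_kmeans(X, n_clusters, n_iter=20, seed=0):
    """Hypothetical minimal spherical k-means; returns unit-norm centers."""
    rng = np.random.default_rng(seed)
    X = unit_rows(X)
    # Initialise centers from randomly chosen (already normalised) samples.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each sample to the center with the highest cosine similarity.
        labels = (X @ centers.T).argmax(axis=1)
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) == 0:
                continue  # keep the previous center for an empty cluster
            total = members.sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:
                # Re-project the cluster mean onto the unit sphere.
                centers[k] = total / norm
    return centers

# Toy data: 200 samples in 4 dimensions, encoded with 16 centers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
centers = fit_spherical_kmeans(X, n_clusters=16)

# Assumed soft-threshold encoding of the cosine similarities.
features = np.maximum(0.0, unit_rows(X) @ centers.T - 0.5)
```

Here each 4-dimensional sample becomes a 16-dimensional feature vector in which only the centers it is sufficiently similar to are non-zero.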