KNNClassifier

Classification with K Nearest Neighbours

KNNClassifier is a supervised machine learning algorithm for classifying data points into learned categories. It uses an internal KDTree to find the k nearest neighbours of a point that needs classification (where k is an integer >= 1). Whichever category, or “class”, is most common among those neighbours is predicted as the category for that point. If numNeighbours is even and the vote is tied, the label of the closer point is predicted. The weight parameter indicates whether or not the prediction should be weighted by the neighbours’ distances.
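
The prediction rule itself is simple enough to sketch in a few lines. Below is a minimal, illustrative Python version, not FluCoMa code: it uses a brute-force scan instead of a KDTree, the function name and the small smoothing constant are inventions for this sketch, and the explicit closest-point tie-break for the unweighted case is omitted for brevity.

```python
# Illustrative sketch of the KNN prediction rule: find the k nearest
# training points, then take a (optionally distance-weighted) majority
# vote over their labels.
from collections import defaultdict
import math

def knn_predict(train_points, train_labels, query, k=3, weight=True):
    # Distance from the query to every training point. (FluCoMa uses a
    # KDTree to avoid this brute-force scan; the idea is the same.)
    dists = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    votes = defaultdict(float)
    for d, label in dists[:k]:
        # weight=True: closer neighbours count for more
        votes[label] += 1.0 / (d + 1e-9) if weight else 1.0
    return max(votes, key=votes.get)

points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
labels = ["low", "low", "high", "high"]
print(knn_predict(points, labels, (0.2, 0.1), k=3))  # -> "low"
```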

Pointer

See the page on KDTree for more on how the nearest neighbour lookup is done.

In order to make predictions, the KNNClassifier must first be fit with a DataSet of input data points paired with a LabelSet containing a label for each of those points (by means of a shared identifier).
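
As a conceptual parallel, the sketch below shows the same fit-then-predict workflow using scikit-learn’s KNeighborsClassifier. FluCoMa’s actual calls differ per creative-coding environment, so treat this as an illustration of the pairing of points with labels, not as FluCoMa’s API.

```python
# Pair each data point with a label, fit, then predict new points.
from sklearn.neighbors import KNeighborsClassifier

# "DataSet": one row of features per identifier
data = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
# "LabelSet": one label per identifier, in the same order
labels = ["low", "low", "high", "high"]

knn = KNeighborsClassifier(n_neighbors=3, weights="distance")
knn.fit(data, labels)             # the "fit" step
print(knn.predict([[0.2, 0.1]]))  # -> ['low']
```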

Pointer

Whenever training a machine learning model with supervised learning, it may be a good idea to create a Training-Testing Split of the data.
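
For instance, a minimal train-test split in Python with scikit-learn (again, an illustration rather than FluCoMa code) might look like:

```python
# Hold out some points so the classifier is evaluated on data it never
# saw during fitting.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [0.2, 0.1], [1.1, 0.9]]
y = ["low", "low", "high", "high", "low", "high"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(knn.score(X_test, y_test))  # accuracy on the held-out points
```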

KNNClassifier vs. MLPClassifier

FluCoMa includes another object for classification, the MLPClassifier, which also uses supervised learning. The KNN object works quite differently from the MLP object, and each has its own strengths and weaknesses. The main differences to know are that:

  1. the flexibility of the MLP objects makes them generally more capable of learning complex relationships between inputs and outputs,
  2. the MLP objects involve more parameters and will take much longer to fit (a.k.a. train) than the KNN objects, and
  3. the KNN objects will likely take longer to make predictions than the MLP objects, depending on the size of the dataset (although they’re still quite quick; see the timing sketch below).
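
As a rough illustration of points 2 and 3, the sketch below times fitting and predicting for both model types, using scikit-learn stand-ins rather than the FluCoMa objects. Exact numbers will vary with the data, parameters, and machine; the point is only the shape of the trade-off.

```python
# Time fit and predict for a KNN and an MLP classifier on the same
# synthetic data.
import time
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

for name, model in [("KNN", KNeighborsClassifier(n_neighbors=5)),
                    ("MLP", MLPClassifier(max_iter=500, random_state=0))]:
    t0 = time.perf_counter()
    model.fit(X, y)                   # KNN just stores the points;
    fit_s = time.perf_counter() - t0  # the MLP trains iteratively
    t0 = time.perf_counter()
    model.predict(X)                   # KNN searches for neighbours;
    pred_s = time.perf_counter() - t0  # the MLP is one forward pass
    print(f"{name}: fit {fit_s:.3f}s, predict {pred_s:.3f}s")
```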