Classification with K Nearest Neighbours
KNNClassifier is a supervised machine learning algorithm for classifying data points into learned categories. It uses an internal KDTree to find the k nearest neighbours of a point that needs classification (where k is an integer >= 1). Whichever category, or “class”, is most common among those neighbours is predicted as the category for that point. If an even number of numNeighbours is requested and the vote is tied, the label of the closer neighbour is predicted. The weight parameter indicates whether or not the prediction should be weighted by the neighbours’ distances.
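The voting scheme described above can be sketched in plain Python. This is an illustrative sketch of the general algorithm, not FluCoMa's implementation: the function name `knn_predict` and its signature are hypothetical, and a brute-force distance search stands in for the KDTree lookup.

```python
import math
from collections import Counter

def knn_predict(train, point, k=3, weight=False):
    # train: list of (features, label) pairs; point: list of floats.
    # Brute-force stand-in for the KDTree: sort all training points
    # by Euclidean distance to the query point and keep the k nearest.
    dists = sorted((math.dist(x, point), label) for x, label in train)[:k]
    if weight:
        # Weight each neighbour's vote by inverse distance
        # (a tiny epsilon avoids division by zero on exact matches).
        votes = Counter()
        for d, label in dists:
            votes[label] += 1.0 / (d + 1e-9)
        return votes.most_common(1)[0][0]
    # Unweighted: plain majority vote among the k neighbours.
    votes = Counter(label for _, label in dists)
    ranked = votes.most_common()
    tied = {label for label, count in ranked if count == ranked[0][1]}
    if len(tied) > 1:
        # Tie-break: among tied labels, pick the one whose neighbour
        # is closest (dists is already sorted by distance).
        for _, label in dists:
            if label in tied:
                return label
    return ranked[0][0]
```

For example, with three training points labelled `'a'`, `'b'`, `'b'`, a query near the lone `'a'` still loses an unweighted 3-neighbour vote, but wins once votes are weighted by distance; and with `k=2` a tied vote goes to the label of the nearer neighbour.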
See the page on KDTree for more on how the nearest neighbour lookup is done.
When training a machine learning model using supervised learning, it is often a good idea to create a Training-Testing Split of the data.
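A training-testing split can be sketched in a few lines of Python. This is a generic illustration of the idea, not a FluCoMa function; the name `train_test_split` and its parameters are hypothetical.

```python
import random

def train_test_split(data, test_ratio=0.2, seed=0):
    # Shuffle a copy of the data so the split is random but reproducible,
    # then cut it into a training portion and a held-out testing portion.
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]
```

The model is fit on the training portion only; the testing portion is kept aside to check how well the model generalises to points it has never seen.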
FluCoMa includes another object for classification, the MLPClassifier, which also uses supervised learning. The KNN object works quite differently from the MLP object, and each has its strengths and weaknesses. The main differences to know are that:
- the flexibility of the MLP objects makes them generally more capable of learning complex relationships between inputs and outputs,
- the MLP objects involve more parameters and will take much longer to fit (a.k.a. train) than the KNN objects, and
- the KNN objects will likely take longer to make predictions than the MLP objects, depending on the size of the dataset (although they’re still quite quick!).