Training and Testing Data

Determine if a neural network is overfit by reserving some testing data

Training Data and Testing Data

Before training a supervised machine learning algorithm it may be important to separate the input-output example pairs into a training set and testing set. The training set will be used to do the actual training of the neural network and the testing set will be used after the training to test how well the model is performing. This can be important to simulate how the neural network might perform in the actual use-case you’re aiming for. For example, if you train an MLPClassifier to distinguish oboe and trombone timbres so that during a live performance it can know which instrument is sounding, you would train on a training set of audio analyses that have corresponding labels: either “trombone” or “oboe”. Then, to test the effectiveness of the training, you would make predictions using a testing set that has both oboe and trombone examples which were not part of the training set. This simulates the future use-case: having the model make predictions when a trombone and oboe are on stage making sounds, the analyses of which it has never seen before.

One common way of creating training data and testing data is to start with one big dataset of the input-output example pairs. Then, randomly split off a certain portion of the pairs and set them aside as the testing set (depending on the amount of data, 20% might be a good amount to try). This could also be achieved by setting aside some of the sound files (or other source material) you’re planning to use before you do the analysis and make a DataSet. Then after training with this DataSet, analyse and make a new DataSet with the files that were set aside, then test the neural network with the second DataSet.

For some tasks it may not make sense to set aside testing data. If you’re using a neural network to control a synthesiser and you’re training it with only 5 or so points, setting aside any of the points would be eliminating one of the input-output associations you’re hoping it will learn! You’ll want to just use all the points to train the neural network. Creating training and testing data makes sense when you have a lot of data such that when it is randomly split, each portion will likely still maintain the input-output associations you want the neural network to learn.


Using a testing set can help determine if a machine learning model is overfitting. Overfitting means that the model has learned to really accurately make predictions on the data it was trained with, but as soon as it is shown data it has never seen before it performs quite poorly. Another way of stating this is that the model cannot generalise to data it hasn’t seen before. If a model is able to perform well on both the training data and the testing data, it is likely not overfit. If a model is able to perform well on the training data but does poorly on the testing data that was split off, it is probably overfit to the training data. In this case, there are few thing to try to prevent overfitting:

  • If you have enough data such that it makes sense, use a validation set.
  • If using a neural network, try training it to a not-as-low loss value. This may stop it from reaching a state of overfitting.
  • Decrease the number of internal parameters in your neural network by having fewer hidden layers and/or fewer neurons per hidden layer. Neural networks with many internal parameters have the potential to overfit because they are able to essentially “store” the training data’s specific input-output relationships within their many internal parameters. Having fewer internal parameters can prevent this from happening, instead forcing the neural network to learn a more generalised understanding of the input-output relationships. The more generalised understanding will perform better on the testing set (and other data that it hasn’t seen before).


Overfitting can be contrasted with underfitting in which a model also cannot generalise, in this case because it probably has not been trained enough. A model that performs poorly on the testing set, but also performs poorly on the training set is likely underfitted.