Validation Set
The validation set is held out from training and is used to give an estimate of model skill while the model’s hyperparameters are being tuned. In other words, it is a set of examples used to tune a classifier’s hyperparameters, such as the number of hidden units in a neural network. Comparing performance on this set with performance on the training set also indicates whether a model is under- or overfit for the task it is designed for. Because it scores a model’s competency during development, this dataset proves vital for engineers gauging the accuracy of its output. A minimal sketch of this kind of validation-based tuning is shown below.
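The sketch assumes scikit-learn’s MLPClassifier on a synthetic dataset; the candidate hidden-unit counts and split sizes are illustrative choices, not recommendations.

```python
# A minimal sketch of validation-based hyperparameter tuning (assumed setup:
# scikit-learn, synthetic data, illustrative hidden-unit candidates).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hold a validation split out of the training data.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

best_score, best_units = -1.0, None
for hidden_units in (8, 32, 128):          # candidate hyperparameter values
    model = MLPClassifier(hidden_layer_sizes=(hidden_units,),
                          max_iter=500, random_state=0)
    model.fit(X_train, y_train)            # train only on the training split
    score = model.score(X_val, y_val)      # estimate skill on the validation split
    if score > best_score:
        best_score, best_units = score, hidden_units

print(f"Best hidden-unit count: {best_units} (validation accuracy {best_score:.3f})")
```

The model never trains on the validation examples; they only score each candidate configuration so the best one can be selected.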
Testing Set
This dataset comes last in the training workflow. It is used at the end of the training process to measure how well the final model performs. Engineers use it only after the training phase is over, because the model must not have access to it until then. In applied machine learning there is much confusion about how the testing set differs from the validation set, mainly because both are held-out subsets used to evaluate the model. Despite this similarity, the testing set is used to measure how well a fully specified classifier works, while the validation set is used to fine-tune a classifier’s hyperparameters, as in the sketch that follows.
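As a sketch of that final step, assume the hyperparameters (here, a hidden-layer size of 32) were already fixed on the validation set; the test set is then touched exactly once to report the finished model’s accuracy. The data and settings below are illustrative.

```python
# A sketch of the one-time test-set evaluation (assumed setup: scikit-learn,
# synthetic data, hidden_layer_sizes=(32,) already chosen during validation).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

# The classifier is fully specified: no hyperparameters are tuned here.
final_model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
final_model.fit(X_train, y_train)

# The test set is used exactly once, after training is complete.
print(f"Test accuracy: {final_model.score(X_test, y_test):.3f}")
```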
Note: Approximately 60% of the total dataset is used for training, about 20% for validation, and the remaining 20% for testing.
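One way to produce such a 60/20/20 split is to call scikit-learn’s train_test_split twice; this is a sketch on synthetic data, assuming a simple random split is appropriate.

```python
# A sketch of a 60/20/20 train/validation/test split via two random splits
# (assumed setup: scikit-learn and a synthetic dataset of 1000 samples).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# First carve 20% of the data off for the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

# Then split the remaining 80% so that 25% of it (20% of the total) becomes validation.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

print(len(X_train), len(X_val), len(X_test))  # 600 / 200 / 200
```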