Testing and Evaluating an ML Model

To measure the performance of an ML model, it needs to be tested and evaluated. The dataset must be divided into a training set and a test set: the model is trained on the training data and tested on the test data.

It is not enough to train a model on the whole dataset you have and deploy it directly via software systems. The model must be validated on the training set and finally evaluated on the test set. The final model should generalize well to unseen data without overfitting the training set.
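
As a minimal sketch of such a split using scikit-learn (the synthetic dataset here is just a stand-in for real data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset: 1000 samples, 20 features.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out 20% of the data as an untouched test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (800, 20) (200, 20)
```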

Validation

During the training stage, several models are trained with different sets of hyper-parameters, and each model is validated during training. The validation data is allocated from the training data.

Cross-validation is a widely used validation technique that automatically divides the training set into a number of training/validation pairs within the training set. Once the best model, the one that outperforms the others, is selected, it is finally tested on the test data.

[Figure: Cross-validation splitting of the training set (Credit: Scikit-learn)]
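
A minimal cross-validation sketch with scikit-learn (the logistic-regression model and synthetic data are just placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# 5-fold cross-validation: the data is split into 5 folds, and each fold
# serves once as the validation set while the rest is used for training.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())
```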

Testing

The model with the best validation score is evaluated on the test data. The test data should be:
- representative of the dataset as a whole
- unseen by the model during training
- large enough to yield meaningful results.

Note: Never train on test data.

The model is then retrained on the whole training set with the best hyper-parameter values and evaluated on the test data. A model performs well when it generalizes to unseen data; it should neither overfit nor underfit the training data.
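
As an illustrative sketch of this whole workflow (search on the training set, refit on all the training data, score once on the test set), assuming scikit-learn's GridSearchCV and a placeholder SVC model:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyper-parameter search with 5-fold cross-validation, on training data only.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

# refit=True (the default) retrains the best model on the whole training set;
# the test set is used exactly once, for the final score.
print(search.best_params_, search.score(X_test, y_test))
```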

Performance Measures

An ML model is evaluated using various performance measures rather than by making assumptions based on its predictions. Different metrics suit different tasks: MAE and MSE are popular for regression models, whereas the confusion matrix and accuracy are widely used for evaluating classification models.

The most common evaluation metrics are:

Mean Absolute Error (MAE):

It calculates the difference between each predicted value and the corresponding actual value, takes the absolute value of that difference, and averages those absolute values over all N samples. It measures how far, on average, the predictions are from the actual target values.
Mathematically, it is represented as:

MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|

where y_i is the actual value and \hat{y}_i is the predicted value for the i-th sample.
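
As a quick, hand-checkable NumPy sketch of the formula (with toy values):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # actual targets (toy values)
y_pred = np.array([2.5,  0.0, 2.0, 8.0])  # model predictions

mae = np.mean(np.abs(y_true - y_pred))    # (0.5 + 0.5 + 0.0 + 1.0) / 4
print(mae)  # 0.5
```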

Mean Squared Error (MSE):

This is similar to MAE except that Mean Squared Error computes the square of the difference instead of its absolute value. Squaring makes a large error grow much faster than a small one, which pushes the model to focus on large errors rather than small ones.
Mathematically, it is represented as:

MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
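
The same toy values make the squaring effect visible: the single error of 1.0 contributes as much as four errors of 0.5 would.

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)  # (0.25 + 0.25 + 0.0 + 1.0) / 4
print(mse)  # 0.375
```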

Confusion Matrix:

It outputs a matrix that describes the overall performance of a classification model.
For example, the confusion matrix of a binary classifier tells a lot about the classifier:

|          | Predicted P | Predicted N |
|----------|-------------|-------------|
| Actual P | TP          | FN          |
| Actual N | FP          | TN          |

Where,
P = Positive class
N = Negative class
TP (True Positive) = the model predicted positive for an actual positive sample.
FN (False Negative) = the model predicted negative for an actual positive sample.
TN (True Negative) = the model predicted negative for an actual negative sample.
FP (False Positive) = the model predicted positive for an actual negative sample.


From the above confusion matrix, the following metrics can be calculated (see the sketch after this list):
Accuracy:
It gives the fraction of correct predictions over the total number of predictions made: Accuracy = (TP + TN) / (TP + TN + FP + FN). It works well only when each class has a similar number of samples.
Precision:
It answers the question: what proportion of positive predictions was actually correct? Precision = TP / (TP + FP).
Recall:
It answers the question: what proportion of actual positives was identified? Recall = TP / (TP + FN).
Note: Precision and Recall typically trade off against each other: adjusting a classifier's decision threshold to increase Precision usually decreases Recall, and vice versa.
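
The sketch below computes the confusion matrix and all three scores with scikit-learn on toy labels; note that scikit-learn prints the matrix as [[TN, FP], [FN, TP]], which differs from the layout shown above.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Toy binary labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

print(confusion_matrix(y_true, y_pred))  # [[4 2]   -> [[TN FP]
                                         #  [1 3]]      [FN TP]]
print(accuracy_score(y_true, y_pred))    # (TP + TN) / total = 7 / 10 = 0.7
print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 3 / 5 = 0.6
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 3 / 4 = 0.75
```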

Other Metrics:

- Logarithmic Loss
- F1 Score
- Area Under Curve (AUC)
