Padhai Time

Confusion Matrix

Introduction to Confusion Matrix :

For binary classification problems, a confusion matrix is a summary table that shows how many predictions a classifier (or classification model) got right and how many it got wrong, by comparing the actual values with the predicted values.

In simple words,

A confusion matrix shows the ways in which a classification model gets confused when making predictions.
A good model has large values on the diagonal (correct predictions) and small values off the diagonal (errors).
Examining the confusion matrix tells us not only how often our classification model is right, but also what kinds of mistakes it is making.
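
To make the "diagonal vs off-diagonal" idea concrete, here is a minimal sketch. The counts are made up for illustration, and the layout (rows as actual classes, columns as predicted classes) is an assumption, since different tools arrange the table differently.

```python
# Hypothetical counts, for illustration only.
# Assumed layout: rows are actual classes, columns are predicted classes,
# both in the order [positive, negative].
matrix = [
    [50, 5],   # actual positive: 50 predicted positive, 5 predicted negative
    [3, 42],   # actual negative: 3 predicted positive, 42 predicted negative
]

# Diagonal cells hold the correct predictions.
correct = sum(matrix[i][i] for i in range(len(matrix)))

# Everything else is an error.
total = sum(sum(row) for row in matrix)
errors = total - correct

print(f"correct (diagonal): {correct}")
print(f"errors (off-diagonal): {errors}")
```

In this made-up matrix most of the weight sits on the diagonal, which is what we would hope to see from a good model.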

True Positive, True Negative, False Positive and False Negative :

The confusion matrix is a table with four different combinations of predicted and actual values.
Positive and Negative refer to the predicted class, while True and False indicate whether that prediction matches the actual class.
These four counts form the base for creating a confusion matrix.

True Positive (TP) - Number of correctly labelled positive samples
False Positive (FP) - Number of negative samples incorrectly labelled as positive
True Negative (TN) - Number of correctly labelled negative samples
False Negative (FN) - Number of positive samples incorrectly labelled as negative
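
The four definitions above can be sketched as a small counting routine. This is only an illustration: the function name `count_outcomes` and the sample labels are hypothetical, and we assume 1 marks the positive class and 0 the negative class.

```python
def count_outcomes(actual, predicted):
    """Count TP, FP, TN and FN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1   # predicted positive, actually positive
        elif p == 1 and a == 0:
            fp += 1   # predicted positive, actually negative
        elif p == 0 and a == 0:
            tn += 1   # predicted negative, actually negative
        else:
            fn += 1   # predicted negative, actually positive
    return tp, fp, tn, fn


# Hypothetical labels, for illustration only.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, tn, fn = count_outcomes(actual, predicted)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
```

Every sample falls into exactly one of the four buckets, so the four counts always add up to the number of samples.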

Now, let’s understand the classification and confusion matrix concept in terms of True vs False and Positive vs Negative with a simple example.

Defining the classes (a cricket example) :

The batsman is NOT OUT: the positive class, or logic 1.
The batsman is OUT: the negative class, or logic 0.

Now, in terms of the 2x2 confusion matrix:

True Positive: The umpire gives a batsman NOT OUT when he is actually NOT OUT.
True Negative: The umpire gives a batsman OUT when he is actually OUT.
False Positive (Type 1 error): The umpire gives a batsman NOT OUT when he is actually OUT.
False Negative (Type 2 error): The umpire gives a batsman OUT when he is actually NOT OUT.
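
The cricket example can be sketched in code as well. The decisions listed below are invented for illustration, and `label_decision` is a hypothetical helper name. As in the definitions above, NOT OUT is treated as the positive class.

```python
from collections import Counter

POSITIVE = "NOT OUT"  # positive class, as defined in this example
NEGATIVE = "OUT"      # negative class


def label_decision(actual, given):
    """Name the confusion-matrix cell for one umpire decision."""
    if given == POSITIVE:
        return "TP" if actual == POSITIVE else "FP"   # FP is the Type 1 error
    return "TN" if actual == NEGATIVE else "FN"       # FN is the Type 2 error


# Hypothetical (actual, umpire's decision) pairs, for illustration only.
decisions = [
    ("NOT OUT", "NOT OUT"),
    ("OUT", "OUT"),
    ("OUT", "NOT OUT"),
    ("NOT OUT", "OUT"),
    ("NOT OUT", "NOT OUT"),
]

cells = Counter(label_decision(actual, given) for actual, given in decisions)
print(dict(cells))
```

Here the umpire plays the role of the classifier: his decision is the predicted value, and what really happened is the actual value.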

Benefits of Confusion Matrix :

It shows not only how many errors the classifier makes, but also which types of errors they are.
This helps overcome the limitations of relying on classification accuracy alone.
It is especially useful when the classes are significantly imbalanced, with one class dominating the others.
Recall, Precision, Specificity, and Accuracy can all be calculated from the confusion matrix, and the AUC-ROC Curve is built from confusion matrices at different classification thresholds. (We will cover those topics in upcoming articles.)
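
As a small preview of why accuracy alone can mislead on imbalanced data, here is a minimal sketch. The counts are hypothetical: 95 negative samples, 5 positive samples, and a model that predicts "negative" every time. The helper names `safe_divide` and `summarize` are our own, and the formulas are the standard definitions of each metric.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def summarize(tp, fp, tn, fn):
    """Compute basic metrics from the four confusion-matrix counts."""
    return {
        "accuracy": safe_divide(tp + tn, tp + fp + tn + fn),
        "recall": safe_divide(tp, tp + fn),        # share of actual positives found
        "precision": safe_divide(tp, tp + fp),     # share of predicted positives that are right
        "specificity": safe_divide(tn, tn + fp),   # share of actual negatives found
    }


# Hypothetical imbalanced case: the model always predicts negative.
metrics = summarize(tp=0, fp=0, tn=95, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Accuracy looks excellent here, yet recall is zero because the model never finds a single positive sample. Precision is undefined (reported as None in this sketch) because the model makes no positive predictions at all.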
