Confusion Matrix
Introduction to Confusion Matrix :
For binary classification problems, a confusion matrix is a summary table showing the number of correct and incorrect predictions made by a classifier (or classification model), obtained by comparing the predicted values against the actual values.
In simple words,
- A confusion matrix shows the ways in which a classification model gets confused when making predictions.
- Large values on the diagonal and small values off the diagonal indicate a good model.
- Examining the confusion matrix gives us a better idea of how often our classification model is right and what kinds of mistakes it is making.
True Positive, True Negative, False Positive and False Negative :
- This is a table with four different combinations of predicted and actual values.
- Positive and Negative refer to the predicted class, while True and False indicate whether that prediction matches the actual class.
- These four counts form the basis of a confusion matrix.
- True Positive (TP) - Number of correctly labelled positive samples
- False Positive (FP) - Number of negative samples incorrectly labelled as positive
- True Negative (TN) - Number of correctly labelled negative samples
- False Negative (FN) - Number of positive samples incorrectly labelled as negative
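As a rough illustration of the four definitions above, the counts can be tallied by comparing each predicted label with its actual label. This is only a minimal sketch in plain Python: the function name `confusion_counts`, the `positive` parameter and the sample labels are made up for this example and do not come from any library.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, TN, FN) for two equal-length label sequences.

    Hypothetical helper for illustration; `positive` is the label
    treated as the positive class.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


# Made-up labels for illustration: 1 = positive, 0 = negative
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")  # TP=3 FP=1 TN=3 FN=1
```

The two correct counts (TP and TN) sit on the diagonal of the 2x2 matrix, and the two error counts (FP and FN) sit off it.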
Now, let’s understand the classification and confusion matrix concept in terms of True vs False and Positive vs Negative with a simple example.
Defining the classes (a cricket example) :
- The batsman is NOT OUT: this is the positive class (label 1).
- The batsman is OUT: this is the negative class (label 0).
Now, in terms of the 2x2 confusion matrix:
- True Positive: The umpire gives the batsman NOT OUT when he is actually NOT OUT.
- True Negative: The umpire gives the batsman OUT when he is actually OUT.
- False Positive (Type 1 error): The umpire gives the batsman NOT OUT when he is actually OUT.
- False Negative (Type 2 error): The umpire gives the batsman OUT when he is actually NOT OUT.
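The cricket mapping above can be sketched in a few lines of Python. The `cell` helper and the list of appeals are hypothetical, invented purely to show how each (actual, decision) pair falls into one of the four cells when NOT OUT is taken as the positive class.

```python
from collections import Counter


def cell(actual, decision, positive="NOT OUT"):
    """Name the confusion-matrix cell for one umpire decision.

    Hypothetical helper: True/False says whether the decision was right,
    Positive/Negative says which class the umpire chose.
    """
    outcome = "True" if actual == decision else "False"
    sign = "Positive" if decision == positive else "Negative"
    return f"{outcome} {sign}"


# Made-up (actual state, umpire decision) pairs
appeals = [
    ("NOT OUT", "NOT OUT"),  # correct NOT OUT
    ("OUT", "OUT"),          # correct OUT
    ("OUT", "NOT OUT"),      # wrongly given NOT OUT (Type 1 error)
    ("NOT OUT", "OUT"),      # wrongly given OUT (Type 2 error)
    ("OUT", "OUT"),          # correct OUT
]

tally = Counter(cell(actual, decision) for actual, decision in appeals)
for name in ("True Positive", "True Negative", "False Positive", "False Negative"):
    print(f"{name}: {tally[name]}")
```

With these five made-up appeals, the tally is one True Positive, two True Negatives, one False Positive and one False Negative.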
Benefits of Confusion Matrix :
- It shows not only how many errors the classifier makes but also which types of errors they are.
- This helps overcome the limitations of relying on classification accuracy alone.
- It is especially useful when the classes are significantly imbalanced, with one class dominating the others.
- Recall, Precision, Specificity and Accuracy can all be calculated directly from the confusion matrix, and the ROC curve (and its AUC) is built from the confusion matrices obtained at different decision thresholds. (We will cover those topics in the next articles.)
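To make the points about accuracy and class imbalance concrete, here is a minimal sketch that derives four of these metrics from the TP, FP, TN and FN counts using their standard definitions. The `metrics` function and the imbalanced example (990 negatives, 10 positives, and a model that always predicts negative) are assumptions made up for illustration. AUC is left out because it needs predictions at several thresholds rather than a single matrix.

```python
def metrics(tp, fp, tn, fn):
    """Compute basic metrics from confusion-matrix counts.

    Hypothetical helper; returns NaN where a denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    total = tp + fp + tn + fn
    return {
        "accuracy": ratio(tp + tn, total),    # share of all predictions that are correct
        "precision": ratio(tp, tp + fp),      # share of predicted positives that are correct
        "recall": ratio(tp, tp + fn),         # share of actual positives that are found
        "specificity": ratio(tn, tn + fp),    # share of actual negatives that are found
    }


# Made-up imbalanced case: the model predicts "negative" for every sample
always_negative = metrics(tp=0, fp=0, tn=990, fn=10)
print(always_negative["accuracy"])  # 0.99
print(always_negative["recall"])    # 0.0
```

Accuracy alone looks excellent here, yet recall shows that the model never finds a single positive sample, which is exactly the kind of failure the confusion matrix exposes.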