A confusion matrix is a table in which each row represents the instances of a predicted class, while each column represents the instances of an actual class. One advantage of this performance-evaluation tool is that the analyst can easily see whether the model is confusing two classes (i.e., commonly mislabeling one as the other).
The matrix also shows the per-class accuracy of the classifier: the number of correctly classified patterns in a given class divided by the total number of patterns in that class. The overall (average) accuracy of the classifier can likewise be read from the confusion matrix.
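The two kinds of accuracy above can be computed directly from the matrix. A minimal sketch in plain Python, assuming the convention used here (rows are predicted classes, columns are actual classes); the matrix values are illustrative, not real data:

```python
# Per-class and overall accuracy from a confusion matrix.
# Rows = predicted class, columns = actual class (the convention above).

def per_class_accuracy(matrix):
    """Fraction of each actual class (column) that was predicted correctly."""
    n = len(matrix)
    accuracies = []
    for j in range(n):  # j indexes the actual class
        column_total = sum(matrix[i][j] for i in range(n))
        correct = matrix[j][j]  # diagonal entry: predicted == actual
        accuracies.append(correct / column_total if column_total else 0.0)
    return accuracies

def overall_accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    n = len(matrix)
    correct = sum(matrix[i][i] for i in range(n))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Illustrative 2x2 matrix: 100 patterns in total.
cm = [[50, 10],   # predicted class 0: 50 actually 0, 10 actually 1
      [5, 35]]    # predicted class 1: 5 actually 0, 35 actually 1
print(per_class_accuracy(cm))  # class 0: 50/55, class 1: 35/45
print(overall_accuracy(cm))    # (50 + 35) / 100 = 0.85
```

The same logic extends unchanged to matrices with more than two classes, since it only relies on the diagonal and the column sums.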
|Confusion Matrix on spam classification model|
- False Positive (FP): falsely predicting the positive label (or saying that Non-Spam is Spam).
- False Negative (FN): missing an incoming positive label (or saying that Spam is Non-Spam).
- True Positive (TP): correctly predicting the positive label (or saying that Spam is Spam).
- True Negative (TN): correctly predicting the negative label (or saying that Non-Spam is Non-Spam).
In its general form, the confusion matrix looks as follows:

|                     | Actual: Positive | Actual: Negative |
|---------------------|------------------|------------------|
| Predicted: Positive | TP               | FP               |
| Predicted: Negative | FN               | TN               |
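The four cells can be tallied from a list of actual and predicted labels. A short sketch for the spam example; the label strings and sample predictions are made up for illustration:

```python
# Tally the four confusion-matrix cells for a binary (spam) classifier.

def confusion_counts(actual, predicted, positive="spam"):
    """Return (TP, FN, FP, TN) for the given positive label."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # Spam correctly flagged as Spam
        elif a == positive and p != positive:
            fn += 1          # Spam that slipped through as Non-Spam
        elif a != positive and p == positive:
            fp += 1          # Non-Spam wrongly flagged as Spam
        else:
            tn += 1          # Non-Spam correctly left alone
    return tp, fn, fp, tn

actual    = ["spam", "spam", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham",  "ham", "spam", "ham", "spam"]
print(confusion_counts(actual, predicted))  # (2, 1, 1, 2)
```

Note that which class is called "positive" is a modeling choice; swapping it swaps TP with TN and FP with FN.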
How can we use these metrics? For instance, consider a model for predicting whether a text message carries a positive or negative sentiment (a common sentiment-analysis task). We have a data set of 10,000 text messages, in which the model correctly predicts 9,700 negative messages and 100 positive messages. The model incorrectly predicts 150 positive messages to be negative, and 50 negative messages to be positive. The resulting confusion matrix is shown below.
|Confusion Matrix on Sentiment classification task|

|                     | Actual: Positive | Actual: Negative |
|---------------------|------------------|------------------|
| Predicted: Positive | 100              | 50               |
| Predicted: Negative | 150              | 9,700            |
- Sensitivity (true positive rate) = TP / (TP+FN)
- Specificity (true negative rate) = TN / (TN+FP)
- Sensitivity = TP / (TP+FN) = 100/(100+150) = 0.4 = 40%
- Specificity = TN / (TN+FP) = 9700/(9700+50) ≈ 0.995 = 99.5%
Suppose we have a sentiment classifier with 40% sensitivity and 99.5% specificity, and we have to check 1,000 messages, of which 500 are positive and 500 are negative. We would expect about 200 true positives, 300 false negatives, 497 true negatives, and 3 false positives. We can conclude that the negative predictions are much more reliable, which follows from the high specificity and the low sensitivity. This is why it is important to analyze both metrics separately when evaluating a classifier's performance.
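The calculation above can be reproduced directly from the four counts in the example (TP=100, FN=150, TN=9,700, FP=50). A minimal sketch:

```python
# Sensitivity and specificity from the sentiment-classification example.

def sensitivity(tp, fn):
    """True positive rate: fraction of actual positives caught."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: fraction of actual negatives correctly rejected."""
    return tn / (tn + fp)

tp, fn, tn, fp = 100, 150, 9700, 50
print(f"sensitivity = {sensitivity(tp, fn):.3f}")   # 0.400
print(f"specificity = {specificity(tn, fp):.3f}")   # 0.995
```

Because the negative class dominates the data set (9,750 of 10,000 messages), specificity is high even though the classifier catches well under half of the positive messages, which is exactly what the two metrics reveal when inspected separately.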