Recall:
Recall is a metric used to evaluate the performance of a machine learning model, specifically in classification tasks. It is defined as the number of true positive predictions made by the model, divided by the total number of positive instances in the test set. In other words, it measures the proportion of actual positive examples that the model was able to correctly identify.
For example, consider a binary classification task in which the model is trying to predict whether a given email is spam or not. True positives are spam emails that the model correctly identified as spam, while false negatives are spam emails that the model incorrectly labeled as not spam (i.e., missed spam emails).
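As a minimal sketch of this definition, the function below counts true positives and false negatives from paired lists of true and predicted labels. The function name, the 1 = spam / 0 = not spam encoding, and the toy labels are assumptions made for illustration; they are not taken from any particular library.

```python
def recall_from_labels(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN), computed from paired true/predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # True positive: actually positive and predicted positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    # False negative: actually positive but predicted as something else.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no positive instances")
    return tp / (tp + fn)


# Hypothetical toy data: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]  # one spam email missed, one false alarm

print(recall_from_labels(y_true, y_pred))  # 3 of the 4 spam emails found: 0.75
```

Note that the false alarm (the last email) does not change the result, because recall only looks at the emails that are actually spam.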
To understand recall better, let’s consider the following two examples:
Example 1:
Imagine that the model is trying to predict whether a given email is spam or not, and the test set contains 100 emails, of which 20 are spam and 80 are not spam. The model makes the following predictions:
True positives: 15 (emails correctly identified as spam)
False negatives: 5 (spam emails the model missed)
True negatives: 80 (emails correctly identified as not spam)
False positives: 0 (emails incorrectly identified as spam)
The recall for this model can be calculated as follows:
Recall = (True positives) / (True positives + False negatives)
= 15 / (15 + 5)
= 15 / 20
= 0.75, or 75%
This means that the model was able to correctly identify 75% of the spam emails in the test set.
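The same arithmetic can be checked in a few lines of Python. This is only a sketch; the helper name recall_from_counts is hypothetical, and the counts are the ones given in Example 1.

```python
def recall_from_counts(true_positives, false_negatives):
    """Recall = TP / (TP + FN)."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        raise ValueError("recall is undefined when there are no positive instances")
    return true_positives / actual_positives


# Counts from Example 1: 15 spam emails caught, 5 missed.
example_1 = recall_from_counts(true_positives=15, false_negatives=5)
print(f"{example_1:.0%}")  # 75%
```

The 80 true negatives and 0 false positives never enter the calculation: recall depends only on the 20 emails that are actually spam.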
Example 2:
Now consider a different example in which the model is trying to predict whether a given image contains a cat or not. The test set contains 1000 images, of which 200 contain a cat and 800 do not. The model makes the following predictions:
True positives: 180 (images correctly identified as containing a cat)
False negatives: 20 (images containing a cat that the model missed)
True negatives: 800 (images correctly identified as not containing a cat)
False positives: 0 (images incorrectly identified as containing a cat)
The recall for this model can be calculated as follows:
Recall = (True positives) / (True positives + False negatives)
= 180 / (180 + 20)
= 180 / 200
= 0.90, or 90%
This means that the model was able to correctly identify 90% of the images containing a cat in the test set.
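To put the two examples side by side, the sketch below stores each set of counts in a small record and checks that they add up to the stated test set sizes before computing recall. The ConfusionCounts class is a hypothetical helper written for this illustration.

```python
from typing import NamedTuple


class ConfusionCounts(NamedTuple):
    tp: int  # true positives
    fn: int  # false negatives
    tn: int  # true negatives
    fp: int  # false positives

    @property
    def total(self):
        # Every test instance falls into exactly one of the four cells.
        return self.tp + self.fn + self.tn + self.fp

    @property
    def recall(self):
        # Assumes at least one actual positive (tp + fn > 0).
        return self.tp / (self.tp + self.fn)


spam = ConfusionCounts(tp=15, fn=5, tn=80, fp=0)     # Example 1
cats = ConfusionCounts(tp=180, fn=20, tn=800, fp=0)  # Example 2

for name, c in [("spam", spam), ("cats", cats)]:
    print(f"{name}: {c.total} items, {c.tp + c.fn} positives, recall = {c.recall:.0%}")
```

Although the second test set is ten times larger, recall is a proportion, so the two results (75% and 90%) are directly comparable.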
In summary, recall measures how many of the actual positive instances in the test set a model correctly identifies. It matters most when false negatives are costly (e.g., in spam filters or medical diagnosis). However, high recall may come at the cost of more false positives: a model that labels every instance as positive reaches 100% recall while being useless in practice. Recall therefore needs to be balanced against other evaluation metrics such as precision and accuracy.
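One way to see this trade-off is to vary the decision threshold of a model that outputs a score per instance. The sketch below assumes such a model; the scores, labels, thresholds, and the function name precision_recall_at are all made up for illustration.

```python
def precision_recall_at(threshold, scores, labels):
    """Return (precision, recall) when scores >= threshold are predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    # Convention chosen here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    # Assumes the labels contain at least one positive instance.
    recall = tp / (tp + fn)
    return precision, recall


# Hypothetical model scores (higher = more likely positive) and true labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for threshold in (0.85, 0.65, 0.35):
    precision, recall = precision_recall_at(threshold, scores, labels)
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

On this toy data, lowering the threshold raises recall from 0.50 to 0.75 to 1.00, while precision falls from 1.00 to 0.75 to 0.67: catching more positives means accepting more false positives.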