Blog: What’s Recall and Precision?
Precision and recall are among the most widely used model evaluation metrics. Before moving on to these slightly more complex metrics, let’s see what the problem is with a simpler metric, accuracy. Suppose you have a large, skewed data set (say, 0.5% positive examples), and you have built an ML prediction model with 99% accuracy. Now the question is, “Is this model really good?” The answer is no: if we always predict 0 regardless of the input, our model would have 99.5% accuracy without detecting a single positive example.
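To make the arithmetic concrete, here is a minimal sketch in plain Python. The data set size (1,000 examples) and the always-negative "model" are made up for illustration; only the 0.5% positive rate comes from the example above.

```python
# Hypothetical skewed data set: 1,000 examples, 0.5% of them positive.
n_total = 1000
n_positive = 5
y_true = [1] * n_positive + [0] * (n_total - n_positive)

# A "model" that ignores its input and always predicts the negative class.
y_pred = [0] * n_total


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


acc = accuracy(y_true, y_pred)
print(f"accuracy = {acc:.3f}")  # accuracy = 0.995
```

The score looks excellent even though none of the five positive examples was found.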
To overcome this, we need to shift to some other metric. Instead of measuring how many of all the examples we predicted correctly, we can measure, out of all the examples we predicted positive, how many are actually positive, and, out of all the actual positives, how many we predicted as positive. This is exactly what precision and recall are. Both are defined in terms of four outcomes:
- True Positive (TP): The actual positive class is predicted positive.
- True Negative (TN): The actual negative class is predicted negative.
- False Positive (FP): The actual class is negative but the predicted class is positive.
- False Negative (FN): The actual class is positive but predicted class is negative.
1. Precision: Of all the records we predicted positive, what fraction are actually positive? Precision = TP / (TP + FP)
2. Recall: Of all the records that are actually positive, what fraction did we correctly predict as positive? Recall = TP / (TP + FN)
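The four outcomes and the two metrics can be sketched in a few lines of Python. The labels below are invented toy data, and returning 0.0 when a denominator is zero is a convention chosen for this sketch, not something the definitions require.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP and FN for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:  # actual == 1 and predicted == 0
            fn += 1
    return tp, tn, fp, fn


def precision(tp, fp):
    """Of everything predicted positive, the fraction that is actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0  # assumption: 0.0 if nothing was predicted positive


def recall(tp, fn):
    """Of everything actually positive, the fraction that was predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0  # assumption: 0.0 if there are no positives


# Toy labels, made up for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
p = precision(tp, fp)
r = recall(tp, fn)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")         # TP=2 TN=5 FP=1 FN=2
print(f"precision = {p:.3f}, recall = {r:.3f}")   # precision = 0.667, recall = 0.500
```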
It is very common to end up with a model where either Precision is high and Recall is low, or vice versa. With two metrics, it becomes a little difficult to compare models and say which is better. It would be a lot easier if we had a single value to measure performance, and one such metric is the F1 score. The F1 score is defined as the harmonic mean of Precision and Recall: F1 = 2 × Precision × Recall / (Precision + Recall). The harmonic mean is used because the arithmetic mean won’t penalize extreme values: a model with Precision 1.0 and Recall 0.0 still gets an arithmetic mean of 0.5, while its F1 score is 0.
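A short sketch of why the harmonic mean is the safer summary. The precision and recall values here are hypothetical (roughly what you would get by predicting positive for every example in a data set with 2% positives), and returning 0.0 when both inputs are zero is a convention assumed for this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumption: define F1 as 0.0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


def arithmetic_mean(precision, recall):
    """Plain average, shown only for comparison."""
    return (precision + recall) / 2


# Hypothetical lopsided model: it finds every positive, but almost all of its alarms are false.
p, r = 0.02, 1.0

mean = arithmetic_mean(p, r)
f1 = f1_score(p, r)
print(f"arithmetic mean = {mean:.3f}")  # arithmetic mean = 0.510
print(f"F1 score        = {f1:.3f}")    # F1 score        = 0.039
```

The arithmetic mean makes this model look mediocre, while the F1 score makes it clear that one of the two metrics is close to zero.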
Trade-Off Between Precision and Recall
High values of both Precision and Recall are always desirable, but in many real-life situations that is really difficult to achieve. Depending on the type of application you are developing, you might want to prioritize either Precision or Recall. For example, suppose we are predicting whether a patient has cancer.
Let p(x) be the probability, according to our model, that the patient has cancer. With the default threshold of 0.5:
if p(x) ≥ 0.5, we predict that the patient has cancer
else (p(x) < 0.5), we predict that the patient doesn’t have cancer
With a higher threshold of 0.7:
if p(x) ≥ 0.7, we predict that the patient has cancer
else (p(x) < 0.7), we predict that the patient doesn’t have cancer
In our hospital, we don’t want to alarm patients with a false report saying they might have cancer, since after hearing this a patient might have to go through a lot of tests to confirm it. To avoid this, we can raise the threshold to 0.7 or 0.9, so that we flag a patient only if the model assigns a probability of at least 70% or 90%. This typically increases Precision, at the cost of Recall.
With a lower threshold of 0.3:
if p(x) ≥ 0.3, we predict that the patient has cancer
else (p(x) < 0.3), we predict that the patient doesn’t have cancer
On the other hand, if we don’t want to miss any potential patients who may show signs of cancer, we can lower the threshold to something like 0.3, which typically increases Recall at the cost of Precision. This helps catch cancer in its early stages, when it can be fought more effectively.
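The effect of moving the threshold can be sketched as follows. The scores and labels are invented for illustration (they stand in for p(x) and the true diagnosis of ten hypothetical patients), and the helper name precision_recall_at is made up for this sketch.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when predicting positive for every score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # assumption: 0.0 if nothing is flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # assumption: 0.0 if there are no positives
    return precision, recall


# Made-up model outputs p(x) and true labels (1 = has cancer) for ten patients.
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(threshold, scores, labels)
    print(f"threshold={threshold}: precision={p:.3f}, recall={r:.3f}")
# threshold=0.3: precision=0.714, recall=1.000
# threshold=0.5: precision=0.800, recall=0.800
# threshold=0.7: precision=1.000, recall=0.600
```

On this toy data, raising the threshold trades Recall for Precision and lowering it does the opposite. On real data the curve is not always this tidy, so it is worth checking several thresholds before picking one.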
I hope this article gives you a clear understanding of Precision and Recall. Please feel free to comment if I missed something or got something wrong. Any feedback on the content, the language, or the structure of the article would help me a lot.