We have to strike a balance between Precision and Recall because it is hard to keep both high: increasing one tends to decrease the other.
- High precision: the model is conservative in predicting positives → more False Negatives, because it misses some true positives (lower recall).
- High recall: the model is liberal in predicting positives → more False Positives, because it mislabels some negatives as positives (lower precision).
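Writing precision and recall in terms of confusion-matrix counts makes the tradeoff explicit: false positives only hurt precision, false negatives only hurt recall.

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$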
#Tuning the Tradeoff
- We can assign a higher weight to the examples of a specific class, e.g. SVM in scikit-learn accepts class weights as input (see the sketch after this list).
- By tuning hyperparameters to maximize either Precision or Recall on the validation dataset.
- By varying the decision threshold for algorithms that return prediction scores.
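A minimal sketch of the class-weight option, assuming a synthetic imbalanced dataset and an arbitrary 5:1 weight (both illustrative, not from the original text); the `class_weight` parameter of scikit-learn's `SVC` is the input mentioned above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score

# Synthetic imbalanced dataset (illustrative assumption).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Weighting the positive class 5x makes its mistakes costlier, so the SVM
# predicts it more often: recall tends to rise, precision tends to fall.
svm = SVC(class_weight={0: 1, 1: 5}).fit(X_train, y_train)
y_pred = svm.predict(X_test)

print("precision:", precision_score(y_test, y_pred))
print("recall:   ", recall_score(y_test, y_pred))
```

Swapping the weights (or using `class_weight="balanced"`) shifts the tradeoff in the other direction.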
#Example
Suppose we have a logistic regression model or a decision tree that returns a prediction score. We can increase precision at the cost of lower recall by predicting positive only when the score returned by the model is higher than 0.9, instead of the default value of 0.5.
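A minimal sketch of this thresholding, assuming illustrative synthetic data; `predict_proba` supplies the score that is compared against 0.9 instead of the default 0.5.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # score for the positive class

# Compare the default 0.5 threshold with the stricter 0.9 one.
for threshold in (0.5, 0.9):
    y_pred = (scores >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_test, y_pred):.2f}, "
          f"recall={recall_score(y_test, y_pred):.2f}")
```

The higher threshold keeps only the predictions the model is most confident about, which is exactly the precision-for-recall trade described above.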
#Single Metric
To compare two or more models, we want a single value that captures the precision-recall tradeoff. Some options:
- Optimizing and satisficing technique:
  - Depending on the problem, choose either precision or recall as the metric to optimize, and fix the other (satisficing) metric to a threshold it must meet. e.g. spam classification: optimize precision while keeping the false-negative rate below a fixed threshold such as 2%.
  - Generalization: threshold n-1 metrics and optimize the nth.
- F-score (see the formula after this list).
- Simple average, or weighted average, of metrics.
- Invent our own domain-specific metric.
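The F-score combines precision and recall into a single number. The general $F_\beta$ form weights recall $\beta$ times as much as precision; $\beta = 1$ gives the familiar $F_1$:

$$F_\beta = (1+\beta^2)\cdot\frac{\text{Precision}\cdot\text{Recall}}{\beta^2\cdot\text{Precision}+\text{Recall}}, \qquad F_1 = \frac{2\cdot\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}}$$

Choosing β > 1 favors recall, β < 1 favors precision.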