Evaluation Metrics
- Quantitative measures for assessing model or algorithm performance on a dataset.
- Common examples include accuracy and F1 score.
- F1 score accounts for false positives and false negatives and can be preferable when classes are imbalanced.
Definition
Evaluation metrics are quantitative measures used to assess the performance of a model or algorithm on a given dataset. They provide a way to gauge a model’s effectiveness and can help determine which model is best suited for a given problem.
Explanation
- Accuracy is the ratio of correct predictions to the total number of predictions a model makes. It is simple and intuitive, and is often used as a measure of a model’s performance.
- Accuracy can be misleading when class distributions are imbalanced: a model that predicts only the majority class can achieve high accuracy despite making no useful predictions for the minority class.
- F1 score is the harmonic mean of precision and recall and accounts for both false positives and false negatives.
- Precision is the ratio of true positive predictions to all positive predictions.
- Recall is the ratio of true positive predictions to all actual positive examples.
- The F1 score ranges from 0 to 1, with a higher value indicating better performance (see the sketch after this list).
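A minimal sketch of these four quantities, assuming scikit-learn is installed (the label vectors below are made up purely for illustration):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]  # actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # model predictions

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 4 / 5
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 4 / 6

print("accuracy: ", accuracy_score(y_true, y_pred))  # 0.7
print("precision:", precision)                       # 0.8
print("recall:   ", recall)                          # ~0.667
# F1 is the harmonic mean of precision and recall
print("f1:       ", f1_score(y_true, y_pred))        # ~0.727
print("by hand:  ", 2 * precision * recall / (precision + recall))
```

The last two lines print the same value, confirming the harmonic-mean relationship between precision, recall, and F1.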
Examples
Accuracy example
If a model predicts the correct label for 90 out of 100 examples, its accuracy would be 90%.
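The same arithmetic in plain Python (numbers taken from the example above):

```python
correct, total = 90, 100
accuracy = correct / total
print(f"accuracy = {accuracy:.0%}")  # accuracy = 90%
```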
Class imbalance example
If a dataset has 95% of examples belonging to one class and only 5% belonging to the other class, a model that always predicts the majority class would have 95% accuracy, even though it makes no useful predictions for the minority class.
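A short sketch of this failure mode, assuming scikit-learn is available (the 95/5 split mirrors the example above, and the always-majority baseline is hypothetical):

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

y_true = [0] * 95 + [1] * 5   # 95 majority-class examples, 5 minority-class examples
y_pred = [0] * 100            # a baseline that always predicts the majority class

print("accuracy:", accuracy_score(y_true, y_pred))             # 0.95
print("recall:  ", recall_score(y_true, y_pred))               # 0.0, no minority example found
print("f1:      ", f1_score(y_true, y_pred, zero_division=0))  # 0.0
```

Accuracy looks excellent here, while recall and F1 expose that the baseline never identifies a minority-class example.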
Precision and recall example
- If a model makes 100 positive predictions, but only 80 of them are correct, its precision would be 80%.
- If there are 100 positive examples in the dataset and the model correctly predicts only 80 of them, its recall would be 80% (both cases are worked through below).
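Worked through in plain Python, treating the two bullets as describing the same model (TP, FP, and FN are the standard confusion-matrix counts):

```python
true_positives = 80
false_positives = 100 - true_positives  # 20 of the 100 positive predictions are wrong
false_negatives = 100 - true_positives  # 20 of the 100 actual positives are missed

precision = true_positives / (true_positives + false_positives)  # 0.8
recall = true_positives / (true_positives + false_negatives)     # 0.8
f1 = 2 * precision * recall / (precision + recall)               # 0.8

print(precision, recall, f1)
```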
Use cases
- Comparing and selecting models by assessing their effectiveness on a dataset.
Notes or pitfalls
- Accuracy does not account for class imbalance and can give misleadingly high values when one class dominates.
Related terms
- Accuracy
- F1 score
- Precision
- Recall
- False positives
- False negatives