Evaluation Metrics

  • Quantitative measures for assessing model or algorithm performance on a dataset.
  • Common examples include accuracy and F1 score.
  • F1 score accounts for false positives and false negatives and can be preferable when classes are imbalanced.

Evaluation metrics are quantitative measures of a model's or algorithm's performance on a given dataset. They provide a way to assess a model's effectiveness and help determine which model is best suited to a given problem.

  • Accuracy is the ratio of correct predictions made by a model to the total number of predictions made. It is simple and intuitive, and is often used as a first measure of a model’s performance.
  • Accuracy can be misleading when class distributions are imbalanced: a model that predicts only the majority class can achieve high accuracy despite making no useful predictions for the minority class.
  • F1 score is the harmonic mean of precision and recall and accounts for both false positives and false negatives.
    • Precision is the ratio of true positive predictions to all positive predictions.
    • Recall is the ratio of true positive predictions to all actual positive examples.
  • The F1 score ranges from 0 to 1, with a higher value indicating better performance.
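The definitions above can be sketched directly from the confusion-matrix counts. This is a minimal illustration, not library code; the function and variable names are ours:

```python
def precision(tp, fp):
    # Fraction of positive predictions that are correct.
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Fraction of actual positive examples the model found.
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall; 0 when both are 0.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot achieve a high F1 score by doing well on only one of precision or recall.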

If a model predicts the correct label for 90 out of 100 examples, its accuracy would be 90%.
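That calculation is just the fraction of matching labels; a quick sketch (the helper name is ours):

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# 90 correct predictions out of 100 examples -> accuracy of 0.9
y_true = [1] * 100
y_pred = [1] * 90 + [0] * 10
```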

If a dataset has 95% of examples belonging to one class and only 5% belonging to the other class, a model that always predicts the majority class would have a high accuracy, even though it is not making any useful predictions.
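This failure mode is easy to reproduce. With 95 majority-class and 5 minority-class examples, a predictor that always outputs the majority class scores 95% accuracy while never identifying a single minority example (an illustrative sketch with made-up labels):

```python
y_true = [0] * 95 + [1] * 5   # 95% of examples belong to class 0
y_pred = [0] * 100            # always predict the majority class

acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# acc == 0.95, yet recall on the minority class is 0:
minority_recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / 5
```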

  • If a model makes 100 positive predictions, but only 80 of them are correct, its precision would be 80%.
  • If there are 100 positive examples in the dataset and the model only correctly predicts 80 of them, its recall would be 80%.
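Plugging these numbers into the definitions gives the same 80% for both metrics (a quick arithmetic check, not library code):

```python
tp = 80          # correct positive predictions
fp = 100 - 80    # positive predictions that were wrong
fn = 100 - 80    # actual positives the model missed

precision = tp / (tp + fp)   # 80 / 100
recall = tp / (tp + fn)      # 80 / 100
```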
  • Comparing and selecting models by assessing their effectiveness on a dataset.
  • Accuracy does not account for class imbalance and can give misleadingly high values when one class dominates.
  • Accuracy
  • F1 score
  • Precision
  • Recall
  • False positives
  • False negatives