Skip to content

Introduction to Machine Learning - Model Evaluation Metrics

References and Acknowledgments


Different machine learning tasks require different metrics for evaluation. This article is based on scikit-learn and introduces the evaluation metrics for regression, classification, and clustering models through practical code.

Evaluation Metrics for Regression Models

For regression models, the goal is to make the predicted values fit the actual values as closely as possible. Common performance evaluation metrics include Mean Absolute Error (MAE) and Mean Squared Error (MSE).

Mean Absolute Error (MAE)

It measures the average absolute difference between predicted and true values in regression problems. A smaller MAE indicates a lower average difference between predicted and true values, meaning higher prediction accuracy.

\[ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_{\text{true}, i} - y_{\text{pred}, i}| \]

Here, \(n\) is the number of samples, \(y_\text{true}\) represents true values, and \(y_\text{pred}\) denotes predicted values.

>>> from sklearn.metrics import mean_absolute_error

>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> mean_absolute_error(y_true, y_pred)
0.5

>>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
>>> y_pred = [[0, 2], [-1, 2], [8, -5]]
>>> mean_absolute_error(y_true, y_pred)
0.75

Mean Squared Error (MSE)

It calculates the average squared difference between predicted and true values. A smaller value indicates a better fit between predicted and true values.

\[ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_{\text{true}, i} - y_{\text{pred}, i})^2 \]

In this formula, \(n\) indicates the number of samples, \(y_\text{true}\) stands for true values, and \(y_\text{pred}\) is for predicted values.

>>> from sklearn.metrics import mean_squared_error

>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> mean_squared_error(y_true, y_pred)
0.375

>>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
>>> y_pred = [[0, 2], [-1, 2], [8, -5]]
>>> mean_squared_error(y_true, y_pred)
0.708...

Evaluation Metrics for Classification Models

There are multiple evaluation metrics for classification models, and sometimes these metrics may even conflict with each other.

This post is translated using ChatGPT, please feedback if any omissions.