Akaike Information Criterion

  • AIC balances model fit and complexity by combining RSS with a penalty based on parameter count.
  • Lower AIC indicates a preferred model when comparing alternatives.
  • Commonly used for model selection (choose the model with the smallest AIC).

Akaike information criterion, also known as AIC, is a statistical measure used to evaluate the quality of a model by weighing its goodness of fit against the number of parameters it uses. In the simplified form used throughout this article, AIC is computed by adding the residual sum of squares (RSS) to twice the number of parameters. (The standard definition is AIC = 2k − 2 ln L̂, where k is the parameter count and L̂ the maximized likelihood; the RSS-based form follows the same fit-versus-complexity logic.) AIC is often used in model selection, where the goal is to choose the model with the smallest AIC value.

AIC quantifies a trade-off between how well a model fits the observed data (measured here by RSS) and the model complexity (measured by the number of parameters). For two competing models, compute each model’s RSS, add twice its parameter count, and select the model with the smaller AIC value. This procedure favors models that achieve good fit with fewer parameters.
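This comparison procedure can be sketched in a few lines of Python, using the article's simplified AIC (RSS plus twice the parameter count). The RSS values and model names below are hypothetical, for illustration only:

```python
def aic(rss, n_params):
    """Simplified AIC used in this article: RSS plus twice the parameter count."""
    return rss + 2 * n_params

# Hypothetical competing models: name -> (RSS, number of parameters).
models = {"x1 only": (12.4, 2), "x1 and x2": (11.9, 3)}

# Score each model and keep the one with the smallest AIC.
scores = {name: aic(rss, k) for name, (rss, k) in models.items()}
best = min(scores, key=scores.get)
```

Note that the model with two predictors has the smaller RSS here, yet the single-predictor model wins: the 0.5 drop in RSS does not pay for the extra penalty of 2.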

Suppose a dataset has two predictor variables, x1 and x2, and one response variable, y. Fit two linear regression models:

  • Model with only x1: y = \beta_0 + \beta_1 x_1

  • Model with x1 and x2: y = \beta_0 + \beta_1 x_1 + \beta_2 x_2

Compute the residual sum of squares (RSS) for each model by summing the squared differences between observed and predicted y:

\text{RSS}_1 = \sum (y - \hat{y}_1)^2

\text{RSS}_2 = \sum (y - \hat{y}_2)^2

where \hat{y}_1 and \hat{y}_2 are the fitted values from the first and second model, respectively.
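The RSS computation above is a one-liner; a minimal helper, with the observed and predicted values passed as array-likes:

```python
import numpy as np

def rss(y, y_hat):
    """Residual sum of squares: sum of squared differences
    between observed and predicted y."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sum((y - y_hat) ** 2))
```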

Then compute AIC for each model by adding RSS and twice the number of parameters in the model:

\text{AIC}_1 = \text{RSS}_1 + 2(2)

where the 2 in the parentheses is the number of parameters in the model (i.e., \beta_0 and \beta_1).

\text{AIC}_2 = \text{RSS}_2 + 2(3)

where the 3 in the parentheses is the number of parameters in the model (i.e., \beta_0, \beta_1, and \beta_2).

Choose the model with the smaller AIC value. In this example, the model with only x1 is chosen whenever adding x2 reduces RSS by less than 2, the increase in the penalty term; if the RSS reduction exceeds 2, the larger model is preferred.
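Putting the whole example together: the sketch below fits both regressions by least squares on synthetic data (where y truly depends only on x1) and compares them with the article's simplified AIC. The data-generating choices are assumptions for illustration:

```python
import numpy as np

# Synthetic data: y depends only on x1, so the extra x2 term
# should not usually pay for its penalty. (Illustrative assumption.)
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)

def fit_rss(X, y):
    """Least-squares fit; returns the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

X1 = np.column_stack([np.ones(n), x1])       # intercept + x1: 2 parameters
X2 = np.column_stack([np.ones(n), x1, x2])   # intercept + x1 + x2: 3 parameters

aic1 = fit_rss(X1, y) + 2 * 2   # simplified AIC: RSS + 2 * (parameter count)
aic2 = fit_rss(X2, y) + 2 * 3
best = "x1 only" if aic1 < aic2 else "x1 and x2"
```

Because the models are nested, RSS for the larger model is never greater than for the smaller one; AIC's penalty is what lets the simpler model win.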

  • Model selection: choose among competing models by selecting the one with the smallest AIC.
  • Residual sum of squares (RSS): the sum of squared differences between observed and predicted values, used here as the goodness-of-fit term in AIC.