
Backfitting

  • Iterative fitting method that alternates between variables, updating one set of parameters while keeping the others fixed.
  • Commonly applied when models have many predictors or when predictors exhibit collinearity.
  • Repeat the alternating fits until parameter estimates converge.

Backfitting is a technique used in regression analysis to estimate the parameters of a model by fitting one variable (or group of variables) at a time while holding the parameter estimates for the remaining variables fixed, and iterating.

Backfitting fits the parts of a regression model one at a time, treating the other parts as fixed. Start by fitting a simple model with one predictor to obtain initial parameter estimates. Then hold those estimates fixed and fit the remaining predictors to the partial residual (the response minus the contribution of the fixed part) to update their parameters. Alternate which predictors are fit and which are held fixed, iterating until the parameter estimates for all predictors converge (i.e., they no longer change appreciably from one iteration to the next). The approach is often used when the number of predictors is large or when predictors are highly correlated, situations that can make it difficult to estimate all parameters simultaneously.
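The procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the helper name `backfit`, the grouping of predictor columns into `blocks`, and the stopping rule (largest absolute coefficient change in a full pass below `tol`) are choices made for this example. Each block is refit by ordinary least squares to the partial residual left after subtracting the other blocks' current contributions.

```python
import numpy as np


def backfit(blocks, y, tol=1e-8, max_iter=500):
    """Hypothetical helper: backfitting for y ~ sum_j blocks[j] @ coefs[j].

    blocks   -- list of 2-D arrays, one group of predictor columns per block
    tol      -- stop when no coefficient changes by more than tol in a full pass
    max_iter -- upper bound on the number of full passes
    """
    coefs = [np.zeros(B.shape[1]) for B in blocks]
    for n_pass in range(1, max_iter + 1):
        largest_change = 0.0
        for j, B in enumerate(blocks):
            # Current contribution of every other block, held fixed.
            others = sum(
                (blocks[k] @ coefs[k] for k in range(len(blocks)) if k != j),
                np.zeros_like(y),
            )
            # Refit block j to the partial residual by least squares.
            new = np.linalg.lstsq(B, y - others, rcond=None)[0]
            largest_change = max(largest_change, float(np.max(np.abs(new - coefs[j]))))
            coefs[j] = new
        if largest_change < tol:
            return coefs, n_pass
    return coefs, max_iter


# Illustrative use on synthetic data, split into two blocks of columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=50)
coefs, n_pass = backfit([X[:, :1], X[:, 1:]], y)
print(n_pass, np.concatenate(coefs))
```

To include an intercept, add a column of ones to one of the blocks. Because each update solves its block's least-squares subproblem exactly, the residual sum of squares never increases from one update to the next.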

Dataset

Price    Size   Bedrooms   Bathrooms
300000   1000   3          2
325000   1500   4          2.5
350000   2000   3          3
400000   2500   5          3.5
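Loading the table into NumPy arrays exposes a property of this small sample: Bathrooms equals 1 + Size/1000 exactly, so Size and Bathrooms are perfectly collinear. The design matrix with an intercept therefore has rank 3 rather than 4, and the normal equations for fitting all four parameters simultaneously have no unique solution. The array names below are illustrative.

```python
import numpy as np

# Columns of the Dataset table (array names are illustrative).
price = np.array([300000, 325000, 350000, 400000], dtype=float)
size = np.array([1000, 1500, 2000, 2500], dtype=float)
bedrooms = np.array([3, 4, 3, 5], dtype=float)
bathrooms = np.array([2, 2.5, 3, 3.5])

# In this sample Bathrooms is an exact linear function of Size.
print(np.allclose(bathrooms, 1 + size / 1000))  # True

# Intercept plus three predictors: four columns, but only three are independent.
X = np.column_stack([np.ones_like(price), size, bedrooms, bathrooms])
print(np.linalg.matrix_rank(X))  # 3
```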

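Step 1: fit Size alone.

$$\text{Price} = a + b \cdot \text{Size}$$

System (four equations, two unknowns, solved by least squares):

$$\begin{aligned} 300000 &= a + b \cdot 1000 \\ 325000 &= a + b \cdot 1500 \\ 350000 &= a + b \cdot 2000 \\ 400000 &= a + b \cdot 2500 \end{aligned}$$

Least-squares solution:

$$a = 230000, \quad b = 65$$

Initial estimate:

$$\text{Price} = 230000 + 65 \cdot \text{Size}$$

Step 2: hold b = 65 fixed and fit Bedrooms and Bathrooms to the partial residual, re-estimating the intercept:

$$\text{Price} - 65 \cdot \text{Size} = c + d \cdot \text{Bedrooms} + e \cdot \text{Bathrooms}$$

System:

$$\begin{aligned} 235000 &= c + d \cdot 3 + e \cdot 2 \\ 227500 &= c + d \cdot 4 + e \cdot 2.5 \\ 220000 &= c + d \cdot 3 + e \cdot 3 \\ 237500 &= c + d \cdot 5 + e \cdot 3.5 \end{aligned}$$

Least-squares solution:

$$c = \tfrac{665000}{3} \approx 221666.67, \quad d = \tfrac{25000}{3} \approx 8333.33, \quad e = -\tfrac{25000}{3} \approx -8333.33$$

Updated estimate:

$$\text{Price} \approx 221666.67 + 65 \cdot \text{Size} + 8333.33 \cdot \text{Bedrooms} - 8333.33 \cdot \text{Bathrooms}$$

Step 3: hold the updated d and e fixed and refit Size to the new partial residual:

$$\text{Price} - d \cdot \text{Bedrooms} - e \cdot \text{Bathrooms} = f + g \cdot \text{Size}$$

System:

$$\begin{aligned} 291666.67 &= f + g \cdot 1000 \\ 312500 &= f + g \cdot 1500 \\ 350000 &= f + g \cdot 2000 \\ 387500 &= f + g \cdot 2500 \end{aligned}$$

Least-squares solution:

$$f \approx 221666.67, \quad g = 65$$

Updated estimate:

$$\text{Price} \approx 221666.67 + 65 \cdot \text{Size} + 8333.33 \cdot \text{Bedrooms} - 8333.33 \cdot \text{Bathrooms}$$

Step 3 returns f = c and g = b, so repeating Step 2 would reproduce d and e: after one full cycle the estimates have stopped changing. In this dataset Bathrooms equals 1 + Size/1000 exactly, so Size and Bathrooms are perfectly collinear; the fitted prices are uniquely determined, but how the effect is split between Size and Bathrooms (including the negative Bathrooms coefficient) can depend on the starting values and the update order.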

Repeat the alternating fitting steps until parameter estimates for all variables have converged.
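The following NumPy sketch runs backfitting on the table's data. It assumes the convention of re-estimating the intercept in every step while holding only the other predictors' slopes fixed (their contribution is subtracted from Price before each fit). The helper `fit`, the 20-sweep cap, and the convergence test (`np.allclose` between consecutive sweeps) are illustrative choices, not part of any standard API.

```python
import numpy as np

# Data from the table (Size and Bathrooms are perfectly collinear here).
price = np.array([300000, 325000, 350000, 400000], dtype=float)
size = np.array([1000, 1500, 2000, 2500], dtype=float)
bedrooms = np.array([3, 4, 3, 5], dtype=float)
bathrooms = np.array([2, 2.5, 3, 3.5])
ones = np.ones_like(price)


def fit(columns, target):
    """Hypothetical helper: least-squares coefficients of target on the given columns."""
    return np.linalg.lstsq(np.column_stack(columns), target, rcond=None)[0]


b = d = e = 0.0  # slopes for Size, Bedrooms, Bathrooms; nothing fitted yet
prev = None
for sweep in range(1, 21):
    # Size step: hold d and e fixed, refit intercept + Size slope to the partial residual.
    a, b = fit([ones, size], price - d * bedrooms - e * bathrooms)
    print(f"sweep {sweep}, Size step:     intercept={a:.2f}, size={b:.2f}")

    # Bedrooms/Bathrooms step: hold b fixed, refit intercept + both slopes.
    c, d, e = fit([ones, bedrooms, bathrooms], price - b * size)
    print(f"sweep {sweep}, Bed/Bath step: intercept={c:.2f}, bedrooms={d:.2f}, bathrooms={e:.2f}")

    # Converged when a full sweep leaves every estimate (numerically) unchanged.
    params = np.array([c, b, d, e])
    if prev is not None and np.allclose(params, prev):
        break
    prev = params
```

With this data the loop stops in sweep 2: that sweep's Size step returns intercept 221666.67 and slope 65.00, matching the intercept from the Bed/Bath step of sweep 1, and the slopes stay at about 8333.33 (Bedrooms) and -8333.33 (Bathrooms).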

  • Used when the number of variables in the model is large.
  • Used when there is collinearity among variables, which can hinder parameter estimation by traditional simultaneous methods.
  • Useful for large or complex models, where fitting one part at a time is simpler than estimating all parameters simultaneously.
  • The process is repeated until the parameter estimates converge (i.e., they do not change significantly between iterations).
  • Regression analysis
  • Collinearity