Regression Analysis

  • Used to model and predict a dependent variable from one or more independent variables.
  • Common variants include simple linear, multiple linear, and logistic regression.
  • Relies on assumptions (linearity, homoscedasticity, normality); violations can undermine reliability.

Regression analysis is a statistical technique used to examine the relationship between a dependent variable and one or more independent variables, and to predict the value of the dependent variable from the values of the independent variables.
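
For the linear variants, the relationship is conventionally written as below. The symbols (β for coefficients, ε for the error term, p for the number of independent variables, ŷ for a predicted value) are standard textbook notation and are not defined elsewhere in this article.

```latex
% Simple linear regression: one independent variable x
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i

% Multiple linear regression: p independent variables
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i

% Residual: observed value minus predicted value
e_i = y_i - \hat{y}_i

% Coefficient of determination
R^2 = 1 - \frac{\sum_i e_i^2}{\sum_i (y_i - \bar{y})^2}
```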

Regression analysis fits a model to observed data to quantify the strength and direction of the relationships among variables. The fitted model is typically used to predict the dependent variable from the independent variable(s). How well the model fits is commonly measured by the coefficient of determination (R²), the proportion of the variance in the dependent variable that the model explains. For a least-squares linear model with an intercept, R² ranges from 0 to 1: a value of 0 means the model explains none of the variance in the dependent variable, and a value of 1 means it explains all of it.
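
Fitting a simple linear regression by ordinary least squares and computing R² takes only a few lines. The sketch below is plain Python with no libraries; the function names and the sample data are illustrative assumptions, not part of any package. It uses R² = 1 − SS_res / SS_tot.

```python
# Minimal sketch: simple linear regression by ordinary least squares,
# plus the coefficient of determination (R²). Pure Python, no libraries.
# fit_simple_linear and r_squared are hypothetical helper names.


def fit_simple_linear(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """R² = 1 - SS_res / SS_tot: share of the variance in y explained by the line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative, made-up data (not from any real study).
    x = [1, 2, 3, 4, 5]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    b0, b1 = fit_simple_linear(x, y)
    print(f"intercept={b0:.3f} slope={b1:.3f} R²={r_squared(x, y, b0, b1):.3f}")
```

On the data the least-squares line was fitted to, this R² falls between 0 and 1. Evaluated on new data, or for a model fitted without an intercept, the same formula can come out negative.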

Model fitting is commonly performed with statistical software packages such as SPSS or R. The differences between observed and predicted values are called residuals. Several assumptions underlie reliable regression results: linearity (the dependent variable is linearly related to the independent variables), homoscedasticity (the residuals have constant variance across values of the independent variables), and normality (the residuals are normally distributed). If these assumptions are not met, the regression results may be unreliable.

For example, a company collects data on the hours worked by each employee and that employee's productivity. Using regression analysis, the company can estimate the strength and direction of the relationship between hours worked and productivity. A positive relationship means productivity tends to increase as hours worked increase; a negative relationship means productivity tends to decrease as hours worked increase.

In economics, regression can be used to estimate the relationship between inflation (independent variable) and unemployment (dependent variable). A positive relationship means unemployment tends to rise as inflation rises; a negative relationship means unemployment tends to fall as inflation rises.

Applications

  • Economics
  • Finance
  • Marketing
  • Psychology

Key points

  • Assumptions required for reliable regression: linearity, homoscedasticity, and normality.
  • Residuals are the differences between predicted and observed values.
  • R² ranges from 0 to 1; 0 means no explained variance, 1 means all variance explained.
  • If assumptions are violated, regression results may not be reliable.

Types of regression

  • Simple linear regression
  • Multiple linear regression
  • Logistic regression (used when the dependent variable is binary, such as 0 or 1)

Related terms

  • Coefficient of determination (R²)
  • Residuals
  • SPSS
  • R
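
Logistic regression, listed above for binary dependent variables, models the probability that the outcome equals 1. The sketch below fits a one-predictor model by plain batch gradient descent on the log-loss. That optimizer is a deliberately simple stand-in for the Newton-type methods (such as iteratively reweighted least squares) that statistical packages typically use; the function names, learning rate, step count, and data are all illustrative assumptions.

```python
# Minimal sketch: logistic regression with one predictor, fit by batch
# gradient descent on the mean log-loss. lr and steps are illustrative
# defaults, not tuned values; fit_logistic/predict_proba are hypothetical names.
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, y, lr=0.1, steps=20000):
    """Return (b0, b1) for P(y = 1 | x) = sigmoid(b0 + b1 * x); y must be 0/1."""
    if any(label not in (0, 1) for label in y):
        raise ValueError("y must contain only 0 and 1")
    n = len(x)
    b0 = b1 = 0.0
    for _ in range(steps):
        # Gradient of the mean log-loss with respect to b0 and b1.
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def predict_proba(b0, b1, xi):
    """Estimated probability that the outcome is 1 at predictor value xi."""
    return sigmoid(b0 + b1 * xi)


if __name__ == "__main__":
    # Made-up binary outcomes; the classes overlap, so a finite fit exists.
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [0, 0, 0, 1, 0, 1, 1, 1]
    b0, b1 = fit_logistic(x, y)
    print(f"b0={b0:.3f} b1={b1:.3f} P(y=1 | x=6)={predict_proba(b0, b1, 6):.3f}")
```

Centering x would let plain gradient descent converge in fewer steps; it is left uncentered here so the coefficients read directly on the original scale. In R, the same model is usually fit with `glm(y ~ x, family = binomial)`.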