Regression Analysis
- Used to model and predict a dependent variable from one or more independent variables.
- Common variants include simple linear, multiple linear, and logistic regression.
- Relies on assumptions (linearity, homoscedasticity, normality); violations can undermine reliability.
Definition
Regression analysis is a statistical technique for examining the relationship between two or more variables and for predicting the value of one variable (the dependent variable) from the values of one or more other variables (the independent variables).
Explanation
Regression analysis fits a model to observed data to quantify the strength and direction of relationships among variables. The fitted model is typically used to predict the dependent variable from the independent variable(s). The strength of the relationship is commonly summarized by the coefficient of determination (R²). For a linear model with an intercept fitted by least squares, R² ranges from 0 to 1: a value of 0 indicates the model explains none of the variance in the dependent variable, while a value of 1 indicates the model explains all of the variance.
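As a reference for the paragraph above, one common way to write the coefficient of determination is shown here. The symbols $y_i$ (observed values), $\hat{y}_i$ (values predicted by the model), and $\bar{y}$ (mean of the observed values) are notation assumed for this sketch, not taken from the text.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

When the predictions match the observations exactly, the numerator is 0 and R² = 1. When the model does no better than always predicting the mean, the ratio is 1 and R² = 0.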
Model fitting is commonly performed with statistical software packages such as SPSS or R. The differences between observed values and predicted values are called residuals. Several assumptions underlie reliable regression results, including linearity (a linear relationship between the dependent and independent variables), homoscedasticity (constant variance of residuals across values of the independent variable), and normality (residuals are normally distributed). If these assumptions are not met, regression results may not be reliable.
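To make fitting, residuals, and R² concrete, here is a minimal sketch of simple linear regression by ordinary least squares in plain Python. The function names (`fit_simple_linear`, `residuals`, `r_squared`) and the data are made up for illustration. In practice a statistics package would normally do this work.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed value minus predicted value for each data point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def r_squared(y, resid):
    """Share of the variance in y that the fitted model explains."""
    mean_y = sum(y) / len(y)
    ss_res = sum(e ** 2 for e in resid)            # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)   # total variation
    return 1 - ss_res / ss_tot


# Hypothetical data, invented for this example.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_linear(x, y)
resid = residuals(x, y, intercept, slope)
r2 = r_squared(y, resid)

print(f"intercept={intercept:.2f} slope={slope:.2f} R^2={r2:.4f}")
```

With an intercept in the model, the least squares residuals sum to zero (up to rounding), which is a quick sanity check on a fit.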
Examples
Employee hours and productivity
A company collects data on the hours worked by each employee and their corresponding productivity levels. Using regression analysis, the company can determine the strength and direction of the relationship between hours worked and productivity. A positive relationship means productivity increases as hours worked increase; a negative relationship means productivity decreases as hours worked increase.
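The hours-and-productivity example can be sketched as follows. The numbers are invented, and the helper names `slope_and_correlation` and `describe_direction` are hypothetical. The sign of the fitted slope gives the direction of the relationship, and the Pearson correlation coefficient (whose square equals R² in simple linear regression) gives its strength.

```python
import math


def slope_and_correlation(x, y):
    """Return the least squares slope and the Pearson correlation of x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def describe_direction(slope):
    """Translate the sign of the slope into a plain-language label."""
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "none"


# Hypothetical weekly hours and productivity scores for five employees.
hours = [30, 35, 40, 45, 50]
productivity = [60, 68, 75, 79, 80]

slope, r = slope_and_correlation(hours, productivity)
direction = describe_direction(slope)

print(f"direction={direction} slope={slope:.2f} r={r:.3f}")
```

In this made-up data set the slope is positive, so productivity rises on average as hours worked increase.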
Inflation and unemployment
In economics, regression can be used to estimate the relationship between inflation (independent variable) and unemployment (dependent variable). A positive relationship means unemployment increases as inflation increases; a negative relationship means unemployment decreases as inflation increases.
Use cases
- Economics
- Finance
- Marketing
- Psychology
Notes or pitfalls
- Assumptions required for reliable regression: linearity, homoscedasticity, and normality.
- Residuals are the differences between observed and predicted values.
- R² ranges from 0 to 1; 0 means no explained variance, 1 means all variance explained.
- If assumptions are violated, regression results may not be reliable.
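The assumptions listed above can be probed with rough numeric checks on the residuals. The sketch below is not a formal statistical test. It assumes residuals from an already fitted model, and the function names `variance_ratio` and `skewness`, along with the sample values, are hypothetical. A variance ratio far from 1 hints at non-constant residual variance, and a skewness far from 0 hints at non-normal residuals.

```python
import statistics


def variance_ratio(x, resid):
    """Residual variance in the upper half of x divided by that in the lower half.

    A value near 1 is consistent with homoscedasticity. Assumes the lower
    half has non-zero variance (otherwise the division fails).
    """
    # Order the residuals by the value of the independent variable.
    ordered = [e for _, e in sorted(zip(x, resid), key=lambda pair: pair[0])]
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    return statistics.pvariance(upper) / statistics.pvariance(lower)


def skewness(values):
    """Sample skewness; roughly 0 for a symmetric distribution such as the normal."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical residuals whose spread grows with x (heteroscedastic).
x = [1, 2, 3, 4, 5, 6, 7, 8]
resid = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]

ratio = variance_ratio(x, resid)
skew = skewness(resid)

print(f"variance ratio={ratio:.1f} skewness={skew:.3f}")
```

Here the residuals are symmetric, so the skewness check passes, but their spread is much larger for high values of `x`, which the variance ratio flags. Plotting residuals against predicted values is a common complementary check.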
Related terms
- Simple linear regression
- Multiple linear regression
- Logistic regression (used when the dependent variable is binary, such as 0 or 1)
- Coefficient of determination (R²)
- Residuals
- SPSS
- R