Influence
- Measures how much a single observation can change a regression model’s coefficients.
- Large influence often signals problems such as outliers or collinearity.
- Identifying influential points helps improve model accuracy and reliability.
Definition
Influence in regression analysis refers to the extent to which an individual data point affects the overall regression model; specifically, it measures how much a single observation can change the model's estimated coefficients.
Explanation
Influence quantifies the impact of an individual observation on the fitted regression model. Observations with high influence can disproportionately alter coefficient estimates, which may indicate issues in the data or the model. Two common causes of high influence are outliers (observations that differ markedly from the bulk of the data) and collinearity (high correlation among predictor variables that makes it difficult to separate their individual contributions).
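As a concrete illustration, the following minimal sketch uses statsmodels to compute standard influence diagnostics (Cook's distance and leverage) for an ordinary least squares fit. The data, variable names, and the 4/n cutoff are illustrative assumptions, not part of the original text.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative (made-up) data: price predicted from size and bedrooms.
rng = np.random.default_rng(0)
size = rng.uniform(50, 250, 40)
bedrooms = rng.integers(1, 6, 40)
price = 2.0 * size + 10.0 * bedrooms + rng.normal(0, 20, 40)

X = sm.add_constant(np.column_stack([size, bedrooms]))
results = sm.OLS(price, X).fit()

# OLSInfluence bundles the standard influence diagnostics.
influence = results.get_influence()
cooks_d, _ = influence.cooks_distance    # Cook's distance per observation
leverage = influence.hat_matrix_diag     # leverage (hat values)

# A common rule of thumb flags points with Cook's distance above 4/n.
flagged = np.where(cooks_d > 4 / len(price))[0]
print("Potentially influential observations:", flagged)
```

Cook's distance summarizes how much all fitted values move when a single observation is deleted, which is one standard way to operationalize the definition above.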
Examples
Outlier example
Outliers are data points that differ substantially from the majority of the data. In a regression model, an outlier can exert disproportionate influence on the coefficients precisely because it sits so far from the other observations. For instance, imagine a regression model that predicts the price of a house from its size and number of bedrooms. If one house in the dataset is ten times larger than all the others, that single point can pull the fitted coefficients far from the values they would take without it.
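A small sketch of this house-price scenario, with made-up numbers, comparing the fitted slope with and without a single extreme observation:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Typical houses: price roughly linear in size (made-up data).
size = rng.uniform(60, 200, 30)
price = 3.0 * size + rng.normal(0, 25, 30)

# One house roughly ten times larger than the rest, priced off-trend.
size_out = np.append(size, 2000.0)
price_out = np.append(price, 3000.0)

fit_clean = sm.OLS(price, sm.add_constant(size)).fit()
fit_outlier = sm.OLS(price_out, sm.add_constant(size_out)).fit()

print("slope without outlier:", fit_clean.params[1])
print("slope with outlier:   ", fit_outlier.params[1])
```

Because the extreme house is both far from the other sizes and off the prevailing trend, it drags the estimated slope well away from the value fitted on the rest of the data.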
Collinearity example
Collinearity occurs when two or more predictor variables are highly correlated. Under collinearity, the model cannot cleanly separate the contribution of each predictor, so the coefficient estimates become unstable and individual data points can exert large influence on them. For instance, imagine a regression model that predicts the price of a car from its horsepower and weight. If horsepower and weight are highly correlated, small changes in the data, even a single observation, can shift the estimated coefficients substantially more than they would in a model free of collinearity.
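A small sketch of this car-price scenario, using made-up data in which horsepower and weight are strongly correlated; it reports the variance inflation factor (VIF), a common collinearity diagnostic, and compares coefficients with and without one observation. All names and numbers here are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 40

# Made-up data: horsepower is tightly tied to weight, so the two are nearly collinear.
weight = rng.uniform(1000, 2500, n)
horsepower = 0.1 * weight + rng.normal(0, 5, n)
price = 5.0 * horsepower + 2.0 * weight + rng.normal(0, 200, n)

X = sm.add_constant(np.column_stack([horsepower, weight]))

# VIF values far above ~10 are commonly read as a sign of problematic collinearity.
for i, name in enumerate(["horsepower", "weight"], start=1):
    print(name, "VIF:", variance_inflation_factor(X, i))

# Compare coefficients with and without one observation; under collinearity
# these estimates are unstable and sensitive to individual points.
full_fit = sm.OLS(price, X).fit()
drop_fit = sm.OLS(np.delete(price, 0), np.delete(X, 0, axis=0)).fit()
print("full-data coefficients:    ", full_fit.params[1:])
print("one observation removed:   ", drop_fit.params[1:])
```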
Notes or pitfalls
- High levels of influence can indicate potential problems with the model, such as outliers or collinearity.
- By identifying and addressing outliers and collinearity, researchers can improve the accuracy and reliability of their regression models.
Related terms
- Outliers
- Collinearity
- Coefficients
- Predictor variables
- Regression model