Outlier
- A single observation that differs markedly from the rest and can distort statistical summaries and inference.
- Common causes are errors in measurement, extreme values, or unusual occurrences.
- Identify outliers by plotting the data or locating points beyond two standard deviations from the mean; include or exclude them based on their cause and impact.
Definition
An outlier is a data point that is significantly different from the other data points in a dataset. It may result from an error in measurement, be a genuinely extreme value, or simply reflect an unusual occurrence, and it can substantially affect the results of statistical analyses.
Explanation
Outliers stand apart from the bulk of observations and therefore can change summary statistics and the outcome of analytical methods. They can be caused by:
- errors in measurement,
- genuinely extreme values within the population, or
- rare or unusual occurrences.
Because outliers can cause statistical techniques to produce inaccurate or misleading results, it is important to identify them and assess whether they should be included in or excluded from an analysis. Two common identification methods are visual inspection (plotting the data) and checking for points that lie more than two standard deviations from the mean (the “two standard deviation rule”). The rule is a screening heuristic rather than proof of an error: in roughly normally distributed data, about 5% of valid observations fall outside two standard deviations. The decision to include or exclude an outlier should rest on the reason for its appearance and its potential impact on the results: exclude it if it was caused by a measurement error or an aberration, and include it if it is a representative extreme value of the population.
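The two standard deviation rule can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `flag_outliers`, the parameter `k`, and the height values are all hypothetical, and the sketch assumes the sample standard deviation (`statistics.stdev`) is the intended measure of spread.

```python
from statistics import mean, stdev


def flag_outliers(values, k=2.0):
    """Return the values lying more than k sample standard deviations from the mean.

    The name, signature, and default k=2.0 are illustrative assumptions.
    """
    center = mean(values)
    spread = stdev(values)  # sample standard deviation; needs at least two values
    return [v for v in values if abs(v - center) > k * spread]


# Hypothetical heights in inches; 96 stands in for a recording error.
heights = [64, 65, 66, 66, 67, 67, 68, 68, 69, 70, 96]
flagged = flag_outliers(heights)
print(flagged)
```

A flagged point is only a candidate for review. Whether to keep it still depends on why it occurred and how much it changes the results.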
Examples
Errors in measurement
If measuring heights, a recording error such as entering one person’s height as 6 feet instead of 5 feet 6 inches introduces an incorrect data point. When such an error is large enough to set the value apart from the other measurements, the point is an outlier and could skew results.
Extreme values
When measuring incomes, a participant who is a successful business owner and earns significantly more than the other participants is an outlier: their income is markedly different from the rest and can pull the calculated average upward.
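The effect of one extreme income on the average can be checked by computing the mean with and without that value. The figures below are hypothetical and chosen only for illustration; the last entry stands in for the business owner.

```python
from statistics import mean

# Hypothetical annual incomes in thousands of dollars.
# The last value stands in for the successful business owner.
incomes = [42, 48, 51, 55, 58, 60, 63, 900]

mean_all = mean(incomes)           # average including the extreme value
mean_without = mean(incomes[:-1])  # average of the other participants only
shift = mean_all - mean_without    # how far one value moves the average

print(f"mean with the business owner:    {mean_all:.1f}")
print(f"mean without the business owner: {mean_without:.1f}")
print(f"shift caused by one value:       {shift:.1f}")
```

In this made-up sample, the mean that includes the business owner is higher than every other participant's income, so it no longer describes a typical participant. If the business owner genuinely belongs to the population being studied, the value is still a valid observation and excluding it would need to be justified.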
Use cases
- Statistical analyses where summary statistics (for example, averages) are computed; outliers can skew means and other measures.
- Studies estimating population characteristics (for example, average height or average income), where identifying and handling outliers affects the final estimates.
Notes or pitfalls
- Outliers can cause statistical techniques to produce inaccurate or misleading results and can change the interpretation of data.
- Identification methods include visual inspection (plotting) and the “two standard deviation rule” (flagging points more than two standard deviations from the mean).
- Decisions to include or exclude outliers should be based on the cause of the outlier and its potential impact on analysis results.
Related terms
- mean
- standard deviation
- two standard deviation rule
- errors in measurement
- extreme values