Outlier

  • A single observation that differs markedly from the rest and can distort statistical summaries and inference.
  • Common causes are errors in measurement, genuinely extreme values in the population, and unusual occurrences.
  • Identify outliers by plotting the data or by flagging points that lie more than two standard deviations from the mean; include or exclude them based on their cause and their impact on the results.

An outlier is a data point that differs markedly from the other data points in a dataset. It may arise from an error in measurement, a genuinely extreme value, or an unusual occurrence, and it can substantially affect the results of statistical analyses.

Because outliers stand apart from the bulk of the observations, they can change summary statistics and the outcomes of analytical methods. They can be caused by:

  • errors in measurement,
  • genuinely extreme values within the population, or
  • rare or unusual occurrences.

Because outliers can cause statistical techniques to produce inaccurate or misleading results, it is important to identify them and to assess whether they belong in an analysis. Two common identification methods are visual inspection (plotting the data) and the “two standard deviation rule”, which flags points that lie more than two standard deviations from the mean. The decision to keep or drop an outlier should rest on why it appeared and how much it affects the results: exclude it if it was caused by a measurement error or an aberration, and include it if it is a genuine extreme value that is representative of the population.
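The two standard deviation rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `flag_outliers`, the parameter `k`, the use of the sample standard deviation, and the height values are all assumptions made for the example.

```python
from statistics import mean, stdev


def flag_outliers(values, k=2.0):
    """Return the values lying more than k standard deviations from the mean.

    Hypothetical helper: k=2.0 corresponds to the "two standard deviation rule".
    """
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return []  # all values identical, so nothing can be flagged
    return [v for v in values if abs(v - m) > k * s]


# Made-up heights in inches; 96 stands in for a recording error.
heights = [64, 65, 66, 66, 67, 67, 68, 68, 69, 70, 96]
print(flag_outliers(heights))  # prints [96]
```

Note that the outlier itself inflates both the mean and the standard deviation, so in small samples a mild outlier can escape the rule; plotting the data remains a useful cross-check.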

When measuring heights, accidentally recording one person’s height as 6 feet instead of 5 feet 6 inches produces an erroneous data point. If that value is markedly different from the other measurements, it is an outlier and could skew the results.

When measuring incomes, a participant who is a successful business owner and earns far more than the other participants is an outlier: their income is markedly different from the rest and can pull the calculated average upward.
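The effect on the average can be seen with a small calculation. The income figures below are invented purely for illustration, with the last value standing in for the business owner; they are not data from any study.

```python
from statistics import mean

# Made-up annual incomes; the final value represents the business owner.
incomes = [42_000, 45_000, 48_000, 50_000, 52_000, 55_000, 1_200_000]
typical = incomes[:-1]  # the same sample without the outlier

mean_with = mean(incomes)
mean_without = mean(typical)

print(f"mean with outlier:    {mean_with:,.0f}")
print(f"mean without outlier: {mean_without:,.0f}")
```

In this sketch a single extreme value raises the mean well above what any of the other six participants earns, which is why the cause of the outlier has to be understood before deciding whether to keep it.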

  • Statistical analyses where summary statistics (for example, averages) are computed; outliers can skew means and other measures.
  • Studies estimating population characteristics (for example, average height or average income), where identifying and handling outliers affects the final estimates.
  • Outliers can cause statistical techniques to produce inaccurate or misleading results and can change the interpretation of data.
  • Identification methods include visual inspection (plotting the data) and the “two standard deviation rule” (flagging points that lie more than two standard deviations from the mean).
  • Decisions to include or exclude outliers should be based on the cause of the outlier and its potential impact on analysis results.
  • mean
  • standard deviation
  • two standard deviation rule
  • errors in measurement
  • extreme values