High Breakdown Methods
- Methods in robust statistics designed to resist the influence of outliers.
- Can tolerate a relatively large fraction of contaminated observations before performance degrades.
- Examples include the median absolute deviation (MAD) and the Tukey biweight.
Definition
In robust statistics, high breakdown methods are statistical methods with a high breakdown point. The breakdown point is the largest fraction of contaminated observations (outliers) an estimator can tolerate before its value can be pushed arbitrarily far from what the uncontaminated data would give.
Explanation
High breakdown methods are resistant to the effects of outliers, that is, data points that differ markedly from the majority. Because they are built around estimators or weighting schemes that are less influenced by extreme values, they remain reliable when a substantial portion of the data is contaminated. Traditional estimators such as the mean and standard deviation are sensitive to outliers: a single extreme value can shift them by an arbitrary amount. High breakdown methods reduce that sensitivity.
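As a rough illustration of that sensitivity, the sketch below replaces a single value in a small made-up sample with a gross outlier and compares the summaries before and after. The data values and the helper name `summarize` are assumptions chosen for illustration only.

```python
import statistics

# Hypothetical sample: eight measurements clustered around 10.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]

# Same sample with the last value replaced by one gross outlier.
contaminated = clean[:-1] + [1000.0]


def summarize(data):
    """Return the mean, sample standard deviation, and median of data."""
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),
        "median": statistics.median(data),
    }


clean_summary = summarize(clean)
contaminated_summary = summarize(contaminated)

print("clean:       ", clean_summary)
print("contaminated:", contaminated_summary)
```

In this sample the median is the same for both versions, while the mean and standard deviation are dominated by the single replaced value.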
Examples
Median absolute deviation (MAD)
- The MAD measures dispersion using the median rather than the mean, making it more robust to outliers.
- Calculation steps:
- Find the median of the data.
- For each data point, compute the absolute difference between the data point and the median.
- Take the median of these absolute differences — this is the MAD.
- Because MAD is based on the median, it is less affected by outliers than the standard deviation.
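The calculation steps above can be sketched in a few lines using only the Python standard library. The function name `median_absolute_deviation` and the sample values are assumptions for illustration. The function returns the raw MAD with no scaling factor applied.

```python
import statistics


def median_absolute_deviation(data):
    """Raw (unscaled) MAD: the median of absolute deviations from the median."""
    center = statistics.median(data)            # step 1: median of the data
    abs_devs = [abs(x - center) for x in data]  # step 2: absolute differences
    return statistics.median(abs_devs)          # step 3: median of the differences


# Hypothetical sample with one gross outlier (100).
sample = [2, 3, 3, 4, 5, 6, 100]

mad = median_absolute_deviation(sample)
sd = statistics.stdev(sample)

print("MAD:", mad)
print("standard deviation:", sd)
```

When the MAD is used as a stand-in for the standard deviation of roughly normal data, it is commonly multiplied by a constant of about 1.4826. That scaling is left out here to match the steps as described.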
Tukey biweight
- The Tukey biweight estimates location and scale using a weighting function that downweights observations further from the center.
- Calculation steps:
- Estimate the center of the data (for example, using the median or an M-estimator).
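- For each data point, compute the difference between the data point and the center, and scale it by a robust measure of spread (for example, a multiple of the MAD).
- Apply the Tukey weighting function to the scaled differences: an observation with scaled difference u receives weight (1 - u^2)^2 if |u| < 1 and weight 0 otherwise.
- Sum the weighted differences, divide by the sum of the weights, and add the result to the center to obtain the Tukey biweight estimate of location. The steps can be repeated from the updated center until the estimate stabilizes.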
- Because outliers receive smaller weights, the Tukey biweight is less affected by outliers than the mean and standard deviation.
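A minimal sketch of a biweight location estimate along these lines is shown below. It assumes the median as the starting center, the raw MAD as the scale, and a tuning constant `c = 6.0`. The tuning constant is a commonly used value but is an assumption here, as are the function name `biweight_location` and the sample data. This is one reasonable variant, not a reference implementation.

```python
import statistics


def biweight_location(data, c=6.0, max_iter=50, tol=1e-9):
    """Iteratively reweighted Tukey biweight estimate of location.

    Assumptions: the median is the starting center, the raw MAD is the
    scale, and c is the tuning constant setting the zero-weight cutoff.
    """
    center = statistics.median(data)
    mad = statistics.median([abs(x - center) for x in data])
    if mad == 0:
        # At least half the points equal the median; no usable scale.
        return center

    for _ in range(max_iter):
        weights = []
        for x in data:
            u = (x - center) / (c * mad)  # scaled difference
            # Biweight: (1 - u^2)^2 inside the cutoff, 0 outside it.
            weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)

        total = sum(weights)
        if total == 0:
            return center

        # Weighted mean of the differences, added back to the center.
        step = sum(w * (x - center) for w, x in zip(weights, data)) / total
        center += step
        if abs(step) < tol:
            break

    return center


# Hypothetical sample with one gross outlier (100).
sample = [2, 3, 3, 4, 5, 6, 100]

estimate = biweight_location(sample)
print("biweight location:", estimate)
print("mean:", statistics.mean(sample))
```

With these settings the outlier at 100 falls outside the cutoff and receives zero weight, so the estimate stays close to the bulk of the data while the mean does not.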
Use cases
- Use when the dataset may contain some outliers and robustness to those outliers is required.
- Preferable to traditional estimators (mean, standard deviation) in contaminated data settings.
Notes or pitfalls
- The breakdown point is a worst-case limit. Once the fraction of outliers reaches it, the estimate can be made arbitrarily bad, so even a high breakdown method offers no protection when half or more of the data is contaminated.
- The mean and standard deviation have a breakdown point of essentially zero: a single sufficiently extreme observation can move them arbitrarily far. The median and the MAD, by contrast, have a breakdown point of about 50%.
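The breakdown point can be seen empirically by contaminating a growing share of a sample and watching when each estimator fails. The sketch below uses a made-up ten-point sample and an arbitrary large `bad_value`. The helper name `contaminate` is also an assumption for illustration.

```python
import statistics


def contaminate(clean, n_bad, bad_value):
    """Return a copy of clean with its last n_bad observations replaced."""
    return clean[: len(clean) - n_bad] + [bad_value] * n_bad


clean = [float(v) for v in range(1, 11)]  # 1.0 through 10.0
bad_value = 1e6                           # arbitrary gross outlier

# Each entry: (contaminated fraction, mean, median).
results = []
for n_bad in range(0, 7):
    data = contaminate(clean, n_bad, bad_value)
    results.append(
        (n_bad / len(clean), statistics.mean(data), statistics.median(data))
    )

for fraction, mean, median in results:
    print(f"{fraction:.0%} contaminated: mean={mean:.1f}, median={median:.1f}")
```

In this sample the mean is ruined by the first replaced observation, while the median is unchanged until half of the observations are replaced.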
Related terms
- Robust statistics
- Breakdown point
- Median absolute deviation (MAD)
- Tukey biweight
- Median
- Mean
- Standard deviation
- M-estimator