Bagplot
- Visualizes a multivariate dataset as a scatterplot with a central “bag” containing 50% of the points.
- Points outside the bag form a “fence” and can be considered potential outliers.
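- In the simplified construction described here, the bag boundaries are computed per coordinate as median ± 1.5 × MAD (median absolute deviation). The original bagplot instead defines the bag through halfspace (Tukey) depth.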
Definition
A bagplot is a scatterplot-based graphical tool for displaying the distribution of a multivariate (typically bivariate) dataset. Introduced by Rousseeuw, Ruts, and Tukey (1999) as a bivariate extension of the classical univariate boxplot, it draws boundary lines around the central points to mark a “bag” that contains about 50% of the points; the remaining points outside those lines form the “fence” of potential outliers.
Explanation
- Start with a scatterplot of the multivariate (typically two-dimensional) data.
- Compute the median and the median absolute deviation (MAD) of each coordinate.
- Define lower and upper bag boundaries for each coordinate as the median minus and plus 1.5 times the MAD: lower = median − 1.5 × MAD, upper = median + 1.5 × MAD.
- The region enclosed by these boundaries is the bag containing roughly 50% of the data; points outside that region form the fence and are possible outliers. The bagplot highlights central tendency, spread, and potential multivariate outliers.
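The steps above can be sketched in a few lines of Python. This is a minimal illustration of the median ± 1.5 × MAD construction described in this section, not a reference implementation of the depth-based bagplot; the function names `mad`, `bag_bounds`, and `split_points` are hypothetical and chosen only for this example.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: the median of |v - median(values)|."""
    m = median(values)
    return median([abs(v - m) for v in values])


def bag_bounds(points, k=1.5):
    """Return one (lower, upper) pair per coordinate: median -/+ k * MAD."""
    bounds = []
    for coord in zip(*points):  # iterate over coordinates (x values, y values, ...)
        m = median(coord)
        d = mad(coord)
        bounds.append((m - k * d, m + k * d))
    return tuple(bounds)


def split_points(points, bounds):
    """Split points into those inside the bag and those outside it."""
    inside, outside = [], []
    for p in points:
        in_bag = all(lo <= c <= hi for c, (lo, hi) in zip(p, bounds))
        (inside if in_bag else outside).append(p)
    return inside, outside
```

Calling `bag_bounds(points)` and then `split_points(points, bounds)` on a list of `(x, y)` tuples gives the rectangular bag and the candidate outliers. The factor `k` is exposed as a parameter because 1.5 is a convention, not a derived value.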
Examples
Example dataset: (1,1), (1,2), (1,3), (1,4), (1,5), (2,2), (2,3), (2,4), (2,5), (3,3), (3,4), (3,5), (4,4), (4,5), (5,5)
- Draw a scatterplot of the points:
[scatterplot]
- Compute medians and MADs of coordinates. For this dataset:
- x-coordinates: median = 2, MAD = 1
- y-coordinates: median = 4, MAD = 1
- Define the bag boundaries using median ± 1.5 × MAD:
- x-coordinates: 2 ± 1.5 × 1, giving the lines x = 0.5 and x = 3.5
- y-coordinates: 4 ± 1.5 × 1, giving the lines y = 2.5 and y = 5.5
- Draw these lines on the scatterplot to produce the bagplot:
[bagplot]
Points outside the bag (shown as red dots in the original figure) are possible outliers. Here, 6 of the 15 points fall outside: (1,1), (1,2), (2,2), (4,4), (4,5), and (5,5).
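The numbers in this example can be checked with a short script. It is a sketch using only the Python standard library; the variable names are illustrative, and the rectangular bag follows the median ± 1.5 × MAD rule used in this example.

```python
from statistics import median

points = [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
          (2, 2), (2, 3), (2, 4), (2, 5),
          (3, 3), (3, 4), (3, 5),
          (4, 4), (4, 5), (5, 5)]

xs = [x for x, _ in points]
ys = [y for _, y in points]


def mad(values):
    """Median of the absolute deviations from the median."""
    m = median(values)
    return median([abs(v - m) for v in values])


k = 1.5  # conventional factor from the text
x_lo, x_hi = median(xs) - k * mad(xs), median(xs) + k * mad(xs)
y_lo, y_hi = median(ys) - k * mad(ys), median(ys) + k * mad(ys)

# Points that violate at least one of the four boundary lines
outside = [(x, y) for x, y in points
           if not (x_lo <= x <= x_hi and y_lo <= y <= y_hi)]

print(median(xs), mad(xs))         # 2 1
print(median(ys), mad(ys))         # 4 1
print((x_lo, x_hi), (y_lo, y_hi))  # (0.5, 3.5) (2.5, 5.5)
print(outside)  # [(1, 1), (1, 2), (2, 2), (4, 4), (4, 5), (5, 5)]
```

Nine of the fifteen points (60%) lie inside the bag here, which shows that the 50% figure is only approximate for a small dataset with integer coordinates.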
Use cases
- Quality control
- Statistical analysis
- Datasets with many points and/or high dimensions where univariate boxplots are not effective
Notes or pitfalls
- The bagplot construction relies on arbitrary constants (for example, the factor of 1.5 used to compute the lines), which may not be appropriate for all datasets.
- Alternative methods, such as the minimum covariance determinant estimator, have been proposed to address this issue.
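To see how much the choice of constant matters, the sketch below recomputes the share of points inside the bag for several factors, using the dataset from the Examples section. The helper `fraction_inside` is a hypothetical name introduced for this illustration, and the factors 2.0 and 3.0 are arbitrary comparison values, not recommendations.

```python
from statistics import median

points = [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
          (2, 2), (2, 3), (2, 4), (2, 5),
          (3, 3), (3, 4), (3, 5),
          (4, 4), (4, 5), (5, 5)]


def fraction_inside(points, k):
    """Share of points within median -/+ k * MAD on every coordinate."""
    bounds = []
    for coord in zip(*points):
        m = median(coord)
        d = median([abs(v - m) for v in coord])  # MAD of this coordinate
        bounds.append((m - k * d, m + k * d))
    n_inside = sum(
        all(lo <= c <= hi for c, (lo, hi) in zip(p, bounds))
        for p in points
    )
    return n_inside / len(points)


for k in (1.5, 2.0, 3.0):
    print(f"k={k}: {fraction_inside(points, k):.2f}")
# k=1.5: 0.60
# k=2.0: 0.87
# k=3.0: 1.00
```

On this dataset, moving from 1.5 to 2.0 reduces the flagged points from six to two, so any outlier list should be read together with the factor that produced it.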
Related terms
- Boxplot
- Median
- Median absolute deviation (MAD)
- Outlier
- Fence
- Minimum covariance determinant estimator
- Rousseeuw and Van Zomeren