
Bagplot

  • Visualizes a multivariate dataset as a scatterplot with a central “bag” containing 50% of the points.
  • Points outside the bag form a “fence” and can be considered potential outliers.
  • The bag boundaries are commonly computed from medians and median absolute deviations (MAD), using a factor of 1.5.

A bagplot is a scatterplot-based graphical tool for displaying the distribution of a multivariate dataset. Developed by Rousseeuw and Van Zomeren as an extension of the classical boxplot (for univariate data), it draws boundary lines around the central points to mark a “bag” that includes roughly 50% of the points; the remaining points, outside those lines, form the “fence” of potential outliers.

  • Start with a scatterplot of the multivariate (typically two-dimensional) data.
  • Compute the median and the median absolute deviation (MAD) of each coordinate.
  • Define lower and upper bag boundaries using the median plus or minus 1.5 times the MAD: $\text{lower line} = \text{median} - 1.5 \times \text{MAD}$ and $\text{upper line} = \text{median} + 1.5 \times \text{MAD}$
  • The region enclosed by these boundaries is the bag containing roughly 50% of the data; points outside that region form the fence and are possible outliers. The bagplot highlights central tendency, spread, and potential multivariate outliers.
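The steps above can be sketched in Python. This is a minimal sketch under the per-coordinate rule described here, treating the bag as an axis-aligned box with inclusive boundaries; the function names `bag_bounds` and `split_bag` and the parameter `k` (standing for the factor 1.5) are hypothetical, and the MAD is the raw median of absolute deviations, without any normal-consistency scaling.

```python
from statistics import median

def bag_bounds(points, k=1.5):
    """Return a (lower, upper) pair per coordinate: median - k*MAD, median + k*MAD."""
    bounds = []
    for values in zip(*points):  # one tuple of values per coordinate (x, y, ...)
        med = median(values)
        # Raw MAD: median of absolute deviations, no consistency scaling (assumption).
        mad = median([abs(v - med) for v in values])
        bounds.append((med - k * mad, med + k * mad))
    return bounds

def split_bag(points, k=1.5):
    """Split points into (bag, outside) using the per-coordinate bounds (inclusive)."""
    bounds = bag_bounds(points, k)
    bag, outside = [], []
    for p in points:
        # A point is in the bag only if every coordinate lies within its bounds.
        inside = all(lo <= v <= hi for v, (lo, hi) in zip(p, bounds))
        (bag if inside else outside).append(p)
    return bag, outside

bag, outside = split_bag([(1, 1), (2, 2), (3, 3), (4, 4), (100, 100)])
print(outside)  # → [(1, 1), (100, 100)]
```

Keeping `k` as a parameter makes it easy to try factors other than 1.5.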

Example dataset (15 points): (1,1), (1,2), (1,3), (1,4), (1,5), (2,2), (2,3), (2,4), (2,5), (3,3), (3,4), (3,5), (4,4), (4,5), (5,5)

  1. Draw a scatterplot of the points:

[scatterplot]

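  2. Compute the median and MAD of each coordinate. With 15 points, each median is the 8th smallest value; for both coordinates the absolute deviations from the median are four 0s, eight 1s, two 2s and one 3, so the MAD is 1. For this dataset:
      • x-coordinates: median = 2, MAD = 1
      • y-coordinates: median = 4, MAD = 1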
  3. Define the bag boundaries using median ± 1.5 × MAD:
      • x: $2 - 1.5 \times 1 = 0.5$ and $2 + 1.5 \times 1 = 3.5$
      • y: $4 - 1.5 \times 1 = 2.5$ and $4 + 1.5 \times 1 = 5.5$

  4. Draw these lines on the scatterplot to produce the bagplot:

[bagplot]

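Points outside the bag (shown as red dots in the original figure) are possible outliers. Here the bag is the box $0.5 \le x \le 3.5$, $2.5 \le y \le 5.5$, which contains 9 of the 15 points; the six points outside it are (1,1), (1,2), (2,2), (4,4), (4,5) and (5,5).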
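The arithmetic of the worked example can be checked with a short Python sketch. It assumes the same per-coordinate rule (median ± 1.5 × raw MAD, inclusive boundaries); the helper name `med_mad` and the constant `K` are illustrative choices, not part of the original example.

```python
from statistics import median

# The 15-point example dataset from the text.
points = [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
          (2, 2), (2, 3), (2, 4), (2, 5),
          (3, 3), (3, 4), (3, 5),
          (4, 4), (4, 5),
          (5, 5)]

K = 1.5  # the factor used in the text

xs = [x for x, _ in points]
ys = [y for _, y in points]

def med_mad(values):
    """Median and (unscaled) median absolute deviation of a list of numbers."""
    med = median(values)
    return med, median([abs(v - med) for v in values])

(mx, madx), (my, mady) = med_mad(xs), med_mad(ys)
x_lo, x_hi = mx - K * madx, mx + K * madx
y_lo, y_hi = my - K * mady, my + K * mady

# Points failing either coordinate test lie outside the bag.
outside = [(x, y) for x, y in points
           if not (x_lo <= x <= x_hi and y_lo <= y <= y_hi)]

print((mx, madx), (my, mady))      # → (2, 1) (4, 1)
print((x_lo, x_hi), (y_lo, y_hi))  # → (0.5, 3.5) (2.5, 5.5)
print(outside)                     # → [(1, 1), (1, 2), (2, 2), (4, 4), (4, 5), (5, 5)]
```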

Applications:

  • Quality control
  • Statistical analysis
  • Datasets with many points and/or high dimensions, where univariate boxplots are not effective

Limitations:

  • The bagplot construction relies on arbitrary constants (for example, the factor of 1.5 used to compute the lines), which may not be appropriate for all datasets.
  • Alternative methods, such as the minimum covariance determinant estimator, have been proposed to address this issue.
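One way to see how much the arbitrary factor matters is to vary it on the example dataset. The sketch below reuses the per-coordinate median ± k × MAD rule; the function name `fraction_in_bag` and the particular values of `k` tried are hypothetical choices for illustration.

```python
from statistics import median

def fraction_in_bag(points, k):
    """Fraction of points inside the box median +/- k * MAD on every coordinate."""
    bounds = []
    for values in zip(*points):
        med = median(values)
        mad = median([abs(v - med) for v in values])  # raw, unscaled MAD
        bounds.append((med - k * mad, med + k * mad))
    # Count points whose every coordinate lies within its (inclusive) bounds.
    inside = sum(
        all(lo <= v <= hi for v, (lo, hi) in zip(p, bounds)) for p in points
    )
    return inside / len(points)

# The 15-point example dataset from the text.
points = [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
          (2, 2), (2, 3), (2, 4), (2, 5),
          (3, 3), (3, 4), (3, 5),
          (4, 4), (4, 5),
          (5, 5)]

for k in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"k = {k}: {fraction_in_bag(points, k):.0%} of points in the bag")
```

On this dataset the bag holds 1, 9, 9, 13 and 15 of the 15 points for k = 0.5, 1.0, 1.5, 2.0 and 3.0, so the share of points flagged as potential outliers depends strongly on the chosen constant.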

Related terms:

  • Boxplot
  • Median
  • Median absolute deviation (MAD)
  • Outlier
  • Fence
  • Minimum covariance determinant estimator
  • Rousseeuw and Van Zomeren