Box and Whisker Plot:
A box and whisker plot is a graphical representation of a dataset that displays the distribution of the data and its range. It consists of a box, which represents the middle 50% of the data, and two whiskers, which extend from the box to show the range of the data. The box and whisker plot is a useful tool for comparing the distribution of different datasets, as well as identifying potential outliers in the data.
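To create a box and whisker plot, the first step is to sort the data and calculate the median and the first and third quartiles. The median is the middle value in the sorted dataset (or the average of the two middle values when the number of values is even), and the quartiles are the values that divide the sorted data into four equal parts. Several conventions exist for computing quartiles; a common one takes the first quartile as the median of the lower half of the data and the third quartile as the median of the upper half. For example, under that convention, if we have a sorted dataset with 10 values, the median would be the average of the 5th and 6th values, and the first and third quartiles would be the 3rd and 8th values, respectively.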
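The calculation above can be sketched in a few lines of Python. This is a minimal illustration that assumes the median-of-halves convention (the middle value is left out of both halves when the count is odd); the function name quartiles is hypothetical, and other conventions, including those used by some statistics libraries, can give slightly different quartile values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Needs at least two values, otherwise the halves are empty.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # values above the median position
    return median(lower), median(data), median(upper)

ten_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
q1, med, q3 = quartiles(ten_values)
print(q1, med, q3)  # 3 5.5 8
```

With ten sorted values, the first quartile is the 3rd value, the median is the average of the 5th and 6th values, and the third quartile is the 8th value.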
Once the median and quartiles are calculated, the next step is to create the box. The box is a rectangle that extends from the first quartile to the third quartile, so its length along the data axis (its height, when the plot is drawn vertically) equals the difference between the third and first quartiles, known as the interquartile range. A line inside the box marks the median. For example, if the first quartile is 10 and the third quartile is 20, the box would extend from 10 to 20, and its length would be 10.
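Next, the whiskers are added to the box. In the simplest form of the plot, the upper whisker extends from the third quartile to the maximum value in the dataset, and the lower whisker extends from the first quartile to the minimum value in the dataset. A widely used alternative, often called Tukey's rule, uses the interquartile range (the difference between the third and first quartiles) to limit the whiskers: each whisker stops at the most extreme data value that lies within 1.5 times the interquartile range of the box. For example, if the first quartile is 10 and the third quartile is 20, the interquartile range would be 10, so the whiskers could reach no lower than 10 - 15 = -5 and no higher than 20 + 15 = 35. If every value falls within those limits, the upper and lower whiskers simply extend from 20 to the maximum value and from 10 to the minimum value, respectively.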
Outliers in the dataset can also be identified in a box and whisker plot. Outliers are values that are significantly higher or lower than the rest of the data. When the whiskers are limited to 1.5 times the interquartile range beyond the box, outliers are the values that lie outside of those limits, and they are drawn as individual points beyond the ends of the whiskers. For example, if the first quartile is 10 and the third quartile is 20, the interquartile range is 10 and the limits are 10 - 15 = -5 and 20 + 15 = 35, so a value of 40 in the dataset would be considered an outlier.
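As a rough sketch of how the whisker limits and outliers could be computed, the snippet below assumes the common 1.5 times interquartile range rule. The function names tukey_fences and split_outliers, the parameter k, and the sample values are all hypothetical and chosen only for illustration; the quartiles are passed in directly rather than computed.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits beyond which values count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(values, q1, q3, k=1.5):
    """Return (whisker_low, whisker_high, outliers) for the given quartiles."""
    low, high = tukey_fences(q1, q3, k)
    inside = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    # Each whisker ends at the most extreme value still inside the limits.
    return min(inside), max(inside), outliers

# Hypothetical sample whose quartiles are taken to be Q1 = 10 and Q3 = 20.
sample = [5, 9, 11, 14, 16, 19, 21, 40]

print(tukey_fences(10, 20))            # (-5.0, 35.0)
print(split_outliers(sample, 10, 20))  # (5, 21, [40])
```

Here the value 40 lies above the upper limit of 35, so it is reported as an outlier, and the upper whisker ends at 21, the largest value inside the limits.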
Let’s look at an example to see how a box and whisker plot can be created and used to compare the distribution of different datasets. Suppose we have two datasets, A and B, with the following values:
Dataset A: 10, 20, 30, 40, 50, 60, 70
Dataset B: 20, 40, 60, 80, 100, 120, 140
First, we need to calculate the median and quartiles for each dataset. For dataset A, the median is 40, and the first and third quartiles are 20 and 60, respectively. For dataset B, the median is 80, and the first and third quartiles are 40 and 120, respectively.
Next, we can create the box and whisker plots for each dataset. The box for dataset A would extend from 20 to 60, and its height would be 40. The upper and lower whiskers would extend from 60 to 70 and from 10 to 20, respectively. The box for dataset B would extend from 40 to 120, and its height would be 80. The upper and lower whiskers would extend from 120 to 140 and from 20 to 40, respectively.
From the box and whisker plots, we can see that the median and quartiles are higher for dataset B than for dataset A, indicating that the values in dataset B are generally higher than those in dataset A. We can also see that the range of values in dataset B (140 - 20 = 120) is twice that of dataset A (70 - 10 = 60), as indicated by the longer box and the longer whiskers in the box and whisker plot for dataset B.
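Additionally, we can use the box and whisker plots to identify potential outliers in the data. A common rule flags a value as an outlier if it lies more than 1.5 times the interquartile range below the first quartile or above the third quartile. For dataset A, the interquartile range is 40, so the limits are 20 - 60 = -40 and 60 + 60 = 120; no values lie outside of them, so there are no outliers in this dataset. For dataset B, the interquartile range is 80, so the limits are 40 - 120 = -80 and 120 + 120 = 240. The largest value, 140, marks the end of the upper whisker rather than a point beyond it, so dataset B has no outliers either.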
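The whole example can be checked with a short script. This is only a sketch under two assumptions that are conventions rather than the only options: quartiles are taken as the medians of the lower and upper halves of the sorted data, and outliers are values more than 1.5 times the interquartile range away from the box. The function name box_plot_summary and the dictionary keys are hypothetical.

```python
from statistics import median

def box_plot_summary(values, k=1.5):
    """Return the numbers needed to draw a box and whisker plot."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])           # median of the lower half
    q3 = median(data[half + n % 2:])   # median of the upper half
    iqr = q3 - q1
    low_limit, high_limit = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if low_limit <= v <= high_limit]
    return {
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),    # end of the lower whisker
        "whisker_high": max(inside),   # end of the upper whisker
        "outliers": [v for v in data if v < low_limit or v > high_limit],
    }

dataset_a = [10, 20, 30, 40, 50, 60, 70]
dataset_b = [20, 40, 60, 80, 100, 120, 140]

for name, data in (("A", dataset_a), ("B", dataset_b)):
    print(name, box_plot_summary(data))
```

For dataset A this gives quartiles of 20 and 60, a median of 40, and whiskers ending at 10 and 70; for dataset B it gives quartiles of 40 and 120, a median of 80, and whiskers ending at 20 and 140. The outlier list is empty for both datasets.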
Overall, the box and whisker plot is a useful tool for visualizing and comparing the distribution of data. It provides a quick and easy way to identify the range, median, and quartiles of a dataset, as well as potential outliers in the data. By comparing the box and whisker plots of different datasets, we can gain insight into the differences and similarities in the data and make informed decisions based on this information.