The Five-Number Summary and Boxplots
A boxplot can be used to graphically represent the data set. These plots involve five
specific values:
- The lowest value of the data set (i.e., minimum)
- Q1
- The median
- Q3
- The highest value of the data set (i.e., maximum)
These values are called the five-number summary of the data set.
A boxplot (or box-and-whisker plot) is a graph of a data set obtained by drawing a horizontal line from the
minimum data value to Q1, drawing a horizontal line from Q3 to the maximum data value,
and drawing a box whose vertical sides pass through Q1 and Q3, with a vertical line inside
the box passing through the median (Q2).
Boxplot Example
The number of meteorites found in 10 states of the United States is
89, 47, 164, 296, 30, 215, 138, 78, 48, 39.
Construct a boxplot for the data.
Solution
Step 1 Arrange the data in order:
30, 39, 47, 48, 78, 89, 138, 164, 215, 296
Step 2 Find the median. The sorted list has an even number of values, so no single value sits in the middle; take the average of the two middle values:
median = (78+89)/2 = 83.5
Step 3 Find Q1. 47 is the value in the center of the lower 50% of the data, so Q1 = 47.
30, 39, 47, 48, 78
Step 4 Find Q3. 164 is the value in the center of the top 50% of the data, so Q3 = 164.
89, 138, 164, 215, 296
Step 5 Draw a scale for the data on the x axis.
Step 6 Locate the lowest value, Q1, the median, Q3, and the highest value on the scale.
Step 7 Draw a box around Q1 and Q3, draw a vertical line through the median, and
connect the lowest value and the highest value to the box with horizontal lines. See the figure below.
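Steps 1 through 4 can be checked with a short Python sketch. It assumes the quartile rule used in this example: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, and the median itself is left out of both halves when the number of values is odd. The function name `five_number_summary` is only illustrative. Statistical software often uses other quartile conventions, so its results may differ slightly.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]        # values below the median position
    upper_half = data[(n + 1) // 2 :]  # values above the median position
    return (data[0], median(lower_half), median(data), median(upper_half), data[-1])

# Meteorite counts from the example above.
meteorites = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
print(five_number_summary(meteorites))  # (30, 47, 83.5, 164, 296)
```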
Modified Boxplot Example
A modified boxplot (or modified box-and-whisker plot) is a plot that shows the center, spread, and skewness of a data set. It is constructed
by drawing a box and two whiskers that use the median, the first quartile, the third quartile,
and the smallest and the largest values in the data set between the lower and the upper inner
fences.

The data is skewed.
EXAMPLE: The following data are the incomes (in thousands of dollars) for a sample of 12 households.
75 69 84 112 74 104 81 90 94 144 79 98
Construct a modified box-and-whisker plot for these data.
Solution The following five steps are performed to construct a box-and-whisker plot.
Step 1. First, rank the data in increasing order and calculate the values of the median, the
first quartile, the third quartile, and the interquartile range. The ranked data are
69 74 75 79 81 84 90 94 98 104 112 144
For these data,
\begin{align}
Median &= (84 + 90)/2 = 87\\
Q1 &= (75 + 79)/2 = 77\\
Q3 &= (98 + 104)/2 = 101\\
IQR &= Q3 - Q1 = 101 - 77 = 24
\end{align}
Step 2. Find the points that are $1.5 \times IQR$ below Q1 and $1.5 \times IQR$ above Q3. These
two points are called the lower and the upper inner fences, respectively.
\begin{align}
1.5 \times IQR &= 1.5 \times 24 = 36\\
\text{Lower inner fence } &= Q1 - 36 = 77 - 36= 41\\
\text{Upper inner fence } &= Q3 + 36 = 101 + 36 = 137
\end{align}
Step 3. Determine the smallest and the largest values in the given data set within the two
inner fences. These two values for our example are as follows:
\begin{align}
\text{Smallest value within the two inner fences } &=69\\
\text{Largest value within the two inner fences } &= 112
\end{align}
Step 4. Draw a horizontal line and mark the income levels on it such that all the values
in the given data set are covered. Above the horizontal line, draw a box with its left side
at the position of the first quartile and the right side at the position of the third quartile. Inside
the box, draw a vertical line at the position of the median. The result of this step is
shown in the figure below.
Step 5. By drawing two lines, join the points of the smallest and the largest values within
the two inner fences to the box. These values are 69 and 112 in this example, as listed in
Step 3. The two lines that join the box to these two values are called whiskers. A value
that falls outside the two inner fences is marked with an asterisk and is called an outlier.
In this example, 144 lies above the upper inner fence of 137, so it is an outlier.
This completes the box-and-whisker plot, as shown in the figure below.
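The numbers needed for this plot can be reproduced with a small Python sketch. It assumes the same median-of-halves quartile rule as the worked example. The function name `modified_boxplot_stats` and the parameter `k` (the fence multiplier, 1.5 here) are illustrative names, not part of any library.

```python
from statistics import median

def modified_boxplot_stats(values, k=1.5):
    """Quartiles, inner fences, whisker ends, and outliers for a modified boxplot."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])        # median of the lower half
    q3 = median(data[(n + 1) // 2 :])  # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_low": min(inside),    # smallest value within the fences
        "whisker_high": max(inside),   # largest value within the fences
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

# Household incomes (in thousands of dollars) from the example.
incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
stats = modified_boxplot_stats(incomes)
for name, value in stats.items():
    print(name, value)
```

For the income data, the sketch returns fences of 41 and 137, whisker ends of 69 and 112, and the single outlier 144, matching Steps 1 through 3.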
Information Obtained from a Boxplot
Looking at the median:
- If the median is near the center of the box, the distribution is approximately symmetric.
- If the median falls to the left of the center of the box, the distribution is positively skewed.
- If the median falls to the right of the center of the box, the distribution is negatively skewed.
Looking at the lines (whiskers):
- If the lines are about the same length, the distribution is approximately symmetric.
- If the right line is longer than the left line, the distribution is positively skewed.
- If the left line is longer than the right line, the distribution is negatively skewed.
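The two sets of rules above can be sketched as a small Python function. The rules do not say how close "near the center" or "about the same length" must be, so the tolerance `tol` below is purely an assumption made for illustration, as are the function name and its arguments.

```python
def skew_from_boxplot(q1, med, q3, whisker_low, whisker_high, tol=0.1):
    """Read skewness from a boxplot; returns (reading from box, reading from whiskers).

    tol is an assumed cutoff: two lengths count as "about the same" when
    they differ by at most tol times their combined length.
    """
    def compare(left, right):
        total = left + right
        if total == 0 or abs(right - left) <= tol * total:
            return "approximately symmetric"
        # A longer right side means the median (or box) sits to the left.
        return "positively skewed" if right > left else "negatively skewed"

    box_reading = compare(med - q1, q3 - med)
    whisker_reading = compare(q1 - whisker_low, whisker_high - q3)
    return box_reading, whisker_reading

# Values from the household income example: Q1 = 77, median = 87, Q3 = 101,
# whiskers ending at 69 and 112.
print(skew_from_boxplot(77, 87, 101, 69, 112))
```

For the income example, both readings come out as positively skewed: the median is 10 units from Q1 but 14 units from Q3, and the right whisker (11 units) is longer than the left (8 units).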
Definitions
EXPLORATORY DATA ANALYSIS the act of analyzing data to determine what information can be obtained by using stem and leaf plots, medians, interquartile ranges, and boxplots
INTERQUARTILE RANGE $Q3 - Q1$. The range of the middle 50% of the data
NEGATIVELY SKEWED OR LEFT-SKEWED DISTRIBUTION a distribution in which the majority of the data values fall to the right of the mean
POSITIVELY SKEWED OR RIGHT-SKEWED DISTRIBUTION a distribution in which the majority of the data values fall to the left of the mean
Determining Normality
PEARSON'S INDEX OF SKEWNESS is a formula used to determine the degree of skewness of a variable.
\[
\text{Pearson's index of skewness} = \frac{3(\bar{X} - \text{median})}{s}
\]
where \( \bar{X} \) is the sample mean and $s$ is the sample standard deviation.
- If the index is greater than 1, then the data are positively skewed (skewed right).
- If the index is less than -1, then the data are negatively skewed (skewed left).
- If neither of these conditions is satisfied, then the data are not significantly skewed.
EXAMPLE
A survey of 18 high-technology firms showed the number of days’ inventory they
had on hand. Determine if the data are approximately normally distributed.
5 29 34 44 45 63 68 74 74
81 88 91 97 98 113 118 151 158
Solution
Step 1 Construct a frequency distribution and draw a histogram for the data.
Since the histogram is approximately bell-shaped, we can say that the distribution is
approximately normal.
Step 2 Check for skewness. For these data, $\bar{X}= 79.5$, median = 77.5, and $s = 40.5$.
Using Pearson’s index of skewness gives
\[
\text{index of skewness} = \frac{3(79.5-77.5)}{40.5} = 0.148
\]
In this case, the index of skewness is not greater than 1 or less than -1, so it can be
concluded that the distribution is not significantly skewed.
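The same calculation can be sketched in Python with the standard `statistics` module, where `stdev` is the sample standard deviation. The function names `pearson_skewness_index` and `describe_skewness` are illustrative, and the cutoffs of 1 and -1 follow the rules stated above.

```python
from statistics import mean, median, stdev

def pearson_skewness_index(values):
    """Pearson's index of skewness: 3 * (mean - median) / sample standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)

def describe_skewness(index):
    """Apply the cutoffs of 1 and -1 to the index."""
    if index > 1:
        return "positively skewed"
    if index < -1:
        return "negatively skewed"
    return "not significantly skewed"

# Days of inventory on hand for the 18 firms in this example.
inventory_days = [5, 29, 34, 44, 45, 63, 68, 74, 74,
                  81, 88, 91, 97, 98, 113, 118, 151, 158]
index = pearson_skewness_index(inventory_days)
print(round(index, 3), describe_skewness(index))
```

The sketch uses the unrounded standard deviation (about 40.45 rather than 40.5); the index still rounds to 0.148.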
Step 3 Check for outliers. Recall that an outlier is a data value that lies more than
1.5 (IQR) units below Q1 or 1.5 (IQR) units above Q3. In this case, Q1 = 45
and Q3 = 98; hence, IQR = Q3 - Q1 = 98 - 45 = 53. An outlier would be
a data value less than 45 - 1.5(53) = -34.5 or a data value larger than
98 + 1.5(53) = 177.5. In this case, there are no outliers.
Since the histogram is approximately bell-shaped, the data are not significantly
skewed, and there are no outliers, it can be concluded that the data are
approximately normally distributed.
EXAMPLE
The data shown consist of the number of games played each year in the career of
Baseball Hall of Famer Bill Mazeroski. Determine if the data are approximately
normally distributed.
81 148 152 135 151 152
159 142 34 162 130 162
163 143 67 112 70
Solution
Step 1 Construct a frequency distribution and draw a histogram for the data.

The histogram shows that the frequency distribution is somewhat negatively
skewed.
Step 2 Check for skewness. For these data, $\bar{X} = 127.24$, median = 143, and $s = 39.87$.
\[
\text{index of skewness} = \frac{3(127.24-143)}{39.87} = -1.19
\]
Since the index is less than -1, it can be concluded that the distribution is
significantly skewed to the left.
Step 3 Check for outliers. In this case, Q1 = 96.5 and Q3 = 155.5. IQR = Q3 -
Q1 = 155.5 - 96.5 = 59. Any value less than 96.5 - 1.5(59) = 8 or above
155.5 + 1.5(59) = 244 is considered an outlier. There are no outliers.
In summary, the distribution is significantly negatively skewed, so the data are not approximately normally distributed.
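Steps 2 and 3 of this procedure can be combined into one Python sketch; Step 1, the histogram, is not covered. It assumes the median-of-halves quartile rule used in these examples, which leaves the median out of both halves when the number of values is odd. The function name `normality_screen` and the multiplier `k` are illustrative.

```python
from statistics import mean, median, stdev

def normality_screen(values, k=1.5):
    """Skewness check (Step 2) and outlier check (Step 3) for a data set."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])        # median of the lower half
    q3 = median(data[(n + 1) // 2 :])  # median of the upper half
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    index = 3 * (mean(data) - median(data)) / stdev(data)
    return {
        "skewness_index": index,
        "significantly_skewed": index > 1 or index < -1,
        "outlier_limits": (low, high),
        "outliers": [x for x in data if x < low or x > high],
    }

# Games played per year by Bill Mazeroski, from the example above.
games = [81, 148, 152, 135, 151, 152, 159, 142, 34,
         162, 130, 162, 163, 143, 67, 112, 70]
result = normality_screen(games)
print(round(result["skewness_index"], 2), result["outlier_limits"], result["outliers"])
```

For these data, the sketch gives an index of about -1.19, outlier limits of 8 and 244, and no outliers, in agreement with Steps 2 and 3.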