Measures of Spread

Spread, or dispersion, is the second important feature of frequency distributions. Just as measures of central location describe where the peak is located, measures of spread describe the dispersion (or variation) of values from that peak in the distribution. Measures of spread include the range, interquartile range, and standard deviation.

Range

Definition of range

The range of a set of data is the difference between its largest (maximum) value and its smallest (minimum) value. In the statistical world, the range is reported as a single number: the result of subtracting the minimum from the maximum value. In the epidemiologic community, the range is usually reported as “from (the minimum) to (the maximum),” that is, as two numbers rather than one.

Method for identifying the range

Step 1. Identify the smallest (minimum) observation and the largest (maximum) observation.

Step 2. Epidemiologically, report the minimum and maximum values. Statistically, subtract the minimum from the maximum value.

EXAMPLE: Identifying the Range

Find the range of the following incubation periods for hepatitis A: 27, 31, 15, 30, and 22 days.

Step 1. Identify the minimum and maximum values.

Minimum = 15, maximum = 31

Step 2. Subtract the minimum from the maximum value.

Range = 31–15 = 16 days

For an epidemiologic or lay audience, you could report that “incubation periods ranged from 15 to 31 days.” Statistically, that range is 16 days.
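The two ways of reporting the range can be sketched in a few lines of Python. This is only an illustration using the incubation periods from the example; the variable names are chosen here and do not come from the lesson.

```python
# Incubation periods for hepatitis A, in days (from the example above)
incubation_days = [27, 31, 15, 30, 22]

# Step 1: identify the minimum and maximum observations
minimum = min(incubation_days)
maximum = max(incubation_days)

# Step 2 (statistical form): subtract the minimum from the maximum
statistical_range = maximum - minimum

# Epidemiologic form: report both end points
print(f"Incubation periods ranged from {minimum} to {maximum} days.")
# Statistical form: report a single number
print(f"Range = {maximum} - {minimum} = {statistical_range} days")
```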

Percentiles

Percentiles divide the data in a distribution into 100 equal parts. The Pth percentile (P ranging from 0 to 100) is the value that has P percent of the observations falling at or below it. In other words, the 90th percentile has 90% of the observations at or below it. The median, the halfway point of the distribution, is the 50th percentile. The maximum value is the 100th percentile, because all values fall at or below the maximum.

Quartiles

Sometimes, epidemiologists group data into four equal parts, or quartiles. Each quartile includes 25% of the data. The cut-off for the first quartile is the 25th percentile. The cut-off for the second quartile is the 50th percentile, which is the median. The cut-off for the third quartile is the 75th percentile. And the cut-off for the fourth quartile is the 100th percentile, which is the maximum.
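As a rough illustration, the quartile cut-offs can be computed with Python's standard library. The sketch below reuses the five hepatitis A incubation periods from the range example (a very small data set, used here only for convenience). It assumes the default "exclusive" method of `statistics.quantiles`, which places cut points at positions based on n + 1; other software may use a different convention and return slightly different values.

```python
import statistics

# Hepatitis A incubation periods, in days (from the range example)
incubation_days = [27, 31, 15, 30, 22]

# quantiles(..., n=4) returns the three cut points between the four quartiles:
# the 25th, 50th, and 75th percentiles
q1_cut, median_cut, q3_cut = statistics.quantiles(incubation_days, n=4)

# The cut-off for the fourth quartile is the 100th percentile, i.e. the maximum
q4_cut = max(incubation_days)

print(q1_cut, median_cut, q3_cut, q4_cut)  # 18.5 27.0 30.5 31
```

The second cut point matches the median of the data, as the text describes.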

Interquartile range

The interquartile range is a measure of spread used most commonly with the median. It represents the central portion of the distribution, from the 25th percentile to the 75th percentile. In other words, the interquartile range includes the second and third quartiles of a distribution. The interquartile range thus includes approximately one half of the observations in the set, leaving one quarter of the observations on each side.

Method for determining the interquartile range

Step 1. Arrange the observations in increasing order.

Step 2. Find the position of the 1st and 3rd quartiles with the following formulas, where n is the number of observations.

Position of 1st quartile (Q1) = 25th percentile = (n + 1) / 4

Position of 3rd quartile (Q3) = 75th percentile = 3(n + 1) / 4 = 3 × position of Q1

Step 3. Identify the value of the 1st and 3rd quartiles.

a. If a quartile lies on an observation (i.e., if its position is a whole number), the value of the quartile is the value of that observation. For example, if the position of a quartile is 20, its value is the value of the 20th observation.

b. If a quartile lies between observations, the value of the quartile is the value of the lower observation plus the specified fraction of the difference between the two observations. For example, if the position of a quartile is 20¼, it lies between the 20th and 21st observations, and its value is the value of the 20th observation, plus ¼ the difference between the values of the 20th and 21st observations.

Step 4. Epidemiologically, report the values at Q1 and Q3. Statistically, calculate the interquartile range as Q3 minus Q1.
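The four steps above can be sketched as a short Python routine. This is a minimal illustration, not a reference implementation: the function names are invented here, and the routine assumes at least three observations so that both quartile positions fall within the data. It is demonstrated on the hepatitis A incubation periods from the range example.

```python
import math

def quartile_position(n, which):
    """Step 2: position of Q1 (which=1) or Q3 (which=3) among n ordered values."""
    return which * (n + 1) / 4

def value_at_position(ordered, position):
    """Step 3: value at a 1-based position that may fall between observations."""
    lower = math.floor(position)          # observation at or just below the position
    fraction = position - lower           # how far toward the next observation
    lower_value = ordered[lower - 1]      # convert 1-based position to 0-based index
    if fraction == 0:
        return lower_value                # Step 3a: quartile lies on an observation
    upper_value = ordered[lower]
    # Step 3b: lower value plus the fraction of the difference
    return lower_value + fraction * (upper_value - lower_value)

def interquartile_range(values):
    """Return (Q1, Q3, Q3 - Q1) using the (n + 1) / 4 position rule."""
    if len(values) < 3:
        raise ValueError("this sketch assumes at least 3 observations")
    ordered = sorted(values)              # Step 1: arrange in increasing order
    n = len(ordered)
    q1 = value_at_position(ordered, quartile_position(n, 1))
    q3 = value_at_position(ordered, quartile_position(n, 3))
    return q1, q3, q3 - q1                # Step 4: report Q1 and Q3, and Q3 - Q1

q1, q3, iqr = interquartile_range([27, 31, 15, 30, 22])
print(f"Q1 = {q1}, Q3 = {q3}, interquartile range = {iqr}")
```

With five observations, Q1 sits at position 1.5 and Q3 at position 4.5, so both values are interpolated halfway between neighboring observations.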

Figure 2.7 The Middle Half of the Observations in a Frequency Distribution Lie within the Interquartile Range

EXAMPLE: Finding the Interquartile Range

Find the interquartile range for the length of stay data in Table 2.8 on page 2-17.

Step 1. Arrange the observations in increasing order.

Step 2. Find the position of the 1st and 3rd quartiles. Note that the distribution has 30 observations.

Position of Q1 = (n + 1) / 4 = (30 + 1) / 4 = 7.75

Position of Q3 = 3(n + 1) / 4 = 3(30 + 1) / 4 = 23.25

Thus, Q1 lies ¾ of the way between the 7th and 8th observations, and Q3 lies ¼ of the way between the 23rd and 24th observations.

Step 3. Identify the value of the 1st and 3rd quartiles (Q1 and Q3).

Value of Q1: The position of Q1 is 7¾; therefore, the value of Q1 is equal to the value of the 7th observation plus ¾ of the difference between the values of the 7th and 8th observations:

Value of the 7th observation: 6
Value of the 8th observation: 7

Q1 = 6 + ¾(7 − 6) = 6 + ¾(1) = 6.75

Value of Q3: The position of Q3 is 23¼; thus, the value of Q3 is equal to the value of the 23rd observation plus ¼ of the difference between the values of the 23rd and 24th observations:

Value of the 23rd observation: 14
Value of the 24th observation: 16

Q3 = 14 + ¼(16 − 14) = 14 + ¼(2) = 14 + (2 / 4) = 14.5

Step 4. Calculate the interquartile range as Q3 minus Q1.

Q3 = 14.5
Q1 = 6.75

Interquartile range = 14.5 − 6.75 = 7.75
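The arithmetic in this example can be checked with a brief Python sketch. It uses only the numbers quoted in the steps above (30 observations, and the values of the 7th, 8th, 23rd, and 24th observations); the helper name `interpolate` is introduced here for illustration.

```python
n = 30  # number of observations in the length of stay data

# Step 2: positions of the quartiles
pos_q1 = (n + 1) / 4        # 7.75
pos_q3 = 3 * (n + 1) / 4    # 23.25

def interpolate(lower_value, upper_value, fraction):
    """Lower value plus the given fraction of the difference to the upper value."""
    return lower_value + fraction * (upper_value - lower_value)

# Step 3: the fractional part of each position is the interpolation fraction
q1 = interpolate(6, 7, pos_q1 % 1)     # 7th = 6, 8th = 7, fraction 0.75
q3 = interpolate(14, 16, pos_q3 % 1)   # 23rd = 14, 24th = 16, fraction 0.25

# Step 4: interquartile range
iqr = q3 - q1

print(f"Q1 = {q1}, Q3 = {q3}, interquartile range = {iqr}")
# Q1 = 6.75, Q3 = 14.5, interquartile range = 7.75
```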

As indicated above, the median for the length of stay data is 10. Note that the distance between Q1 and the median is 10 − 6.75 = 3.25. The distance between Q3 and the median is 14.5 − 10 = 4.5. This indicates that the length of stay data is skewed slightly to the right (to the longer lengths of stay).
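The comparison of the two distances can be written out as a small Python check. This is a rough sketch only: comparing quartile distances from the median is an informal indicator of skew, and the variable names are chosen here for illustration.

```python
# Values taken from the length of stay example
q1, median, q3 = 6.75, 10, 14.5

lower_distance = median - q1   # distance from Q1 up to the median
upper_distance = q3 - median   # distance from the median up to Q3

if upper_distance > lower_distance:
    direction = "right"        # upper half of the middle 50% is more stretched out
elif upper_distance < lower_distance:
    direction = "left"
else:
    direction = "neither side"

print(f"Q1 to median: {lower_distance}, median to Q3: {upper_distance}")
print(f"Middle half of the data is stretched toward the {direction}")
```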

Epi Info Demonstration: Finding the Interquartile Range

Question: In the data set named SMOKE, what is the interquartile range for the weight of the participants?

Answer:

In Epi Info: Select Analyze Data. Select Read (Import). The default data set should be Sample.mdb. Under Views, scroll down to view SMOKE, and double click, or click once and then click OK. Click on Select. Then type in weight < 770, or select weight from available values, then type < 770, and click on OK. Select Means. Then click on the down arrow beneath Means of, scroll down and select WEIGHT, then click OK. Scroll to the bottom of the output to find the first quartile (25% = 130) and the third quartile (75% = 180). So the interquartile range runs from 130 to 180 pounds, for a range of 50 pounds.

Your Turn: What is the interquartile range of height of study participants? [Answer: 506 to 777]