2.3.3 Measures of Shape
The most popular measures of shape, exemplified for the PRT variable of the Cork Stoppers’ dataset (see Table 2.8), are presented next.
2.3.3.1 Skewness
A continuous distribution is symmetrical around the mean, µ, if its density function $f$ satisfies:
$$f(\mu + x) = f(\mu - x) \quad \text{for all } x.$$
This applies similarly for discrete distributions, substituting the density function by the probability function.
A useful asymmetry measure around the mean is the coefficient of skewness, defined as:
$$\gamma = \frac{E[(X - \mu)^3]}{\sigma^3}. \tag{2.14}$$
This measure uses the fact that any central moment of odd order is zero for distributions that are symmetrical around the mean. For asymmetrical distributions γ reflects the unbalance of the density or probability values around the mean. The formula uses a $\sigma^3$ standardization factor, ensuring that the same value is obtained for the same unbalance, independently of the spread. Distributions that are skewed to the right (positively skewed distributions) tend to produce a positive value of γ, since the longer rightward tail will positively dominate the third order central moment; distributions skewed to the left (negatively skewed distributions) tend to produce a negative value of γ, since the longer leftward tail will negatively dominate the third order central moment (see Figure 2.24). The coefficient γ, however, has to be interpreted with caution, since it may produce a false impression of symmetry (or asymmetry) for some distributions. For instance, the probability function p = {0.4, 0.5, 0.1} on the values x = {−2, 1, 3} has µ = 0 and a zero third order central moment, hence γ = 0, although it is an asymmetrical distribution.
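The caution about γ can be checked numerically. The following is a minimal sketch, assuming a discrete distribution given by a vector of values and a vector of probabilities; the function name `gamma_discrete` and both example distributions are illustrative choices, not taken from the book.

```r
# Coefficient of skewness (formula 2.14) for a discrete distribution
# with values x and probabilities p (hypothetical helper).
gamma_discrete <- function(x, p) {
  stopifnot(length(x) == length(p), abs(sum(p) - 1) < 1e-12)
  mu    <- sum(p * x)                  # mean
  sigma <- sqrt(sum(p * (x - mu)^2))   # standard deviation
  sum(p * (x - mu)^3) / sigma^3        # standardized third central moment
}

# Asymmetrical distribution whose third central moment vanishes:
# the result is zero up to rounding error.
gamma_discrete(c(-2, 1, 3), c(0.4, 0.5, 0.1))

# Distribution with a longer rightward tail: the result is positive.
gamma_discrete(1:4, c(0.4, 0.3, 0.2, 0.1))
```

The first call illustrates that γ = 0 does not imply symmetry; the second shows the usual positive value for a rightward skewed distribution.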
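The skewness of a dataset $x_1, \ldots, x_n$ is the point estimate of γ, defined as:
$$g = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^3}{(n-1)(n-2)\, s^3}, \tag{2.15}$$
where $\bar{x}$ and $s$ are the sample mean and the sample standard deviation.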
Note that:
– For symmetrical distributions, if the mean exists, it will coincide with the median. Based on this property, one can also measure the skewness using g = (mean − median)/(standard deviation). It can be proved that –1 ≤ g ≤ 1.
– For asymmetrical distributions with only one maximum (which is then the mode), the median usually lies between the mode and the mean, as shown in Figure 2.24.
Figure 2.24. Two asymmetrical distributions, with the mode, median and mean marked: a) Skewed to the right (usually with γ > 0); b) Skewed to the left (usually with γ < 0).
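The alternative measure g = (mean − median)/(standard deviation) mentioned in the notes above is easy to compute. The sketch below assumes a numeric vector as input; the function name `skewness_median` and the sample values are hypothetical.

```r
# Median-based skewness measure: (mean - median) / standard deviation.
skewness_median <- function(x) {
  (mean(x) - median(x)) / sd(x)
}

# Hypothetical right-skewed sample: the mean (4.6) exceeds the median (3),
# so the measure is positive.
x <- c(1, 2, 2, 3, 3, 3, 4, 5, 9, 14)
skewness_median(x)

# Symmetrical sample: mean and median coincide, so the measure is 0.
skewness_median(c(1, 2, 3, 4, 5))
```

Unlike the coefficient of skewness, this measure does not use third powers of the deviations, so a few extreme values affect it less.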
2.3.3.2 Kurtosis
The degree of flatness of a probability or density function near its center can be characterised by the so-called kurtosis, defined as:
$$\kappa = \frac{E[(X - \mu)^4]}{\sigma^4} - 3. \tag{2.16}$$
The constant 3 is subtracted so that κ = 0 for the normal distribution. For this reason the κ measure, as it stands in formula 2.16, is often called the coefficient of excess (excess compared to the normal distribution). Distributions flatter than the normal distribution have κ < 0; distributions more peaked than the normal distribution have κ > 0.
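Formula 2.16 can be evaluated numerically for a given density. The following is a minimal sketch using the R `integrate` function; the helper name `kurtosis_density` and the choice of example distributions are assumptions made for illustration.

```r
# Kurtosis (coefficient of excess, formula 2.16) of a continuous
# distribution, obtained by numerical integration of its density.
kurtosis_density <- function(dens, lower, upper) {
  mu <- integrate(function(x) x * dens(x), lower, upper)$value
  m2 <- integrate(function(x) (x - mu)^2 * dens(x), lower, upper)$value
  m4 <- integrate(function(x) (x - mu)^4 * dens(x), lower, upper)$value
  m4 / m2^2 - 3    # sigma^4 equals the squared variance m2^2
}

kurtosis_density(dunif, 0, 1)                             # approximately -1.2
kurtosis_density(dnorm, -Inf, Inf)                        # approximately 0
kurtosis_density(function(x) dt(x, df = 10), -Inf, Inf)   # approximately 1
```

The uniform distribution is flatter than the normal one (κ < 0), whereas the Student's t distribution with 10 degrees of freedom is more peaked (κ > 0). The results are approximate, since they depend on the accuracy of the numerical integration.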
The sample estimate of the kurtosis is computed as:
$$k = \frac{n(n+1)\, M_4 - 3(n-1)\, M_2^2}{(n-1)(n-2)(n-3)\, s^4}, \tag{2.17}$$
with:
$$M_j = \sum_{i=1}^{n} (x_i - \bar{x})^j.$$
Note that the kurtosis measure has the same shortcomings as the skewness measure. It does not always measure what it is supposed to.
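Formula 2.17 translates directly into R. The function below is a sketch written for illustration only; it is not the function supplied with the book, and the name `kurtosis_sample` is an assumption.

```r
# Sample estimate of the kurtosis, following formula 2.17.
kurtosis_sample <- function(x) {
  n <- length(x)
  stopifnot(n > 3)        # the denominator vanishes for n <= 3
  d  <- x - mean(x)       # deviations from the sample mean
  M2 <- sum(d^2)          # sum of squared deviations
  M4 <- sum(d^4)          # sum of fourth powers of the deviations
  (n * (n + 1) * M4 - 3 * (n - 1) * M2^2) /
    ((n - 1) * (n - 2) * (n - 3) * sd(x)^4)
}

# Small hand-checkable example: for the values 1, ..., 5 we have
# M2 = 10, M4 = 34 and s^4 = 6.25, giving k = -180/150 = -1.2.
kurtosis_sample(1:5)
```

The evenly spaced sample behaves like a flat distribution, hence the negative value.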
The skewness and the kurtosis have been computed for the PRT variable of the Cork Stoppers’ dataset as shown in Table 2.8. The PRT variable exhibits a positive skewness indicative of a rightward skewed distribution and a positive kurtosis indicative of a distribution more peaked than the normal one.
There are no functions in the R stats package to compute the skewness and kurtosis. However, as stated in Commands 2.8, R functions for that purpose are provided in text file format on the book CD (see Appendix F). One only has to copy the function text from the file and paste it into the R console, as in the following example:
> skewness <- function(x){
+   n <- length(x)                      # sample size
+   y <- (x-mean(x))^3                  # cubed deviations from the mean
+   n*sum(y)/((n-1)*(n-2)*sd(x)^3)      # formula 2.15
+ }
> skewness(PRT)
[1] 0.592342
In order to appreciate the obtained skewness and kurtosis, the reader can refer to Figure 2.25 where these measures are plotted for several distributions (see Appendix B). For more details see (Dudewicz EJ, Mishra SN, 1988).
Table 2.8. Skewness and kurtosis for the PRT variable of the Cork Stoppers’ dataset.
Skewness    Kurtosis
Figure 2.25. Skewness and kurtosis coefficients for several distributions (uniform, normal, Student's t and gamma, together with the beta area and the impossible area).