14.1 ELEMENTS OF STATISTICS
1. Population and sample. The observed values of a variate x over a finite number of years are known as a 'sample' of x; for example, the annual flood peaks or annual rainfalls for 75 years form a sample. The population, on the other hand, consists of the values of the annual flood peaks from time immemorial to eternity. The 'population parameters' can be estimated from the corresponding parameters obtained from the sample, known as 'sample parameters'. Each phenomenon is characterised by a certain value, which varies in time and space. This characteristic is called a 'variable', and any particular value of it is a 'variate'.
If the value of one variate is independent of any other, the variable in question is a 'random variable'. Most hydrological processes are random, and hence the corresponding variables are also random. All time series (as well as other series) can be characterised by statistical parameters.
2. Central tendency. Three types of parameters are generally used as measures of central tendency.
(a) Expected value. The expected value of the random variable x is given by
$$\mu = \int_{-\infty}^{\infty} x\, f(x)\, dx$$
where f(x) is the probability density function of x.
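As a rough numerical illustration of the integral above, the sketch below approximates µ with the trapezoidal rule. The exponential density, its rate `lam`, the truncation at x = 200 and the helper name `expected_value` are all assumptions chosen for illustration; they are not taken from the text. For a non-negative variate such as a flood peak, f(x) = 0 for x < 0, so the lower limit can be taken as 0.

```python
import math

def expected_value(pdf, lower, upper, steps=100_000):
    """Approximate mu = integral of x * f(x) dx over [lower, upper] (trapezoidal rule)."""
    h = (upper - lower) / steps
    total = 0.5 * (lower * pdf(lower) + upper * pdf(upper))
    for i in range(1, steps):
        x = lower + i * h
        total += x * pdf(x)
    return total * h

# Hypothetical density (illustration only): exponential with rate lam,
# f(x) = lam * exp(-lam * x) for x >= 0, whose exact expected value is 1 / lam.
lam = 0.5
mu = expected_value(lambda x: lam * math.exp(-lam * x), 0.0, 200.0)
print(round(mu, 4))  # 2.0
```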
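This population parameter can be estimated from the sample parameters as follows.
(i) Arithmetic mean
$$\bar{x} = \frac{\Sigma x}{n}$$
and, for grouped data,
$$\bar{x} = \frac{\Sigma f x}{\Sigma f} = \frac{\Sigma f x}{n} \tag{14.2 a}$$
where f is the frequency of the class with mid-point x.
(ii) Geometric mean
$$\bar{x}_g = \sqrt[n]{x_1 x_2 x_3 \cdots x_n}$$
(iii) Harmonic mean
$$\bar{x}_h = \frac{n}{\Sigma (1/x)}$$
where n = the size of the sample, say the number of years of annual flood peaks.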
HYDROLOGY
For any set of positive variates that are not all equal, $\bar{x} > \bar{x}_g > \bar{x}_h$.
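A minimal sketch of the three sample means and of the grouped-data mean of Eq. (14.2 a). The function names and the five flood-peak values are hypothetical, used only to show the calculations and the ordering $\bar{x} > \bar{x}_g > \bar{x}_h$; the geometric mean is computed through logarithms so that the product of many large peaks does not overflow.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # nth root of the product, evaluated as exp(mean of logs) to avoid overflow
    return math.exp(sum(math.log(x) for x in values) / len(values))

def harmonic_mean(values):
    return len(values) / sum(1.0 / x for x in values)

def grouped_mean(mid_points, freqs):
    # Eq. (14.2 a): sum(f * x) / sum(f), where sum(f) = n
    return sum(f * x for x, f in zip(mid_points, freqs)) / sum(freqs)

peaks = [3.2, 5.1, 4.4, 7.9, 6.0]  # hypothetical annual flood peaks (1000 cumec)
am, gm, hm = arithmetic_mean(peaks), geometric_mean(peaks), harmonic_mean(peaks)
print(am > gm > hm)  # True
```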
In most cases, the arithmetic mean gives the best estimate of the expected value, i.e., $\mu \approx \bar{x}$.
(b) Median. The value of the variate such that half of the variates lie below it and the other half above it is called the median of the series, i.e., it is the value of the variate having a 50% cumulative frequency.
(c) Mode. The value of the variate having the highest frequency is called the mode, Fig. 14.1 (see also Fig. 15.2). For unimodal curves that are moderately skewed, the empirical relation is
Mean – mode = 3(mean – median)
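Rearranging the empirical relation gives mode ≈ 3 median - 2 mean, which lets the mode be estimated when only the mean and median are known. The sketch below applies it, purely as an illustration, to the mean (6.6 tcm) and median (6 tcm) obtained later in Example 14.1; the function name is hypothetical.

```python
def mode_from_mean_median(mean, median):
    """Empirical relation for moderately skewed unimodal curves:
    mean - mode = 3 (mean - median)  =>  mode = 3 * median - 2 * mean."""
    return 3.0 * median - 2.0 * mean

est_mode = mode_from_mean_median(6.6, 6.0)
print(round(est_mode, 2))  # 4.8
```

The estimate (about 4.8 tcm) is of the same order as the 5 tcm obtained from the grouped-data mode formula in Example 14.1, as expected for a moderately skewed curve; the relation is only approximate.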
[Fig. 14.1 Skewed distribution: density of occurrences f(x) plotted against the variate x (precipitation or flood peak), marking the mode (maximum frequency), the median (cumulative frequency = 50%, see also Fig. 15.2) and the mean; the area under the curve beyond x represents the probability of occurrence of a magnitude > x.]
3. Variability. The measures of variability or dispersion of a probability distribution curve are given by the following parameters.
(a) Mean deviation. The mean of the absolute deviations of the values from their mean is called the mean deviation (MD):
$$MD = \frac{\Sigma |x - \bar{x}|}{n}$$
(b) Standard deviation. It is the square root of the mean-squared deviation of the variates from their mean. The standard deviation for the population ($\sigma_p$) is given by
$$\sigma_p = \sqrt{\frac{\Sigma (x - \mu)^2}{n}}$$
and this is estimated from the standard deviation for the sample ($\sigma$), given by
$$\sigma = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\Sigma x^2 - \bar{x}\, \Sigma x}{n - 1}} \tag{14.8 a}$$
STATISTICAL AND PROBABILITY ANALYSIS OF HYDROLOGICAL DATA
and, for grouped data,
$$\sigma = \sqrt{\frac{\Sigma f (x - \bar{x})^2}{n - 1}} \tag{14.8 d}$$
The dispersion about the mean is measured by the standard deviation, which is also called the root mean square of the departures from the mean, Fig. 14.2.
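The measures of dispersion defined so far can be sketched as below. The function names and the flood-peak values are hypothetical. The sketch computes both forms of Eq. (14.8 a), which agree because $\Sigma (x - \bar{x})^2 = \Sigma x^2 - \bar{x}\,\Sigma x$, and the grouped form of Eq. (14.8 d), which gives the same result as the ungrouped formula applied to the expanded data. The variance is simply the square of either standard deviation.

```python
import math

def mean_deviation(values):
    # MD = sum |x - xbar| / n
    xbar = sum(values) / len(values)
    return sum(abs(x - xbar) for x in values) / len(values)

def std_population(values):
    # sigma_p with divisor n (values treated as the whole population)
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def std_sample(values):
    # Eq. (14.8 a), first form: divisor n - 1
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

def std_sample_shortcut(values):
    # Eq. (14.8 a), second form: (sum x^2 - xbar * sum x) / (n - 1)
    n = len(values)
    total = sum(values)
    return math.sqrt((sum(x * x for x in values) - (total / n) * total) / (n - 1))

def std_grouped(mid_points, freqs):
    # Eq. (14.8 d): class mid-points x with frequencies f, n = sum f
    n = sum(freqs)
    xbar = sum(f * x for x, f in zip(mid_points, freqs)) / n
    return math.sqrt(sum(f * (x - xbar) ** 2 for x, f in zip(mid_points, freqs)) / (n - 1))

peaks = [3.2, 5.1, 4.4, 7.9, 6.0]  # hypothetical annual flood peaks (1000 cumec)
print(round(mean_deviation(peaks), 3))  # 1.304
print(round(std_sample(peaks), 3), round(std_sample_shortcut(peaks), 3))  # 1.768 1.768
```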
[Fig. 14.2 Normal distribution curve: bell-shaped density of occurrences f(x) plotted against the variate x (precipitation or flood peak), with total area under the curve = 1 and mean = $\bar{x}$ = median; σ is computed with divisor n - 1, n = sample size. The shaded area between $x_1$ and $x_2$ represents the probability (% of time) that $x_1 < x < x_2$; the band between 2σ and 3σ on either side of the mean contains 2.14% of the area.]
(c) Variance. The square of the standard deviation is called the variance, i.e., $\sigma_p^2$ for the population and $\sigma^2$ for the sample.
(d) Range. The range (R) denotes the difference between the largest and smallest values. For the range of the cumulative departures from the mean of a time series of n values, Hurst (1951, 1956) and Klemes (1974) give
$$R = \sigma \left(\frac{n}{2}\right)^{k}, \quad 0.5 < k < 1 \tag{14.9 a}$$
and, for a random normally distributed time series,
$$R = 1.25\, \sigma_p \sqrt{n}$$
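The factor 1.25 for a random normal series can be checked roughly by simulation. The sketch below, whose function names, series length, number of trials and random seed are all assumptions, computes Hurst's range of cumulative departures from the mean and averages R/(σ√n) over simulated independent normal series, with the sample σ standing in for $\sigma_p$. The average should come out near 1.25 (somewhat below it for finite n); it is a sampling estimate, not an exact value.

```python
import math
import random

def cumulative_departure_range(values):
    """Hurst's range: max minus min of the running sums of (x - xbar)."""
    xbar = sum(values) / len(values)
    running = lowest = highest = 0.0
    for x in values:
        running += x - xbar
        lowest = min(lowest, running)
        highest = max(highest, running)
    return highest - lowest

def mean_scaled_range(n=1000, trials=200, seed=1):
    """Average of R / (sigma * sqrt(n)) over simulated random normal series."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        series = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(series) / n
        sigma = math.sqrt(sum((x - xbar) ** 2 for x in series) / (n - 1))
        total += cumulative_departure_range(series) / (sigma * math.sqrt(n))
    return total / trials

print(round(mean_scaled_range(), 2))
```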
(e) Coefficient of variation. The standard deviation divided by the mean is called the coefficient of variation ($C_v$):
$$C_v = \frac{\sigma}{\bar{x}}$$
4. Skewness (asymmetry). The lack of symmetry of a distribution is called skewness or asymmetry. The population skewness (α) is given by
$$\alpha = \frac{\Sigma (x - \mu)^3}{n}$$
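This is estimated from the sample skewness (a), given by
$$a = \frac{n\, \Sigma (x - \bar{x})^3}{(n - 1)(n - 2)}$$
and, for grouped data,
$$a = \frac{n\, \Sigma f (x - \bar{x})^3}{(n - 1)(n - 2)} \tag{14.12 a}$$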
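A minimal sketch of the skewness estimators, together with the coefficient of skewness $C_s = a/\sigma^3$ of Eq. (14.13). The function names and flood-peak values are hypothetical; σ is the sample standard deviation with divisor n - 1, as in Eq. (14.8 a). A symmetric set of values gives zero skewness, and a long right tail gives a positive value.

```python
import math

def population_skewness(values):
    # alpha = sum (x - mu)^3 / n, treating the values as the whole population
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 3 for x in values) / n

def sample_skewness(values):
    # a = n * sum (x - xbar)^3 / ((n - 1)(n - 2))
    n = len(values)
    xbar = sum(values) / n
    return n * sum((x - xbar) ** 3 for x in values) / ((n - 1) * (n - 2))

def grouped_sample_skewness(mid_points, freqs):
    # Eq. (14.12 a): class mid-points x with frequencies f, n = sum f
    n = sum(freqs)
    xbar = sum(f * x for x, f in zip(mid_points, freqs)) / n
    s3 = sum(f * (x - xbar) ** 3 for x, f in zip(mid_points, freqs))
    return n * s3 / ((n - 1) * (n - 2))

def coefficient_of_skewness(values):
    # C_s = a / sigma^3, sigma being the sample standard deviation (divisor n - 1)
    n = len(values)
    xbar = sum(values) / n
    sigma = math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))
    return sample_skewness(values) / sigma ** 3

peaks = [3.2, 5.1, 4.4, 7.9, 6.0]  # hypothetical annual flood peaks (1000 cumec)
print(round(coefficient_of_skewness(peaks), 3))  # positive: skewed to the right
```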
The degree of skewness of the distribution is usually measured by the 'coefficient of skewness' ($C_s$), given by
$$C_s = \frac{a}{\sigma^3} \tag{14.13}$$
Another measure of skewness often used in practice is Pearson's skewness ($S_k$), given by
$$S_k = \frac{\mu - \text{mode}}{\sigma_p}$$
which is estimated from the sample as
$$S_k = \frac{\bar{x} - \text{mode}}{\sigma}$$
or, from Eq. (14.5),
$$S_k = \frac{3(\bar{x} - \text{median})}{\sigma} \tag{14.15}$$
Example 14.1 For the grouped data of the annual floods in the river Ganga at Hardwar (1885-1971), find the mean, median and mode. Determine the coefficients of skew and the coefficient of variation.
Class interval (1000 cumec)    Frequency
0-2*
2-4*
4-6
6-8
8-10
10-12
12-14
14-16
16-18
18-20
*From 0 to <2, from 2 to <4, and so on.
Solution The computations are made in Table 14.1 (tcm = thousand cumec).
(i) Mean, $\bar{x}$ = 6.6 tcm
(ii) Standard deviation, σ = 3.16 tcm
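(iii) Median
$$\text{Median} = L_{md} + \left(\frac{n/2 - CF}{f_{md}}\right) CI = 4 + \left(\frac{87/2 - 17}{27}\right) 2 \approx 6 \text{ tcm}$$
where $L_{md}$ = lower limit of the median class, CF = cumulative frequency of the classes below the median class, $f_{md}$ = frequency of the median class and CI = class interval.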
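The grouped-data median formula can be written as a small function; the name `grouped_median` and its parameter names are hypothetical, while the arguments are the values used in Example 14.1.

```python
def grouped_median(lower, n, cum_freq_before, f_median, class_interval):
    """Median = L_md + (n/2 - CF) / f_md * CI for grouped data."""
    return lower + (n / 2 - cum_freq_before) / f_median * class_interval

median = grouped_median(lower=4, n=87, cum_freq_before=17, f_median=27, class_interval=2)
print(round(median, 2))  # 5.96, i.e. about 6 tcm
```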
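(iv) Mode (see Fig. 15.2)
$$\text{Mode} = L_{mo} + \left(\frac{d_1}{d_1 + d_2}\right) CI = 4 + \left(\frac{10}{10 + 9}\right) 2 \approx 5 \text{ tcm}$$
where $L_{mo}$ = lower limit of the modal class, and $d_1$, $d_2$ = differences between the frequency of the modal class and the frequencies of the preceding and succeeding classes, respectively.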
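Similarly, the grouped-data mode formula can be sketched as below; the function and parameter names are hypothetical, and the arguments are those of Example 14.1.

```python
def grouped_mode(lower, d1, d2, class_interval):
    """Mode = L_mo + d1 / (d1 + d2) * CI for grouped data."""
    return lower + d1 / (d1 + d2) * class_interval

mode = grouped_mode(lower=4, d1=10, d2=9, class_interval=2)
print(round(mode, 2))  # 5.05, i.e. about 5 tcm
```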
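(v) Coefficients of skew ($C_s$)
Pearson's first coefficient, $C_{s1} = \dfrac{\bar{x} - \text{mode}}{\sigma} = \dfrac{6.6 - 5}{3.16} \approx 0.51$
Pearson's second coefficient, $C_{s2} = \dfrac{3(\bar{x} - \text{median})}{\sigma} = \dfrac{3(6.6 - 6)}{3.16} \approx 0.57$
From Eqs. (14.12 a) and (14.13),
$$C_s = \frac{n\, \Sigma f (x - \bar{x})^3}{(n - 1)(n - 2)\, \sigma^3} = \frac{87 \times 3818.55}{86 \times 85 \times 3.16^3} \approx 1.4$$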
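The three coefficients of skew can be reproduced from the summary values of Example 14.1 (n, mean, median, mode, σ and $\Sigma f (x - \bar{x})^3$ from Table 14.1). This is a check of the arithmetic only; the variable names are hypothetical.

```python
n = 87
mean, median, mode, sigma = 6.6, 6.0, 5.0, 3.16   # tcm, from Example 14.1
sum_f_dev3 = 3818.55                              # sum f (x - xbar)^3, Table 14.1

cs1 = (mean - mode) / sigma                  # Pearson's first coefficient
cs2 = 3 * (mean - median) / sigma            # Pearson's second coefficient
a = n * sum_f_dev3 / ((n - 1) * (n - 2))     # sample skewness, Eq. (14.12 a)
cs = a / sigma ** 3                          # Eq. (14.13)

print(round(cs1, 2), round(cs2, 2), round(cs, 2))  # 0.51 0.57 1.44
```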
For flood data (Foster), the adjustment for the period of record gives
$$C_{s(\text{adj})} = C_s\left(1 + \frac{\ldots}{n}\right) = 1.4\left(1 + \frac{\ldots}{87}\right) = 1.5$$
All the coefficients of skew are positive and the skew is to the right; if the coefficients were negative, the skew would have been to the left.
Table 14.1 Computations for mean, median and mode (Example 14.1)
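Class interval, CI (1000 cumec) | Mid-point of CI, x | Frequency, f | Product, f·x | $x - \bar{x}$ | $(x - \bar{x})^2$ | $f(x - \bar{x})^2$ | $(x - \bar{x})^3$ | $f(x - \bar{x})^3$
(class intervals from 0-2 onwards)
$\Sigma f (x - \bar{x})^2 = 858.1$
$\Sigma f (x - \bar{x})^3 = 4725.05 - 906.50 = 3818.55$
Mean, $\bar{x} = \dfrac{\Sigma f x}{n}$ = 6.6 tcm
Standard deviation, $\sigma = \sqrt{\dfrac{\Sigma f (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{858.1}{87 - 1}}$ = 3.16 tcm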
(vi) Coefficient of variation, $C_v = \dfrac{\sigma}{\bar{x}} = \dfrac{3.16}{6.6} \approx 0.48$
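The standard deviation and coefficient of variation of Example 14.1 follow directly from the totals in Table 14.1. The short check below uses Eq. (14.8 d) and $C_v = \sigma/\bar{x}$; the variable names are hypothetical.

```python
import math

n = 87
sum_f_dev2 = 858.1   # sum f (x - xbar)^2, Table 14.1
mean = 6.6           # tcm

sigma = math.sqrt(sum_f_dev2 / (n - 1))   # Eq. (14.8 d)
cv = sigma / mean                         # coefficient of variation

print(round(sigma, 2), round(cv, 2))  # 3.16 0.48
```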
Assignment problem For the grouped data of the partial duration floods ($Q_b$ > 4333 cumec) of the river Ganga at Hardwar (1885-1971), find the mean, median and mode. Also determine the coefficients of skew and the coefficient of variation.
Class interval (1000 cumec)    Frequency (or no. of occurrences)
*From 4 to <6, from 6 to <8, and so on.
[Hint: See Fig. 15.4.]