Descriptive Statistics

10.2 Descriptive Statistics

Let us start by considering circular data, with data points represented by a unit length vector:

$$\mathbf{x} = [\cos\theta \;\; \sin\theta]'.$$

The mean direction of n observations can be obtained in Cartesian co-ordinates, in the usual way:

$$\bar{c} = \sum_{i=1}^{n} \cos\theta_i / n; \qquad \bar{s} = \sum_{i=1}^{n} \sin\theta_i / n. \qquad (10.2)$$

The vector $\bar{\mathbf{r}} = [\bar{c} \;\; \bar{s}]'$ is the mean resultant vector of the n observations, with mean resultant length:

$$\bar{r} = \sqrt{\bar{c}^2 + \bar{s}^2} \in [0, 1],$$

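and mean direction (for $\bar{r} \neq 0$):

$$\bar{\theta} = \begin{cases} \arctan(\bar{s}/\bar{c}), & \text{if } \bar{c} \ge 0; \\ \arctan(\bar{s}/\bar{c}) + \pi\,\mathrm{sgn}(\bar{s}), & \text{if } \bar{c} < 0. \end{cases}$$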

Note that the arctangent function (MATLAB and R atan function) takes values in [−π/2, π/2], whereas $\bar{\theta}$ takes values in [−π, π], the same as using the MATLAB and R function atan2(y,x) with y representing the vertical component $\bar{s}$ and x the horizontal component $\bar{c}$. Also note that $\bar{r}$ and $\bar{\theta}$ are invariant under rotation: rotating all the observations by the same angle leaves $\bar{r}$ unchanged and rotates $\bar{\theta}$ by that angle, so neither depends on the choice of the zero direction.

The mean resultant vector can also be obtained by computing the resultant of the n unit length vectors. The resultant, $\mathbf{r} = [n\bar{c} \;\; n\bar{s}]'$, has the same angle, $\bar{\theta}$, and a vector length of $r = n\bar{r} \in [0, n]$. The unit length vector representing the mean direction, called the mean direction vector, is $\bar{\mathbf{x}}_0 = [\cos\bar{\theta} \;\; \sin\bar{\theta}]'$.
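The computation of the mean resultant length and mean direction can be sketched as follows. This is a minimal illustration in Python (standard library only), not the book's MATLAB or R code; the function name `circular_mean` and the sample angles are hypothetical.

```python
import math

def circular_mean(thetas):
    """Return (c_bar, s_bar, r_bar, theta_bar) for a list of angles in radians."""
    n = len(thetas)
    c_bar = sum(math.cos(t) for t in thetas) / n   # mean horizontal component
    s_bar = sum(math.sin(t) for t in thetas) / n   # mean vertical component
    r_bar = math.hypot(c_bar, s_bar)               # mean resultant length, in [0, 1]
    # atan2 resolves the quadrant, so no case split on the sign of c_bar is needed.
    # The mean direction is undefined when r_bar is zero.
    theta_bar = math.atan2(s_bar, c_bar) if r_bar > 0 else float("nan")
    return c_bar, s_bar, r_bar, theta_bar

# Hypothetical sample straddling the 0/360 degree cut
angles = [math.radians(a) for a in (350.0, 10.0, 20.0, 355.0)]
c_bar, s_bar, r_bar, theta_bar = circular_mean(angles)
print(math.degrees(theta_bar), r_bar)

# Rotating every observation by delta keeps r_bar and shifts theta_bar by delta
delta = 0.7
_, _, r_rot, theta_rot = circular_mean([t + delta for t in angles])
print(r_rot - r_bar, theta_rot - theta_bar)
```

For this sample the arithmetic mean of the raw angles would point almost opposite to the observations, whereas the direction obtained from the resultant lies among them.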

The mean resultant length $\bar{r}$, a point estimate of the population mean length ρ, can be used as a measure of distribution concentration. If the vector directions are uniformly distributed around the unit circle, then there is no preferred direction and the population mean resultant length is zero (the sample value is then close to zero). At the other extreme, if all the vectors are concentrated in the same direction, the mean resultant length is maximum and equal to 1. Based on these observations, the following sample circular variance is defined:

$$v = 2(1 - \bar{r}) \in [0, 2].$$

The sample circular standard deviation is defined as:

$$s = \sqrt{-2 \ln \bar{r}},$$

reducing to approximately $\sqrt{v}$ for small v. The justification for this definition lies in the analysis of the distribution of the wrapped random variable $X_w$:

$$X \sim n_{\mu,\sigma}(x) \;\Rightarrow\; X_w = X \ (\mathrm{mod}\ 2\pi) \sim w_{\mu,\rho}(x_w) = \sum_{k=-\infty}^{\infty} n_{\mu,\sigma}(x_w + 2\pi k). \qquad (10.7)$$

The wrapped normal density, $w_{\mu,\rho}$, has ρ given by:

$$\rho = \exp(-\sigma^2/2) \;\Rightarrow\; \sigma = \sqrt{-2 \ln \rho}.$$
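The relations above can be checked numerically. The following Python sketch (standard library only) is an illustration under stated assumptions: the function names, the truncation of the infinite sum to |k| ≤ 20, the midpoint integration rule and the values of µ and σ are all choices made here, not part of the text.

```python
import math

def circular_variance(r_bar):
    """Sample circular variance v = 2(1 - r_bar)."""
    return 2.0 * (1.0 - r_bar)

def circular_std(r_bar):
    """Sample circular standard deviation sqrt(-2 ln r_bar), for 0 < r_bar <= 1."""
    return math.sqrt(-2.0 * math.log(r_bar))

# For concentrated data (small v) the standard deviation is close to sqrt(v)
for r_bar in (0.99, 0.9, 0.5):
    v = circular_variance(r_bar)
    print(r_bar, v, math.sqrt(v), circular_std(r_bar))

def wrapped_normal_pdf(x, mu, sigma, k_max=20):
    """Wrapped normal density: normal densities at x + 2*pi*k, summed over |k| <= k_max."""
    total = 0.0
    for k in range(-k_max, k_max + 1):
        z = (x + 2.0 * math.pi * k - mu) / sigma
        total += math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return total

def wrapped_normal_rho(mu, sigma, steps=2000):
    """Mean resultant length: integral of cos(x - mu) * w(x) over one period (midpoint rule)."""
    h = 2.0 * math.pi / steps
    return sum(
        math.cos((i + 0.5) * h - mu) * wrapped_normal_pdf((i + 0.5) * h, mu, sigma) * h
        for i in range(steps)
    )

mu, sigma = 1.0, 0.8                      # hypothetical parameters
rho = wrapped_normal_rho(mu, sigma)
print(rho, math.exp(-sigma ** 2 / 2.0))   # the two values should agree closely
print(math.sqrt(-2.0 * math.log(rho)))    # recovers sigma from rho
```

The last line is the population counterpart of the sample standard deviation: replacing ρ by its estimate gives the definition of s.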

For spherical directions, we consider the data points represented by a unit length vector, with the x, y, z co-ordinates computed as in Table 10.1. The mean resultant vector co-ordinates are then computed in a similar way as in formula 10.2. The definitions of spherical mean direction, $(\bar{\theta}, \bar{\phi})$, and spherical variance are the direct generalisation to the sphere of the definitions for the circle, using the three-dimensional resultant vector. In particular, the mean direction vector is:

$$\bar{\mathbf{x}}_0 = [\sin\bar{\theta}\cos\bar{\phi} \;\; \sin\bar{\theta}\sin\bar{\phi} \;\; \cos\bar{\theta}]'. \qquad (10.9)$$
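A sketch of the spherical computation in Python (standard library only) is given below. It assumes that θ is the co-latitude and φ the longitude, with Cartesian co-ordinates x = sin θ cos φ, y = sin θ sin φ, z = cos θ, which is the convention implied by formula 10.9. The function name `spherical_resultant`, the (longitude, co-latitude) input order and the sample are hypothetical; this is not the book's `resultant` function.

```python
import math

def spherical_resultant(directions_deg):
    """Resultant of unit vectors given as (longitude, co-latitude) pairs in degrees.

    Returns (x, y, z, phi_deg, theta_deg, r): Cartesian co-ordinates of the
    resultant, its longitude and co-latitude in degrees, and its length.
    """
    x = y = z = 0.0
    for phi_deg, theta_deg in directions_deg:
        phi, theta = math.radians(phi_deg), math.radians(theta_deg)
        x += math.sin(theta) * math.cos(phi)
        y += math.sin(theta) * math.sin(phi)
        z += math.cos(theta)
    r = math.sqrt(x * x + y * y + z * z)
    phi_bar = math.degrees(math.atan2(y, x)) % 360.0
    # Clamp guards against rounding slightly outside [-1, 1]; undefined when r == 0
    theta_bar = math.degrees(math.acos(max(-1.0, min(1.0, z / r))))
    return x, y, z, phi_bar, theta_bar, r

# Hypothetical sample: two directions on the equator, 90 degrees apart
sample = [(0.0, 90.0), (90.0, 90.0)]
x, y, z, phi_bar, theta_bar, r = spherical_resultant(sample)
r_bar = r / len(sample)   # mean resultant length
print(phi_bar, theta_bar, r, r_bar)
```

Dividing the resultant length by the number of observations gives the mean resultant length, from which the spherical variance and standard deviation follow as on the circle.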

Example 10.5

Q: Consider the data matrix j of Example 10.3 (Joints’ dataset). Compute the longitude, co-latitude and length of the resultant, as well as the mean resultant length and the standard deviation.


A: We use the function resultant (see Commands 10.3) in MATLAB, as follows:

» [x,y,z,f,t,r] = resultant(j)
...
f =
         % resultant length
» rbar=r/size(j,1)
rbar =
    0.9376     % mean resultant length
» s=sqrt(-2*log(rbar))
s =
    0.3591     % standard deviation in radians

Note that the mean co-latitude (178.8°) does indeed confirm the visual observations of Example 10.3. The data is highly concentrated ($\bar{r}$ = 0.94, near 1). The standard deviation corresponds to an angle of 20.6°.
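The last two figures of the example can be reproduced from the reported mean resultant length alone. The short Python check below uses only the rounded value 0.9376 quoted above, so the result agrees with the MATLAB output only to about three decimal places.

```python
import math

rbar = 0.9376                           # mean resultant length reported in the example
s = math.sqrt(-2.0 * math.log(rbar))    # standard deviation in radians
s_deg = math.degrees(s)                 # the same angle in degrees
print(s, s_deg)
```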

Commands 10.3. MATLAB and R functions for computing descriptive statistics and performing simple operations with directional data.

MATLAB   as=convazi(a) ; as=convlat(a)
         [x,y,z,f,t,r] = resultant(a)
         m = meandir(a,alphal)
         [m,rw,rhow]=pooledmean(a)
         v=rotate(a) ; t=scattermx(a); d=dirdif(a,b)

R        convazi(a) ; convlat(a)
         resultant(a) ; dirdif(a,b)

Functions convazi and convlat convert azimuth into longitude and latitude into co-latitude, respectively.

Function resultant determines the resultant of unit vectors whose angles are the elements of a (in degrees). The Cartesian co-ordinates of the resultant are returned in x, y and z. The polar co-ordinates are returned in f (φ ), t (θ ) and r.

Function meandir determines the mean direction of the observations a. The angles are returned in m(1) and m(2). The mean resultant length $\bar{r}$ is returned in m(3). The standard deviation in degrees is returned in m(4). The deviation angle corresponding to a confidence level indicated by alphal, assuming a von Mises distribution (see section 10.3), is returned in m(5). The allowed values of alphal (alpha level) are 1, 2, 3 and 4 for α = 0.001, 0.01, 0.05 and 0.1, respectively.


Function pooledmean computes the pooled mean (see section 10.6.2) of independent samples of circular or spherical observations, a. The last column of a contains the group codes, starting with 1. The mean resultant length and the weighted resultant length are returned through rw and rhow, respectively.

Function rotate returns the spherical data matrix v (standard format), obtained by rotating a so that the mean direction maps onto the North Pole. Function scattermx returns the scatter matrix t of the spherical data a (see section 10.4.4). Function dirdif returns the directional data of the differences of the unit vectors corresponding to a and b (standard format). The R functions behave in the same way as their equivalent MATLAB functions. For instance, Example 10.5 is solved in R with:

> j <- convlat(cbind(j[,1],-j[,2]))
> o <- resultant(j)
> o
[1]   0.6487324   1.4182647 -73.1138435  65.4200379
[5] 178.7780083  73.1304754