The basic idea The normal distribution

For instance, since the mean in our example is 20.875 and the standard deviation is 7.0799, we can estimate from the above statement that approximately 95% of the scores will fall in the range of 20.875 - 2(7.0799) to 20.875 + 2(7.0799), that is, between 6.7152 and 35.0348. This kind of information is a critical stepping stone toward comparing the performance of an individual on one variable with their performance on another, even when the variables are measured on entirely different scales.
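The arithmetic behind this range can be checked with a short sketch. It is written in Python purely for illustration; the variable names are assumptions, and the factor of 2 is the usual rounded two-standard-deviation rule (1.96 would be the more precise multiplier for a normal distribution).

```python
# Mean and standard deviation quoted in the text
mean = 20.875
sd = 7.0799

# Roughly 95% of normally distributed scores lie within
# k = 2 standard deviations of the mean (rounded rule of thumb)
k = 2
lower = mean - k * sd
upper = mean + k * sd

print(f"approx. 95% range: {lower:.4f} to {upper:.4f}")
# prints: approx. 95% range: 6.7152 to 35.0348
```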

C. Descriptive statistics for measurements of a single variable

1. The basic idea

We now deal with descriptive statistics for measurements of a single variable. We imagine that we have a large population of values from which we take samples. The population could consist of the diameters of automobile drive shafts produced in a given plant. To make sure the manufacturing equipment continues to operate satisfactorily, we measure the diameter of every tenth drive shaft.[1] The measurements over a given time period are called “samples” of the “population” of all drive shafts. The measurements will vary somewhat, both because of finite tolerances in the manufacturing equipment and because of uncertainties in the measurements themselves.

From the samples, we wish to make judgments about the underlying population, i.e. the actual diameters of all drive shafts made. For example, the mean (average) of the samples is expected to be approximately equal to the true, unknown mean of the population. The accuracy of this sample estimate of the population mean would be expected to improve as the sample size is increased. For example, if we measured every other drive shaft, we would expect the mean of our measurements to be closer to the actual average diameter of all drive shafts than when we measured only 1/10 of them.

One of the primary objectives of statistics is to make quantitative statements. For example, rather than just saying that the average drive shaft diameter is approximately equal to the sample mean, we’d like to give a range of diameters within which the true mean lies with a probability of 95%.
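The effect of sample size can be illustrated with a small simulation. The sketch below is in Python and rests on several assumptions that are not in the text: a hypothetical population of 10,000 shafts, an assumed true mean of 50.0 mm and standard deviation of 0.05 mm, and a 95% range computed from the normal approximation (sample mean plus or minus 1.96 standard errors). It is meant only to show the trend, not to prescribe a quality-control procedure.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: diameters (mm) of 10,000 drive shafts.
# TRUE_MEAN and TRUE_SD are assumed values, not taken from the text.
TRUE_MEAN, TRUE_SD = 50.0, 0.05
population = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(10_000)]

def summarize(sample):
    """Return the sample mean and an approximate 95% range for the population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
    half_width = 1.96 * sem                    # normal approximation
    return mean, mean - half_width, mean + half_width

every_tenth = population[::10]  # 1/10 of the shafts
every_other = population[::2]   # 1/2 of the shafts

for label, sample in [("every tenth", every_tenth), ("every other", every_other)]:
    mean, lo, hi = summarize(sample)
    print(f"{label:12s} n={len(sample):5d}  mean={mean:.4f}  "
          f"95% range=({lo:.4f}, {hi:.4f})")
```

The range for the every-other sample is narrower by roughly the square root of 5, because the standard error shrinks as one over the square root of the sample size.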

2. The normal distribution

The most common assumption made in statistical treatments of data is that the probability density of a particular value x falls off exponentially with the square of its deviation from the population mean $\mu$. This gives rise to the familiar “bell-shaped curve” normal probability density function:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (2.2)$$

where $\sigma^2$ is the population variance, which is the mean of all values of $(x-\mu)^2$. The factor $1/(\sigma\sqrt{2\pi})$ was chosen so that $\int_{-\infty}^{\infty} f(x)\,dx = 1$. The probability that a given sample x lies between a and b is $\int_a^b f(x)\,dx$,[2] which gives the fundamental meaning of the probability density function f.

To illustrate the normal distribution, we present below a MATLAB program to generate normally distributed random numbers and compare the resulting histogram with equation 2.2. To save time, you can cut and paste this program into MATLAB’s Editor, save it in your working directory as ranhys.m, and then execute it in MATLAB’s Command window by typing ranhys. Try it for several values of the mean, variance and number of values, n. Notice how the histogram approaches the shape[3] of the normal distribution better and better as n is increased. A histogram for $\mu$ = 5, $\sigma^2$ = 2 and n = 500 is given as Figure 2.1.

    % ranhys.m   W.R. Wilcox, Clarkson University, 1 June 2004.
    % Comparison of a histogram of normally distributed random numbers
    % with a normal distribution.
    % n is the number of samples
    % sigma is the population standard deviation
    % mu is the population mean
    % X is the vector of values
    clear
    n = input('Enter the number of values to be generated ');
    mu = input('Enter the population mean ');
    sigsq = input('Enter the population variance ');
    sigma = sqrt(sigsq);
    % Set the state for the random number generator from the clock
    % (see help randn; newer MATLAB releases use rng('shuffle') instead)
    randn('state',sum(100*clock));
    % Generate the random numbers desired
    X = mu + sigma*randn(n,1);
    % Plot the histogram with 10 bins (see help hist)
    hist(X,10), xlabel('value'), ylabel('number in bin')
    h = findobj(gca,'Type','patch');
    set(h,'FaceColor','m','EdgeColor','w')
    hold on
    % Now create a curve for the normal distribution
    % with a maximum equal to 1/4 of the number of values n
    x = mu-4*sigma:sigma/100:mu+4*sigma;
    f = 0.25*n*exp(-(x-mu).^2/(2*sigma.^2));
    plot(x,f), legend('random number','normal distribution')
    title('Comparison of random number histogram with normal distribution shape')
    hold off

Do you get the same histogram if you use the same values again for $\mu$, $\sigma^2$ and n? Examine the code until you understand why.

Figure 3. Sample histogram for $\mu$ = 5, $\sigma^2$ = 2 and n = 500.

See http://www.shodor.org/interactivate/activities/NormalDistribution for a graphical illustration of the influence of population standard deviation on the normal distribution and the influence of bin size on a histogram.

[1] While we could measure every drive shaft, this is unnecessarily expensive.
[2] That is, the area under the f(x) curve between a and b.
[3] Compare only the shape, as here the maximum in the normal distribution is arbitrarily set to n/4.
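The meaning of the density as an area can also be checked numerically. The following Python sketch is an illustration only: the helper names (normal_pdf, normal_cdf, integrate) are made up for this example, the trapezoidal rule is just one simple way to approximate the integral, and the interval from a to b is arbitrarily chosen as two standard deviations either side of the mean. It uses the same mean of 5 and variance of 2 as the figure.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """Closed-form cumulative probability, written with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def integrate(f, a, b, steps=10_000):
    """Trapezoidal-rule approximation of the area under f between a and b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h

mu, sigma = 5.0, math.sqrt(2.0)        # mean 5, variance 2
a, b = mu - 2 * sigma, mu + 2 * sigma  # assumed interval: mean +/- 2 sigma

def density(x):
    return normal_pdf(x, mu, sigma)

area = integrate(density, a, b)
exact = normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
total = integrate(density, mu - 10 * sigma, mu + 10 * sigma)

print(f"P(a < x < b) by integration: {area:.4f}")
print(f"P(a < x < b) from erf:       {exact:.4f}")
print(f"total area (+/- 10 sigma):   {total:.4f}")
```

Both estimates of the probability come out at about 0.9545, and the total area is 1 to four decimal places, which is what the normalizing factor in front of the exponential is there to guarantee.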
