
MAT 254- Probability and Statistics
Spring 2015
LECTURE 2
DATA COLLECTION AND
PRESENTATION (Charts, graphs, etc)
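HISTOGRAM and LINE GRAPH
[Figure: histogram with a line graph. Vertical axis: frequency (f), 0 to 9. Horizontal axis: ticks from 25 to 90 in steps of 5, labeled "midpoint". Annotations on the figure: 27 = midpoint, 29.5 = UCB.]
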
Types of Data
Quantitative data are measurements that are recorded
on a naturally occurring numerical scale.
Example: height in cm, weight in kg, blood pressure in mmHg.
Qualitative data are measurements that cannot be
measured on a natural numerical scale; they can only be
classified into one of a group of categories.
Example: sex, tall or short, blood group.


SAMPLING TECHNIQUES
Sampling techniques help the researcher economize on the following:
• Time
• Money
• Effort

3/11/2015


[email protected]

POPULATION

SAMPLE


Sampling techniques are classified
into:

• probability sampling
• non-probability sampling



PROBABILITY SAMPLING
• It is a method of selecting a sample (n) from a universe (N) such that
each member of the population has a known, nonzero chance of being
included in the sample. In simple random sampling, each member has an
equal chance of being included, and all possible combinations of size (n)
have an equal chance of being chosen as the sample.


NON-PROBABILITY SAMPLING

It is a method wherein the manner of selecting a sample (n) from a
universe (N) depends on some inclusion rule specified by the researcher.

PROBABILITY SAMPLING
TECHNIQUES
• Simple Random (Lottery) Sampling
• Systematic Sampling
• Stratified Sampling
• Cluster or Area Sampling
• Multi-stage Sampling


SRS or Lottery Sampling

• It is done by assigning a number to each member of the population,
writing each number on a piece of paper, placing the papers in a
container, and drawing the desired number of samples from it.
• This applies to a not-so-large population when listing is still possible.
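
The lottery procedure can be imitated in a few lines of Python. This is only a minimal sketch: the function name `lottery_sample` and the population size of 100 are assumptions made for illustration, and the standard-library `random.sample` plays the role of drawing slips from the container without replacement.

```python
import random

def lottery_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct numbers from 1..population_size."""
    rng = random.Random(seed)                        # seed only makes the draw repeatable
    numbered_slips = range(1, population_size + 1)   # one slip per population member
    return sorted(rng.sample(numbered_slips, sample_size))

# Hypothetical example: N = 100 members, n = 25 drawn
sample = lottery_sample(population_size=100, sample_size=25, seed=1)
print(sample)
```

Every member has the same chance of being drawn, and no member can be drawn twice.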

SYSTEMATIC SAMPLING

• This method still uses the concept of random sampling and involves the
selection of every kth element of a series representing the population,
where k = N/n.

Ex: N = 100, n = 25
k = N/n = 100/25 = 4

• This means every 4th element in a series should be taken as a sample.

 1  11  21  31  41  51  61  71  81   91
 2  12  22  32  42  52  62  72  82   92
 3  13  23  33  43  53  63  73  83   93
 4  14  24  34  44  54  64  74  84   94
 5  15  25  35  45  55  65  75  85   95
 6  16  26  36  46  56  66  76  86   96
 7  17  27  37  47  57  67  77  87   97
 8  18  28  38  48  58  68  78  88   98
 9  19  29  39  49  59  69  79  89   99
10  20  30  40  50  60  70  80  90  100

Note: On the original slide, the numbers highlighted in yellow are the
desired samples.
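
The N = 100, n = 25 example can be sketched in Python as follows. The function name `systematic_sample` is hypothetical, and the random choice of the starting element within the first interval is an assumption; the slides state only that every 4th element is taken.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Take every k-th element of the series 1..N, where k = N // n."""
    k = population_size // sample_size      # sampling interval, here 100 // 25 = 4
    rng = random.Random(seed)
    start = rng.randint(1, k)               # assumed: random start among the first k elements
    return [start + i * k for i in range(sample_size)]

sample = systematic_sample(population_size=100, sample_size=25, seed=0)
print(sample)
```

Whatever the starting point, consecutive sampled elements are always 4 apart.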

STRATIFIED SAMPLING
• This is a random sampling technique in which the population is divided
into non-overlapping subpopulations called strata, and a random sample
is then drawn from each stratum.
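
A minimal Python sketch of the idea follows. The strata names and sizes are invented for illustration, and drawing the same fraction from every stratum (proportional allocation) is an assumption, not something the slides specify.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the given fraction from each stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        size = max(1, round(len(members) * fraction))  # at least one member per stratum
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical population of 100 members split into three non-overlapping strata
strata = {
    "first_year": list(range(1, 41)),     # 40 members
    "second_year": list(range(41, 71)),   # 30 members
    "third_year": list(range(71, 101)),   # 30 members
}
sample = stratified_sample(strata, fraction=0.2, seed=0)
for name, members in sample.items():
    print(name, sorted(members))
```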


MULTI-STAGE SAMPLING
A technique that considers different stages or phases in sampling.
Ex: Region – 1st level
Province – 2nd level
City – 3rd level
Barangay – 4th level

MULTI-STAGE SAMPLING
[Diagram: tree of sampling stages. Regions branch into Divisions, Divisions into School Districts, and School Districts into Schools.]


NON-PROBABILITY SAMPLING
TECHNIQUES

• Purposive Sampling: based on criteria or qualifications set by the
researcher. Those who satisfy the criteria are included.

• Quota Sampling: quick and cheap, since the interviewer is given definite
instructions and a quota for the section of the population he or she is
to work on.

Presentation of Data
Objectives: At the end of the lesson, the
students should be able to:
1. Prepare a stem-and-leaf plot
2. Describe data in textual form
3. Construct a frequency distribution table
4. Create graphs
5. Read and interpret graphs and tables
MCPegollo/Basic Statistics/SRSTHS

Presentation of Data
Textual Method
• Rearrangement from lowest to highest
• Stem-and-leaf plot

Tabular Method
• Frequency distribution table (FDT)
• Relative FDT
• Cumulative FDT
• Contingency table

Graphical Method
• Bar chart
• Histogram
• Frequency polygon
• Pie chart
• Less than, greater than ogive

Textual Presentation of Data
Data can be presented using paragraphs or
sentences. It involves enumerating important
characteristics, emphasizing significant figures
and identifying important features of data.


Solution
First, arrange the data in order for you to identify
the important characteristics. This can be done in
two ways: rearranging from lowest to highest or
using the stem-and-leaf plot.
Below is the rearrangement of data from lowest to
highest:
 9  23  28  35  38  43  45  48
17  24  29  37  39  43  45  49
18  25  34  38  39  44  46  50
20  26  34  38  39  44  46  50
23  27  35  38  42  45  46  50
(Read down each column, from left to right.)

With the rearranged data, pertinent data
worth mentioning can be easily
recognized. The following is one way of
presenting data in textual form.
In the Statistics class of 40 students, 3 obtained
the perfect score of 50. Sixteen students got a score
of 40 and above, while only 3 got 19 and below.
Generally, the students performed well in the test,
with 23 or 57.5% getting a passing score of 38 and
above.
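
The figures quoted in a textual presentation can be checked directly against the rearranged data. The sketch below recomputes them from the 40 scores listed earlier; the variable names are illustrative only.

```python
# The 40 test scores from the rearranged data, lowest to highest
scores = [
    9, 17, 18, 20, 23, 23, 24, 25, 26, 27,
    28, 29, 34, 34, 35, 35, 37, 38, 38, 38,
    38, 39, 39, 39, 42, 43, 43, 44, 44, 45,
    45, 45, 46, 46, 46, 48, 49, 50, 50, 50,
]

class_size = len(scores)
perfect = sum(s == 50 for s in scores)          # students with the perfect score
forty_and_above = sum(s >= 40 for s in scores)
nineteen_and_below = sum(s <= 19 for s in scores)
passed = sum(s >= 38 for s in scores)           # passing score is 38
passed_percent = 100 * passed / class_size

print(class_size, perfect, forty_and_above, nineteen_and_below)
print(f"{passed} students ({passed_percent}%) passed")
```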

Another way of rearranging data is by
making use of the stem-and-leaf plot.
What is a stem-and-leaf plot?
A stem-and-leaf plot is a table that sorts
data according to a certain pattern. It involves
separating a number into two parts. In a two-digit
number, the stem consists of the first digit,
and the leaf consists of the second digit. In
a three-digit number, the stem consists of the
first two digits, and the leaf consists of the last
digit. In a one-digit number, the stem is zero
and the leaf is the digit itself.


Below is the stem-and-leaf plot of the
ungrouped data given in the example.
Stem | Leaves
0 | 9
1 | 7, 8
2 | 0, 3, 3, 4, 5, 6, 7, 8, 9
3 | 4, 4, 5, 5, 7, 8, 8, 8, 8, 9, 9, 9
4 | 2, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 8, 9
5 | 0, 0, 0


Utilizing the stem-and-leaf plot, we can readily see the
order of the data. Thus, we can say that the top ten got
scores of 50, 50, 50, 49, 48, 46, 46, 46, 45, and 45, and the ten
lowest scores are 9, 17, 18, 20, 23, 23, 24, 25, 26, and 27.
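
As a rough illustration, the plot can be built in Python by splitting each score into a stem (tens) and a leaf (ones) with `divmod`. The function name `stem_and_leaf` is hypothetical, and the sketch handles only the one- and two-digit case used in this example.

```python
from collections import defaultdict

scores = [
    9, 17, 18, 20, 23, 23, 24, 25, 26, 27,
    28, 29, 34, 34, 35, 35, 37, 38, 38, 38,
    38, 39, 39, 39, 42, 43, 43, 44, 44, 45,
    45, 45, 46, 46, 46, 48, 49, 50, 50, 50,
]

def stem_and_leaf(values):
    """Group values by stem (tens digit); leaves are the ones digits, in order."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # e.g. 37 -> stem 3, leaf 7; 9 -> stem 0, leaf 9
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", ",".join(str(leaf) for leaf in leaves))
```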

Tabular Presentation of Data
Below is a sample of a table with all of its parts indicated:
[Figure: sample table with its parts labeled: Table Number, Table Title, Column Header, Row Classifier, Body, Source Note.]
Source: http://www.sws.org.ph/youth.htm

Sample of a Frequency Distribution
Table for Grouped Data
Table 1.2
Frequency Distribution Table for the Quiz Scores of 50 Students
in Geometry
Scores | Frequency
0-2 | 1
3-5 | 2
6-8 | 13
9-11 | 15
12-14 | 19


Lower Class Limits
are the smallest numbers that can actually
belong to the different classes.

Rating | Frequency
0-2 | 1
3-5 | 2
6-8 | 13
9-11 | 15
12-14 | 19

Lower class limits: 0, 3, 6, 9, 12

Upper Class Limits
are the largest numbers that can actually belong
to the different classes.

Rating | Frequency
0-2 | 1
3-5 | 2
6-8 | 13
9-11 | 15
12-14 | 19

Upper class limits: 2, 5, 8, 11, 14

Class Boundaries
are the numbers that separate classes.

Rating | Class Boundaries | Frequency
0-2 | -0.5 to 2.5 | 20
3-5 | 2.5 to 5.5 | 14
6-8 | 5.5 to 8.5 | 15
9-11 | 8.5 to 11.5 | 2
12-14 | 11.5 to 14.5 | 1

Class Midpoints
are the midpoints of the classes.

Rating | Class Midpoint | Frequency
0-2 | 1 | 20
3-5 | 4 | 14
6-8 | 7 | 15
9-11 | 10 | 2
12-14 | 13 | 1

Class Width
is the difference between two consecutive lower class limits
or two consecutive class boundaries.

Rating | Class Width | Frequency
0-2 | 3 | 20
3-5 | 3 | 14
6-8 | 3 | 15
9-11 | 3 | 2
12-14 | 3 | 1
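
The three quantities defined above can all be derived from the class limits. The following sketch does so for the Rating classes; the variable names are illustrative, and the boundary rule used (halfway across the gap between adjacent classes) is the usual convention, assumed here because it reproduces the values shown.

```python
# (lower limit, upper limit) of each Rating class
class_limits = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14)]

# Gap between one class's upper limit and the next class's lower limit (here 1)
gap = class_limits[1][0] - class_limits[0][1]

# Boundaries sit halfway across that gap
boundaries = [(low - gap / 2, high + gap / 2) for low, high in class_limits]

# Midpoint = average of the lower and upper class limits
midpoints = [(low + high) / 2 for low, high in class_limits]

# Width = difference between two consecutive lower class limits
width = class_limits[1][0] - class_limits[0][0]

for (low, high), bounds, mid in zip(class_limits, boundaries, midpoints):
    print(f"{low}-{high}: boundaries {bounds}, midpoint {mid}, width {width}")
```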

Constructing A Frequency Table
1. Decide on the number of classes.

2. Determine the class width by dividing the range by the number of classes
   (range = highest score - lowest score) and rounding up:
   class width ≈ round up of (range / number of classes)

3. Select for the first lower limit either the lowest score or a
   convenient value slightly less than the lowest score.

4. Add the class width to the starting point to get the second lower
   class limit, add the width to the second lower limit to get the
   third, and so on.

5. List the lower class limits in a vertical column and enter the
   upper class limits.

6. Represent each score by a tally mark in the appropriate class.
   Total the tally marks to find the frequency for each class.
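
The six steps can be sketched in Python as shown below, applied to the 40 test scores from the textual presentation example. The choice of 6 classes and the function name `frequency_table` are assumptions for illustration, and the sketch assumes whole-number scores. It also widens the classes by one if the rounded-up width would leave the highest score outside the last class, a safeguard that is not part of the steps as stated.

```python
import math

scores = [
    9, 17, 18, 20, 23, 23, 24, 25, 26, 27,
    28, 29, 34, 34, 35, 35, 37, 38, 38, 38,
    38, 39, 39, 39, 42, 43, 43, 44, 44, 45,
    45, 45, 46, 46, 46, 48, 49, 50, 50, 50,
]

def frequency_table(scores, num_classes):
    """Return a list of (lower limit, upper limit, frequency) rows."""
    lowest, highest = min(scores), max(scores)
    # Step 2: class width = range / number of classes, rounded up
    width = math.ceil((highest - lowest) / num_classes)
    # Safeguard (assumption): make sure the last class reaches the highest score
    if lowest + num_classes * width - 1 < highest:
        width += 1
    table = []
    for i in range(num_classes):
        lower = lowest + i * width       # Steps 3-4: successive lower class limits
        upper = lower + width - 1        # Step 5: matching upper class limit
        frequency = sum(lower <= s <= upper for s in scores)  # Step 6: tally
        table.append((lower, upper, frequency))
    return table

table = frequency_table(scores, num_classes=6)
for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
```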

Relative Frequency Table

relative frequency = class frequency / sum of all frequencies

Table 2-5
Rating | Frequency | Relative Frequency
0-2 | 20 | 38.5%
3-5 | 14 | 26.9%
6-8 | 15 | 28.8%
9-11 | 2 | 3.8%
12-14 | 1 | 1.9%

Total frequency = 52

20/52 = 38.5%
14/52 = 26.9%
etc.
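
The same arithmetic can be carried out for every class at once. A short sketch, using the frequencies of Table 2-5 and rounding to one decimal place as the table does:

```python
ratings = ["0-2", "3-5", "6-8", "9-11", "12-14"]
frequencies = [20, 14, 15, 2, 1]

total = sum(frequencies)  # sum of all frequencies

# relative frequency = class frequency / sum of all frequencies, as a percentage
relative = [round(100 * f / total, 1) for f in frequencies]

for rating, f, r in zip(ratings, frequencies, relative):
    print(f"{rating}: {f}/{total} = {r}%")
```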

Cumulative Frequency Table
Rating | Frequency | >cf

Bar diagram
• Simple
• Multiple
• Components

Ahmed-Refat-ZU

Graphical Presentation

Pie diagram:
Percentage of causes of child death in Egypt
• Congenital: 10%
• Accident: 10%
• Diarrhea: 50%
• Chest infection: 30%

Graphical Presentation Histogram:
• It is very similar to the bar chart, with the difference
that the rectangles or bars are adjacent (without
gaps).
• It is used for presenting a class frequency table
(continuous data).
• Each bar represents a class; its height represents
the frequency (number of cases) and its width represents
the class interval.


Graphical Presentation

Frequency Polygon
• Derived from a histogram by connecting the midpoints
of the tops of the rectangles in the histogram.
• The line connecting the centers of the tops of the histogram
rectangles is called the frequency polygon.
• We can draw the polygon without the rectangles, which
gives a simpler form of line graph.
• A special type of frequency polygon is the
Normal Distribution Curve.

The Frequency Polygon
• Example:

Age in Years | Sex: Males | Sex: Females | Mid-point of interval
20-30 | 3 | 2 | (20+30)/2 = 25
30-40 | 5 | 5 | (30+40)/2 = 35
40-50 | 7 | 8 | (40+50)/2 = 45
50-60 | 4 | 3 | (50+60)/2 = 55
60-70 | 2 | 4 | (60+70)/2 = 65
Total | 21 | 22 |
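
The points of a frequency polygon are the pairs (mid-point of interval, frequency). The sketch below computes them for the age and sex table; the variable names are illustrative, and plotting is left out so the example needs only the standard library.

```python
age_intervals = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
males = [3, 5, 7, 4, 2]
females = [2, 5, 8, 3, 4]

# Mid-point of each interval = (lower + upper) / 2
midpoints = [(low + high) / 2 for low, high in age_intervals]

# Each polygon is the sequence of (midpoint, frequency) points joined by lines
male_points = list(zip(midpoints, males))
female_points = list(zip(midpoints, females))

print("Males:  ", male_points)
print("Females:", female_points)
print("Totals: ", sum(males), sum(females))
```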

The Frequency Polygon
• Example:

Figure : Distribution of a group of subjects by age and sex


Graphical Presentation Scatter diagram
• It is useful for representing the relationship
between two numeric measurements,
each observation being represented by a point
corresponding to its value on each axis.

This scatter diagram shows a positive or direct
relationship between NAG and the
albumin/creatinine ratio among diabetic patients.

[Figure: Correlation between NAG and albumin creatinine ratio in group of early diabetics. Vertical axis: NAG, 0 to 35. Horizontal axis: albumin creatinine ratio, 0 to 0.35.]

Graphical Presentation Box Plots
Box plots are another way of representing the summary information that can be read from a
cumulative frequency graph.

[Figure: box plot with the following parts labeled: Lowest value, Lower Quartile, Median, Upper Quartile, Highest value, Inter-Quartile Range, Range.]
Note: The minimum value is the lowest possible value of your first group, and the maximum
value is the highest possible value of your last group
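
For ungrouped data, the values marked on a box plot can be computed directly. The sketch below uses the 40 test scores from the textual presentation example. The quartile convention (median of the lower half and median of the upper half) is an assumption; other conventions exist and can give slightly different quartiles.

```python
from statistics import median

scores = sorted([
    9, 17, 18, 20, 23, 23, 24, 25, 26, 27,
    28, 29, 34, 34, 35, 35, 37, 38, 38, 38,
    38, 39, 39, 39, 42, 43, 43, 44, 44, 45,
    45, 45, 46, 46, 46, 48, 49, 50, 50, 50,
])

half = len(scores) // 2
lower_half = scores[:half]                   # values below the median position
upper_half = scores[len(scores) - half:]     # values above the median position

summary = {
    "lowest": scores[0],
    "lower_quartile": median(lower_half),
    "median": median(scores),
    "upper_quartile": median(upper_half),
    "highest": scores[-1],
}
summary["inter_quartile_range"] = summary["upper_quartile"] - summary["lower_quartile"]
summary["range"] = summary["highest"] - summary["lowest"]

for name, value in summary.items():
    print(f"{name}: {value}")
```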