Introduction to Biostatistics
HERTANTO WAHYU SUBAGYO
Biostat-1 Hertanto
1
Statistics
• The science of:
- collecting
- summarizing
- presenting
- interpreting data,
and of using them to test hypotheses.
Biostatistics
Statistics in the area of biological and health sciences
The purpose of biostatistics
• to provide the numbers, tables, and graphics
that contain information about a certain
situation
• to present them in such a way that valid
interpretations are possible
Increasing role of biostatistics
• Biostatistics provides a way of organizing
information on a wider and more formal basis than
relying on the exchange of anecdotes and personal
experience
• More things are now measured quantitatively in
medicine
• There is a great deal of intrinsic variation in most
biological processes
To answer these questions, we rely on
the methods of biostatistics
• Is the new drug effective and safe?
• Does the use of seat belts reduce the chance
of death in motor vehicle accidents?
• Where should the government invest its
resources if it wishes to reduce infant
mortality?
• etc.
Population and samples
Except when a full census is taken, the data are for a
sample from a larger group called the population.
The sample is of interest not in its own right, but for
what it tells the investigators about the population.
Because of chance, different samples give different
results, and this must be taken into account when
using a sample to make inferences about the
population. This phenomenon is called sampling
variation.
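Sampling variation can be seen by drawing several samples from the same population and comparing their means. The sketch below is only an illustration: the population (simulated systolic blood pressure values), the sample size, and the function name `sample_means` are assumptions made for the example, not part of the lecture.

```python
import random
import statistics


def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated simple random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical population: simulated systolic blood pressure (mmHg) of 10,000 people.
_population_rng = random.Random(42)
population = [_population_rng.gauss(120, 15) for _ in range(10_000)]

means = sample_means(population, sample_size=50, n_samples=5)

print("Population mean:", round(statistics.mean(population), 1))
print("Sample means:   ", [round(m, 1) for m in means])
```

Each sample mean lands near the population mean but none matches it exactly, and no two samples agree. That spread is the sampling variation an analyst has to allow for.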
Sampling and representativity
Target Population → (Sampling Population) → Sample
Sampling
The process of selecting units from a population of interest.
[Figure: Sampling Model]
Research steps
Facts
Theory
Problem
Literature Review
Generalization
- Variable identification
- Theoretical framework
- Conceptual framework
Hypothesis
Verification
- Design
- Sample
- Instruments
- Data collection
- Analysis
Conclusion
Analysis
• By the time you get to the analysis of your
data, most of the really difficult work has
been done.
• It's much more difficult to:
define the research problem; develop and
implement a sampling plan; conceptualize,
operationalize and test your measurements;
and develop a design structure.
• If you have done this work well, the
analysis of the data is usually a fairly
straightforward affair.
Data analysis involves three
major steps
• Cleaning and organizing the data for
analysis (Data Preparation)
• Describing the data (Descriptive
Statistics)
• Testing Hypotheses and Models
(Inferential Statistics)
Data Preparation
• logging the data
• checking the data for accuracy
• developing a database structure
• entering the data into the computer
• transforming the data
Logging the Data
In any research project, data come from
a number of different sources at
different times:
• mail survey returns
• coded interviews
• laboratory
• etc.
Checking the Data For Accuracy
• As soon as data are received you should screen them
for accuracy.
• In some circumstances, doing this right away
allows you to go back to the sample to clarify any
problems or errors.
• There are several questions you should ask as part
of this initial data screening:
- Are the responses readable?
- Are all important questions answered?
- Are the responses complete?
- Is all relevant contextual information included
(e.g. date, time, place, researcher)?
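Some of the screening questions above can be checked automatically once the responses are in electronic form. The sketch below is a minimal illustration under stated assumptions: the field names in `REQUIRED_FIELDS` and `CONTEXT_FIELDS` and the function `screen_record` are hypothetical, and readability of handwritten responses still has to be judged by a person.

```python
# Hypothetical field names; a real study would take these from its own questionnaire.
REQUIRED_FIELDS = ["age", "sex", "weight_kg"]               # the "important questions"
CONTEXT_FIELDS = ["date", "time", "place", "researcher"]    # contextual information


def screen_record(record):
    """Return a list of problems found in one response record (a dict)."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        # A required answer counts as missing if it is absent or blank.
        if value is None or str(value).strip() == "":
            problems.append(f"missing answer: {name}")
    for name in CONTEXT_FIELDS:
        if not record.get(name):
            problems.append(f"missing context: {name}")
    return problems


record = {
    "age": 34, "sex": "", "weight_kg": 61.5,
    "date": "2006-03-01", "time": "09:30", "place": "Clinic A", "researcher": None,
}
print(screen_record(record))
# ['missing answer: sex', 'missing context: researcher']
```

Running a check like this as soon as data arrive makes it easier to go back to the respondent while the problem can still be fixed.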
Developing a database structure
• Defining variables
• Entering the data into the computer
• Data transformations
Defining variables
• variable name
• variable description/label
• value labels
• missing values
• variable type (numeric, string, date, etc.)
• column format (width, alignment)
• measurement level (N O I R: nominal, ordinal, interval, ratio)
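The attributes listed above amount to a small data dictionary entry for each variable. The sketch below shows one possible way to hold such an entry in code. The class `VariableDefinition`, the helper `describe`, and the coding scheme for `sex` are assumptions made for illustration; statistical packages store the same information in their own variable-definition screens.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDefinition:
    """One data dictionary entry, mirroring the attributes listed above."""
    name: str
    label: str
    var_type: str                 # "numeric", "string", "date", ...
    measurement_level: str        # "nominal", "ordinal", "interval" or "ratio"
    value_labels: dict = field(default_factory=dict)
    missing_values: tuple = ()
    width: int = 8
    alignment: str = "right"


# Hypothetical coding scheme: 1 = male, 2 = female, 9 = not recorded.
sex = VariableDefinition(
    name="sex",
    label="Sex of respondent",
    var_type="numeric",
    measurement_level="nominal",
    value_labels={1: "male", 2: "female"},
    missing_values=(9,),
)


def describe(variable, raw_value):
    """Translate a stored code into its label; declared missing codes become None."""
    if raw_value in variable.missing_values:
        return None
    return variable.value_labels.get(raw_value, raw_value)


print(describe(sex, 1))   # male
print(describe(sex, 9))   # None
```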
Entering the Data into the Computer
• There is a wide variety of ways to enter the
data into the computer for analysis.
• In order to assure a high level of data
accuracy, the analyst should use a procedure
called double entry.
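Double entry is commonly understood as keying the same data twice, independently, and then comparing the two versions so that every mismatch can be checked against the paper form. The sketch below assumes that reading. The function `compare_entries` and the record layout (a dict of records keyed by an ID) are hypothetical.

```python
def compare_entries(first_pass, second_pass):
    """Return (record_id, field, first_value, second_value) for every mismatch."""
    discrepancies = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(record_id, {})
        b = second_pass.get(record_id, {})
        for name in sorted(set(a) | set(b)):
            if a.get(name) != b.get(name):
                discrepancies.append((record_id, name, a.get(name), b.get(name)))
    return discrepancies


# Hypothetical data keyed in twice; the second pass transposes the digits of one age.
first = {1: {"age": 34, "weight_kg": 61.5}, 2: {"age": 27, "weight_kg": 58.0}}
second = {1: {"age": 34, "weight_kg": 61.5}, 2: {"age": 72, "weight_kg": 58.0}}

print(compare_entries(first, second))
# [(2, 'age', 27, 72)]
```

The comparison only shows where the two passes disagree. Deciding which value is correct still requires going back to the original source document.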
Data Transformations
• Recode
• Compute
• Select cases
• Rank cases
• etc.
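The four transformations named above can be sketched on a tiny data set. Everything in the example is illustrative: the records, the age cut-offs used for recoding, and the choice of body mass index as the computed variable are assumptions, and ties are not handled in the ranking.

```python
# Hypothetical records for three respondents.
records = [
    {"id": 1, "age": 34, "weight_kg": 61.5, "height_m": 1.60},
    {"id": 2, "age": 67, "weight_kg": 80.0, "height_m": 1.75},
    {"id": 3, "age": 15, "weight_kg": 50.0, "height_m": 1.58},
]


def recode_age(age):
    """Recode: collapse age in years into broad groups (cut-offs are illustrative)."""
    if age < 18:
        return "child"
    if age < 60:
        return "adult"
    return "older adult"


for r in records:
    r["age_group"] = recode_age(r["age"])                       # Recode
    r["bmi"] = round(r["weight_kg"] / r["height_m"] ** 2, 1)    # Compute

# Select cases: keep only respondents aged 18 or over.
adults = [r for r in records if r["age"] >= 18]

# Rank cases: 1 = lowest BMI.
for rank, r in enumerate(sorted(records, key=lambda rec: rec["bmi"]), start=1):
    r["bmi_rank"] = rank

for r in records:
    print(r["id"], r["age_group"], r["bmi"], r["bmi_rank"])
```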
Modification of data files
• Opening an existing data file
• Defining new variables
• Entering new data
• Inserting and deleting cases and
variables
• Saving data files
Analysis
• Descriptive
• Inferential
Descriptive Statistics
• are used to describe the basic features of the data
in a study.
• provide simple summaries about the sample and
the measures.
• together with simple graphics analysis, they form
the basis of virtually every quantitative analysis of
data.
• with descriptive statistics you are simply
describing what is, what the data show.
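As a small illustration of such simple summaries, the sketch below computes a few common descriptive measures with Python's standard `statistics` module. The hemoglobin values are invented for the example.

```python
import statistics

# Hypothetical hemoglobin measurements (g/dL) from eight study participants.
hemoglobin = [11.8, 12.4, 13.1, 12.9, 14.2, 10.9, 13.5, 12.4]

summary = {
    "n": len(hemoglobin),
    "mean": statistics.mean(hemoglobin),
    "median": statistics.median(hemoglobin),
    "sd": statistics.stdev(hemoglobin),   # sample standard deviation (n - 1)
    "min": min(hemoglobin),
    "max": max(hemoglobin),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

These numbers describe only the eight observed values. They make no claim yet about the population the participants came from.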
Inferential Statistics
• investigate questions, models and
hypotheses.
• we use inferential statistics to try to
infer from the sample data what the
population thinks.
• Thus, we use inferential statistics to
make inferences from our data to more
general conditions.
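One simple way to move from sample data to a statement about more general conditions is a permutation test for the difference between two group means. The lecture does not name a specific method, so this is just one possible technique, and the blood pressure values and the function `permutation_test` are invented for the example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (a minus b) and the share of random
    relabelings whose difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical systolic blood pressure (mmHg) after treatment and in a control group.
treatment = [118, 122, 115, 120, 117, 119, 121, 116]
control = [128, 131, 125, 134, 127, 130, 126, 129]

observed, p_value = permutation_test(treatment, control)
print("Observed difference in means:", observed)
print("Approximate p-value:", p_value)
```

A small p-value means that a difference this large would rarely arise from chance relabeling alone. That is the kind of evidence used to generalize from the sample to the population.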
Software
• STATA
• BMDP
• EPIINFO
• PEPI
• SPSS
• etc.