Role of Statistics in Research
Validity
Will this study help answer the research question?
Analysis
What analysis, and how should this be interpreted and reported?
Efficiency
Is the experiment the correct size, making best use of resources?

Validity
Will the study answer the research question?
Surveys
select a sample from a population
describe, but can't explain
can identify relationships, but can't establish causality

Surveys & Causality
In a survey: farm income increased by 10% for each increase in fertiliser of 30 kg/ha.
Is this relationship causal? Not necessarily, other factors are involved:
Managerial ability
Farm size
Educational level of farmer
Fertiliser level may be related to these other possible causes, and may (or may not) be a cause itself.

Survey Unit
Example: In a survey to assess whether Herefords have a higher level of calving difficulty than Friesians, the individual cow is the survey unit.
Example: In a survey to assess the height of Irish males vs English males, the unit is the individual male: one would sample a number of males from each country and take their heights, rather than measure one male from each country many times.
Designed Experiments
Comparing treatment effect
Effect = difference between treatments
A well designed experiment leads to the conclusion:
Either the treatments have produced the observed effect
or
An improbable (chance < 1:20, 1:100 etc.) event has occurred.
Technically we calculate a p-value of the data, i.e. the probability of obtaining an effect as large as that observed when in fact the average effect is zero.
Essential elements of a designed experiment
1. COMPARATIVE
The objective is to compare a number (> 1) of treatments
2. REPLICATION
Each treatment is tested on more than one experimental unit
3. RANDOMISATION
Experimental units are allocated to treatments at random

Replication
Each treatment is tested on more than one experimental unit (the population item that receives the treatment).
To compare treatments we need to know the inherent variability of units receiving the same treatment (background noise); this might be a sufficient explanation for the observed differences between treatments.
Replication: 2 facts
Our faith in treatment means will:
Increase with greater replication
Decrease when noise increases
In particular, the standard error of difference (SED) between 2 treatment means is
$$\mathrm{SED} = s\sqrt{2/r}$$
where: r = (common) replication; s = typical difference between observations from the same treatment.
SED is the typical difference between 2 treatment means where the treatments don't differ.
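The two facts about replication can be checked numerically. The sketch below assumes the usual formula SED = s × sqrt(2/r), which matches the calculation used in the salmon example later in these notes; the noise level s = 10 is a made-up value chosen only for illustration.

```python
import math

def sed(s: float, r: int) -> float:
    """Standard error of the difference between two treatment means.

    s: typical variation between units receiving the same treatment
    r: common replication (units per treatment)
    """
    if r < 1:
        raise ValueError("replication r must be at least 1")
    return s * math.sqrt(2.0 / r)

# Hypothetical noise level, chosen only for illustration.
s = 10.0
for r in (2, 4, 8, 16):
    print(f"r = {r:2d}  SED = {sed(s, r):.2f}")
```

Quadrupling the replication halves the SED, while doubling the noise doubles it.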
Validity & Efficiency
Validity: The first requirement of an experiment is that it be valid. Otherwise it is at best a waste of time and resources and at worst it is misleading.
Efficiency: The use of experimental resources to get the most precise answer to the question being asked is not an absolute requirement, but is certainly desirable because cost is an important aspect of any experiment.

Pseudoreplication
- how to invalidate your experiment!
Treating multiple measurements on the same unit as if they were measurements on independent units.
Example: In an experiment testing the effect of a hormone treatment on follicle development, the cow is the experimental unit, not the follicle.
Example:
In an experiment to compare three cultivars of grass, a rectangular tray was assigned at random to each treatment. Trays were filled with John Innes Number 2 compost and 54 seedlings of the appropriate cultivar were planted in a rectangular pattern in each tray.
After ten weeks the 28 central plants were harvested, dried and weighed and the 84 plant weights recorded. What was the experimental unit?

Example:
In an experiment to compare three cultivars of grass, 7 square pots were assigned at random to each treatment. Pots were filled with John Innes Number 2 compost and 16 seedlings of the appropriate cultivar planted in a square pattern in each pot.
After ten weeks the 4 central plants were harvested, dried and weighed. Thus 84 plant weights were recorded. What is the experimental unit and what should be analysed?
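One way to approach the second example, on the reading that the pot (the item that received the treatment) is the experimental unit, is to reduce the plant weights to one value per pot before any analysis. The sketch below uses simulated weights; the cultivar labels, means and spreads are invented for illustration and are not data from the experiment.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the simulated numbers are reproducible

CULTIVARS = ["A", "B", "C"]   # hypothetical cultivar labels
POTS_PER_CULTIVAR = 7
PLANTS_PER_POT = 4            # the 4 central plants harvested from each pot

# Simulated plant weights (g): purely illustrative, not real data.
records = []
for cultivar in CULTIVARS:
    for pot in range(1, POTS_PER_CULTIVAR + 1):
        pot_level = random.gauss(5.0, 0.5)               # variation between pots
        for _ in range(PLANTS_PER_POT):
            weight = pot_level + random.gauss(0.0, 0.3)  # variation within a pot
            records.append((cultivar, pot, weight))

# Collapse the 84 plant weights to one mean per pot: 21 values to analyse.
pot_weights = {}
for cultivar, pot, weight in records:
    pot_weights.setdefault((cultivar, pot), []).append(weight)
pot_means = {key: mean(values) for key, values in pot_weights.items()}

print(len(records), "plant weights ->", len(pot_means), "pot means")
```

Treating the 84 plant weights as 84 independent units would overstate the replication; the 21 pot means give 7 replicates per cultivar.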
Randomisation
- allocating treatments to units
Ensures the only systematic force working on experimental units is that produced by the treatments.
All other factors that might affect the outcome are randomly allocated across the treatments.

Randomisation - how it works
What do we mean by 'In a randomised experiment any difference between the mean response on different treatments is due to treatment difference or random variation or both'?
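A minimal sketch of random allocation with equal replication follows. The function name `randomise` and the seed value are illustrative choices, not part of the original notes; in practice the seed would be recorded in the research plan so that the allocation can be reproduced.

```python
import random

def randomise(units, treatments, seed=None):
    """Allocate units to treatments at random, with equal replication."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    r = len(units) // len(treatments)
    # One label per unit, r copies of each treatment, then shuffled.
    labels = [t for t in treatments for _ in range(r)]
    rng.shuffle(labels)
    return dict(zip(units, labels))

allocation = randomise(range(1, 9), ["T1", "T2"], seed=42)
for unit, treatment in allocation.items():
    print(unit, treatment)
```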
Example: Suppose 8 experimental units, allocated at random to two treatments.

Unit                                 1     2     3     4     5     6     7     8
Response if treated the same       4.1   5.3   7.2   2.6   3.5   6.4   5.5   4.7
Allocated at random to treatment    T1    T1    T2    T2    T2    T1    T2    T1
Treatment effect                     0     0     2     2     2     0     2     0
Experimental response              4.1   5.3   9.2   4.6   5.5   6.4   7.5   4.7

Mean response: T1 = 5.13, T2 = 6.70
The estimated treatment effect is the difference 6.70 - 5.13 = 1.57 between these two means. It is partly influenced by the treatment effect (2 units) and partly by the variation between experimental units, the background noise.

Now suppose the most extreme allocation, with the poorest experimental units receiving T2.
Unit                                 1     2     3     4     5     6     7     8
Response if treated the same       4.1   5.3   7.2   2.6   3.5   6.4   5.5   4.7
Allocated at random to treatment    T2    T1    T1    T2    T2    T1    T1    T2
Treatment effect                     2     0     0     2     2     0     0     2
Experimental response              6.1   5.3   7.2   4.6   5.5   6.4   5.5   6.7

Mean response: T1 = 6.10, T2 = 5.73
The estimated treatment effect is 5.73 - 6.10 = -0.37.
Again it is partly influenced by the treatment effect (+2) and partly by the variation between experimental units, the background noise. The treatment effect is swamped by the extreme allocation.

Again consider the same extreme allocation but with a larger treatment effect.
Unit                                 1     2     3     4     5     6     7     8
Response if treated the same       4.1   5.3   7.2   2.6   3.5   6.4   5.5   4.7
Allocated at random to treatment    T2    T1    T1    T2    T2    T1    T1    T2
Treatment effect                    10     0     0    10    10     0     0    10
Experimental response             14.1   5.3   7.2  12.6  13.5   6.4   5.5  14.7

Mean response: T1 = 6.10, T2 = 13.73
The estimated treatment effect is the difference 13.73 - 6.10 = 7.63.
Three points:
1. The observed treatment difference is due only to treatment effect and variation.
2. If the treatment effect is large relative to the background noise then even an extreme allocation will not obscure the treatment effect (signal/noise ratio).
3. If the number of experimental units is large then a treatment effect will usually be more obvious, since an extreme allocation of experimental units is less likely. With 20 experimental units, it is unlikely that the 10 worst and the 10 best are allocated to different treatments.
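These points can be explored by listing every possible allocation of the 8 units in the example, 4 to each treatment, and computing the estimated effect for each one. This is a sketch of that enumeration, assuming the true treatment effect is simply added to the response of each unit receiving T2; the function name is illustrative.

```python
from itertools import combinations
from statistics import mean

# Responses of the 8 units if all were treated the same (from the example).
baseline = [4.1, 5.3, 7.2, 2.6, 3.5, 6.4, 5.5, 4.7]

def estimated_effects(true_effect):
    """Estimated effect (mean T2 - mean T1) for every 4-versus-4 allocation."""
    units = range(len(baseline))
    estimates = []
    for t2_units in combinations(units, 4):
        t2 = [baseline[u] + true_effect for u in t2_units]
        t1 = [baseline[u] for u in units if u not in t2_units]
        estimates.append(mean(t2) - mean(t1))
    return estimates

for effect in (2, 10):
    est = estimated_effects(effect)
    print(f"true effect {effect}: {len(est)} allocations, "
          f"min {min(est):.3f}, mean {mean(est):.3f}, max {max(est):.3f}")
```

There are 70 possible allocations. The smallest estimate comes from the extreme allocation in the example, and the average of the estimates over all allocations equals the true effect, which is what randomisation guarantees.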
Tests of Hypotheses - Tests of Significance
Survey: Are the observed differences between groups compatible with a view that there are no differences between the populations from which the samples of values are drawn?
Designed experiments: Are observed differences between treatment means compatible with a view that there are no differences between treatments?

Designed experiment: there are only two explanations for a negative answer: the difference is due to the applied treatments or to a chance effect.
Survey: is silent in distinguishing between various possible causes for the difference, merely noting that it exists.
Example
An experiment on artificially raised salmon compared two treatments and 20 fish per treatment. Average gains (g) over the experimental period were 1210 and 1320. Variation between fish within a group was RSE = 135 g.
Did treatment improve growth rate?
Procedure
a) NULL HYPOTHESIS Treatments have no effect and any difference observed between groups treated differently is due to chance (variation in the experimental material).
b) Measure
- the variation between groups treated differently
- the variation expected if due solely to chance
c) TEST STATISTIC Compare the two measures of variation. Do treatments produce a 'large' effect?
d) The observed difference could have occurred by chance. Statistical theory gives rules to determine how likely a given difference in variation is liable to be by chance.
e) SIGNIFICANCE TEST Face the choice.
- This difference in variation could have occurred by chance with probability ? (5%, 1%, etc.)
OR
- There is a real difference (produced by treatment).
f) GOOD EXPERIMENTAL PROCEDURE makes sure in experiments that there is no other possible explanation.
Example: The t test
An experiment on artificially raised salmon compared two treatments and 20 fish per treatment. Average gains (g) over the experimental period were 1210 and 1320. Variation between fish within a group was RSE = 135 g.
Did treatment improve growth rate?
a) NULL HYPOTHESIS - Treatment does not affect salmon growth rate
b) Observed difference between groups: 1320 - 1210 = 110
Variation expected solely from chance: 135 x (2/20)^0.5 = 42.7
c) Test statistic: t = 110/42.7 = 2.58
d) Statistical theory (t tables) shows that the chance of a value as large as 2.58 is about 1 in 100
e) Make the choice
f) Are there other possible explanations?
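The arithmetic in steps b) to d) can be reproduced in a few lines. The sketch below uses only the standard library and obtains the tail probability by numerical integration of the t density rather than from tables. The degrees of freedom, 2 x (20 - 1) = 38, are an assumption based on a pooled two-sample t test; the notes do not state them.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, n=2000):
    """P(T > t) for t >= 0, by Simpson's rule on the density over [0, t]."""
    h = t / n
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

mean_1, mean_2 = 1210.0, 1320.0   # average gains (g) from the example
s, r = 135.0, 20                  # within-group variation and fish per treatment
sed = s * math.sqrt(2 / r)        # variation expected solely from chance
t = (mean_2 - mean_1) / sed
df = 2 * (r - 1)                  # assumption: pooled two-sample t test, 38 df

one_sided = upper_tail(t, df)
print(f"SED = {sed:.1f}, t = {t:.2f}, df = {df}")
print(f"one-sided p = {one_sided:.4f}, two-sided p = {2 * one_sided:.4f}")
```

Under these assumptions the one-sided probability falls between 0.005 and 0.01, in line with the "about 1 in 100" read from t tables.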
Responsibility of the Researcher and Statistician

PLANNING PHASE
Researcher:
- Seek statistical training
- Seek statistical advice
- Use minimum experiment size
- Select experimental material properly
- Develop detailed research plan
- Exercise proper protocols for human or animal experiments
Statistician:
- Keep up to date with statistical technology
- Teach principles
- Provide statistical input to plan
- Give researcher different alternatives

EXECUTION PHASE
Researcher:
- Carry out study as planned
- Log important dates related to data
Statistician:
- Give road map for execution
- Point out weak links in research chain

ANALYSIS PHASE
Researcher:
- Study data patterns
- Keep integrity of data set
- Choose proper statistical analytical procedure
- Choose probability levels, contrasts to make, etc.
- Avoid result guided procedures
Statistician:
- Assist in studying data patterns
- Assist in choosing analytical procedure
- Assist in choosing probability levels, contrasts, etc.
- Keep data, but not statistical methods used, confidential

INTERPRETATION AND REPORTING
Researcher:
- Provide description of statistical methods
- Present results in such a way that reader can evaluate the interpretation
- Report variability estimates such as standard deviation or variance
Statistician:
- Assist in writing statistical methods used
- Review interpretation. Modification if necessary
Responsibility of Researcher and Statistician
Promote high standards of scientific inquiry and professionalism
Involve appropriate techniques for research
Honor the rights of other researchers; give credit to other researchers where due
Consider interdependence of natural, social and technological systems
Give objectivity a major role
Guard against misinterpretation and misuse of data
Good Practices Checklist
Planning is very important in experimentation
Statistician can assist in planning
Planning does not ensure success but avoids built-in disasters
Statistics cannot compensate for negative impacts of persisting in a faulty line of research

Good Planning Can Prevent:
Costly waste of resources
Difficult statistical analysis
Data for which interpretation is controversial
An experiment which is precise but which answers the wrong questions

Setting Up Original Hypothesis Objectively
2 Rules:
1. Hypothesis should be clearly related to original problem
2. The hypothesis should be stated as simply as possible
IV. Discipline Specific Ethical Issues
Flexibility needed: ethics vary among different application areas

Business Application
Withholding negative results
Problem formulation important: involve statistician
Design considerations: costs, definition of population, sampling frame
Statistician provides report but does not make decisions for management
Company should have same responsibilities to a salaried statistician as to a consulting one (and conversely).
See: Deming, W.E., Sample Design in Business Research. Wiley, NY, 1960.

Medical Application
Medical review boards
Informed consent
Methods of selecting subjects
Withholding a treatment from a control group
Access to data
Confidentiality of identity of subjects
V. Ethical Issues in Interpretation and Reporting
Insufficient statistical methods description
Statistical significance vs. practical significance
Access to data
Kinds of means in the factorial experiment reporting
Reporting of measures of dispersion
Proper decimal reporting
Bona fide scientific conclusions vs. speculation
Clarity of reporting
Indication that results are not final word
Enumeration of new study questions
VI. Case Studies
Skagerrak Case – Precautionary Principle
2 highly respected scientists interpret their results differently
Case emphasized 2 critical aspects of research:
1. The actual statistical analysis
2. How and when to disseminate the information from research
Elton's Withholding of Anomalous Data
US Census: Use of sample survey methods to adjust census counts