particular uses for which they are intended. If the results are to be used to describe pupil achievement, we should like them to represent all aspects of the
achievement we wish to describe, and to represent nothing else.
66
The writer adopts Anderson, Herr, and Nihlen’s criteria for validity
in action research, which comprise democratic validity, outcome validity, process validity, catalytic validity, and
dialogic validity.
67
In this study, the writer uses outcome, process, and dialogic validity. Anderson defines outcome validity as the requirement that the action emerging from a
particular study lead to the successful resolution of the problem that was being studied; that is, the study can be considered valid if the researcher learns something that
can be applied to the subsequent research cycle.
68
Based on the explanation above, outcome validity can be seen from the test results. When the result of cycle two is better than that of cycle one, the
study is considered successful. Next, process validity is “the validity that requires a study has
been conducted in a ‘dependable’ and ‘competent’ manner.”
69
It can be seen from the observation results. In this case, the collaborator notes all events that happen during the CAR. If there are
mistakes in the teaching method, the writer discusses them with the teacher in order to modify the subsequent strategies. Finally, dialogic validity “involves
having a critical conversation with peers about research findings and practices.”
70
In this case, the writer and the teacher discuss and assess the students’ test results
from cycle one and cycle two together. This is done in order to avoid invalid data.
2. Discriminating Power
71
The discriminating power of an achievement test item refers to the degree to which it discriminates between pupils with high and low achievement. To
66
Norman E. Gronlund, Measurement and Evaluation in Teaching, 4th edition, New York: Macmillan, 1981, p. 65.
67
Geoffrey E. Mills, Action Research: A Guide for the Teacher Researcher, Columbus: Merrill Prentice Hall, 2003, p. 84.
68
Geoffrey E. Mills, Action Research: A Guide for the Teacher Researcher, … p. 84.
69
Geoffrey E. Mills, Action Research: A Guide for the Teacher Researcher, … p. 84.
70
Geoffrey E. Mills, Action Research: A Guide for the Teacher Researcher, … p. 85.
71
See Appendix 11, pp. 152-157.
analyze it, the writer uses the formula:
72
\[ D = \frac{R_u - R_l}{\frac{1}{2}T} \]
D = discriminating power
R_u = number of pupils in the upper group who got the item right
R_l = number of pupils in the lower group who got the item right
T = number of pupils included in the item analysis
Next, the following discriminating scale is used:
73
DP             REMARK
≥ 0.40         Used
0.20 – 0.39    Revised
≤ 0.10         Discarded
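The calculation and the scale above can be sketched in a few lines of Python. This is only an illustration, assuming the formula has the form D = (R_u − R_l) / (½T) and reading the top band of the scale as 0.40 and above. The function names and the sample counts are hypothetical and are not taken from the study's data.

```python
def discriminating_power(upper_right: int, lower_right: int, total: int) -> float:
    """Return D = (R_u - R_l) / (T / 2).

    Assumption: `total` (T) counts the pupils of both the upper and the
    lower group together.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (upper_right - lower_right) / (total / 2)


def dp_remark(d: float) -> str:
    """Map a D value to the remark given in the scale."""
    if d >= 0.40:
        return "Used"
    if 0.20 <= d < 0.40:
        return "Revised"
    if d <= 0.10:
        return "Discarded"
    # The scale as printed gives no remark for 0.10 < D < 0.20.
    return "Unclassified"


# Hypothetical item: 9 of 10 upper-group pupils and 4 of 10 lower-group
# pupils answered correctly, so T = 20.
d = discriminating_power(upper_right=9, lower_right=4, total=20)
print(d, dp_remark(d))  # 0.5 Used
```

Values that fall between the printed bands are reported as "Unclassified" rather than being assigned to a band the scale does not define.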
3. Reliability
74
Reliability indicates whether an instrument measures what it is intended to measure consistently over time. To determine the reliability of the
student achievement test, the Kuder-Richardson 20 (K-R 20) formula is used:
75
\[ r_{xx} = \frac{k}{k-1}\left(1 - \frac{\sum pq}{S^2}\right) \]
r_xx = reliability (K-R 20)
k = number of items on the test
S = standard deviation of the test scores
p = proportion of pupils answering each item correctly
q = proportion of pupils answering each item wrongly (q = 1 – p)
72
Norman E. Gronlund, Measurement and Evaluation in Teaching, 4th edition, New York: Macmillan, 1981, p. 259.
73
J. B. Heaton, Classroom Testing, New York: Longman Inc, 1990, p. 174.
74
See Appendix 11, pp. 158-160.
75
Suharsimi Arikunto, Dasar-Dasar Evaluasi Pendidikan, Jakarta: Bumi Aksara, 2001, p. 101.
The criterion:
76
r_xx = 0.91 – 1.00 = very high
r_xx = 0.71 – 0.90 = high
r_xx = 0.41 – 0.70 = enough
r_xx = 0.21 – 0.40 = low
r_xx < 0.21 = very low
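As a rough illustration of the reliability calculation and the criterion above, the sketch below computes K-R 20 from a matrix of 0/1 item scores, assuming the formula r_xx = (k / (k − 1)) (1 − Σpq / S²). Two further assumptions are made here and are not stated in the source: S² is taken as the population variance of the total scores, and the lower bound of each band is used as its cut-off. The function names and the sample matrix are hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """K-R 20 for a pupils-by-items matrix of 0/1 scores."""
    n_pupils = len(responses)
    k = len(responses[0])  # number of items on the test
    totals = [sum(row) for row in responses]
    variance = pvariance(totals)  # S^2 (assumption: population variance)
    if k < 2 or variance == 0:
        raise ValueError("need at least two items and non-constant total scores")
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n_pupils  # proportion correct
        sum_pq += p * (1 - p)  # q = 1 - p
    return (k / (k - 1)) * (1 - sum_pq / variance)


def interpret(r: float) -> str:
    """Map r_xx to the criterion, using each band's lower bound as cut-off."""
    if r >= 0.91:
        return "very high"
    if r >= 0.71:
        return "high"
    if r >= 0.41:
        return "enough"
    if r >= 0.21:
        return "low"
    return "very low"


# Hypothetical scores of five pupils on a four-item test (1 = right, 0 = wrong).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r = kr20(scores)
print(round(r, 2), interpret(r))  # 0.8 high
```

If the sample variance (dividing by n − 1) is preferred, `pvariance` can be replaced with `statistics.variance`; the resulting coefficient will differ slightly.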
4. Item Difficulty
77
A good test item is one that is neither too easy nor too difficult. The item difficulty in the data analysis is based on the upper and
lower groups. The difficulty of a test item is indicated by the proportion of pupils who get the item right. Item difficulty is computed with the following formula:
78
\[ P = \frac{R}{T} \]
P = item difficulty
R = the number of pupils who got the item right
T = the total number of pupils who tried the item
The criteria used are as follows:
79
ID             REMARK
0.00 – 0.14    Difficult
0.15 – 0.85    Moderate
0.86 – 1.00    Easy
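The item difficulty index and its criterion can be illustrated with a short Python sketch, assuming the formula P = R / T with P expressed as a proportion between 0 and 1. The function names and the sample counts are hypothetical. Values that fall between the printed bands (for example 0.145 or 0.855) are assigned here by the cut-offs 0.15 and 0.85, which is an assumption of this sketch.

```python
def item_difficulty(right: int, tried: int) -> float:
    """Return P = R / T, the proportion of pupils who got the item right."""
    if tried <= 0:
        raise ValueError("tried must be positive")
    if not 0 <= right <= tried:
        raise ValueError("right must lie between 0 and tried")
    return right / tried


def difficulty_remark(p: float) -> str:
    """Map P to the remark in the criterion table."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p < 0.15:
        return "Difficult"
    if p <= 0.85:
        return "Moderate"
    return "Easy"


# Hypothetical item: 12 of the 20 pupils who tried it answered correctly.
p = item_difficulty(right=12, tried=20)
print(p, difficulty_remark(p))  # 0.6 Moderate
```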
76
Suharsimi Arikunto, Dasar-Dasar Evaluasi Pendidikan, Jakarta: Bumi Aksara, 2001, p. 102.
77
See Appendix 11, pp. 152-157.
78
Norman E. Gronlund, Measurement and Evaluation in Teaching, 4th edition, New York: Macmillan, 1981, p. 258.
79
John W. Oller, Language Tests at School, London: Longman Group Limited, 1979, p. 247.
J. Criterion of the Classroom Action Research Success