speech, identify the students' errors in reported speech, and determine the possible causes of those errors. The test also had content validity, as it covered all the types of reported speech. Table 3.1 shows the blueprint of the reported speech types in the test.
Table 3.1 The Blueprint of the Reported Speech Types in the Test

No  Types                   Item Numbers                 Total
1   Statement               1A, 2A, 3A, 21B, 22B             5
2   Question                4A, 5A, 6A, 7A, 23B, 24B         6
3   Command                 8A, 9A, 10A, 25B, 26B, 27B       6
4   Request                 11A, 12A, 13A, 28B               4
5   Suggestion and Advice   14A, 15A, 16A, 29B               4
6   Exclamation             17A, 18A, 19A, 20A, 30B          5
    Total                                                   30

The test used in this research contained all samples of the grammatical changes
of reported speech. A table of the grammatical changes of reported speech in the test is presented in the appendix as evidence (see Appendix 5).
b. Criterion-Related Evidence of Validity
This validity refers to how far results on the test agree with those provided by some independent and highly dependable assessment of the students' ability (Hughes, 1989). This validity emphasises the criterion itself, not the instrument. The criterion here can be a second test or another device that also tests the students' ability. Since there was no other device or second test that tested reported speech, this study could not show that the test fulfilled criterion-related evidence of validity.
c. Construct Validity
The third category which a test must have is construct validity. A test has construct validity if it can be demonstrated that it measures the ability which it is supposed to measure (Brown, 1987). The test constructed in this research was a grammar test. It had construct validity since it was able to measure the fourth-semester students' ability in converting direct speech into reported speech.
d. Face Validity
According to Hughes (1989), a test fulfills the face validity category if it looks as if it measures what it is supposed to measure. For a test to have face validity, Heaton (1979) states that the test items should look right to other testers, teachers, lecturers, or other persons who know enough about the subject. Based on Heaton's explanation, the test constructed in this research was reviewed by a lecturer of Structure IV and two English instructors. They confirmed that, on its face, the test measured what it was intended to measure.
2. Reliability
Reliability deals with the consistency of the scores in measuring whatever it is intended to measure (Fraenkel & Wallen, 1993). A test is reliable if, when the same test is given to the same subjects or matched subjects on two different occasions, it yields similar results (Brown, 1987).
According to Fraenkel and Wallen (1993), there are three methods to measure the reliability of a test: the test-retest method, the equivalent-forms method, and the internal consistency method. To measure the reliability of the test constructed, this study used the split-half
procedure, which is included in the internal consistency method, since it requires only a single administration of the test.
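To illustrate how the split-half procedure works, the sketch below (in Python, not part of the original study) splits each student's item scores into odd-numbered and even-numbered halves, correlates the two half scores, and applies the Spearman-Brown correction, r_full = 2 * r_half / (1 + r_half), to estimate the reliability of the full-length test. The student scores shown are purely hypothetical.

    # A minimal sketch (assumed, not from the original study) of the
    # split-half procedure with the Spearman-Brown correction.
    from statistics import mean

    def pearson_r(x, y):
        """Pearson correlation between two lists of half-test scores."""
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    def split_half_reliability(item_scores):
        """item_scores: one row per student, one 0/1 entry per item."""
        odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
        even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
        r_half = pearson_r(odd, even)
        # Spearman-Brown correction estimates the full-test reliability
        return (2 * r_half) / (1 + r_half)

    # Purely hypothetical item scores for four students on a six-item test
    scores = [
        [1, 1, 0, 1, 1, 0],
        [1, 0, 0, 1, 0, 0],
        [1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0],
    ]
    print(round(split_half_reliability(scores), 2))

In this sketch, the closer the resulting coefficient is to 1.0, the more consistent the two halves of the test are with each other.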