examinees who take the test, a degree of the administrative personnel who evaluate the test’s use, and a degree of some other observers. In order to improve learning quality, the test’s appearance should be useful and relevant to what is to be measured, and this judgment is based on agreement among the test-takers, the administrative personnel, and other observers. In this case, the researcher consulted several lecturers of the English Language Education Study Program in order to construct a test with adequate face validity.
2. Test Reliability
Reliability refers to the consistency of a test or other measurement instrument each time it is administered. “A reliable test is consistent and dependable” (Brown, 2004: 20). Bertrand and Cebula (1980: 117) define reliability as the degree of consistency with which knowledge or cognitive ability is measured each time the test is given. Therefore, if the test produces the same results each time it is conducted, the test is highly reliable. On the other hand, if the test produces different results each time it is conducted, the test has low reliability or is even unreliable.
In order to know whether the test she constructed was reliable, the researcher conducted a try-out before administering the test to the real sample. The try-out was held on November 30, 2009 in the Poetry I class. The participants were 20 students. The try-out had 20 items (see Appendix 1) that the students had to do. After conducting the try-out, the researcher scored the students’ work and then computed the test reliability using the split-halves method. To do so, the researcher first had to compute the reliability coefficient. The formula used to compute the reliability coefficient is discussed later in this section. The reliability of the try-out test was 0.86, while the required reliability for a sample of 20 persons at the 5% level of significance was 0.444. The list of the students’ scores and the computation is in Appendix 9.
Based on the results of the try-out reliability computation, the researcher revised and improved the test so that it would be reliable when administered to the real sample. She changed some words in several items in order to avoid ambiguity. She also revised the instructions in order to make them clearer, so that the students could do the test without any hesitation. The changes are in Appendix 3.
To compute the reliability coefficient, the researcher needed two sets of scores. There are three common methods of obtaining them: test-retest reliability, equivalent-forms reliability, and split-halves reliability. In this research, the researcher chose split-halves reliability to provide the two sets of scores. In this method, the test was divided into two halves that were equivalent in content and in difficulty. The students should answer each item in the same way.
Although the test contained two halves, the researcher conducted the test only once. She scored each half separately, so that each student obtained two scores on the test. The first half, consisting of the odd-numbered items, was labeled X, and the second half, consisting of the even-numbered items, was labeled Y. The test contained 20 essay items in which the students had to combine two simple sentences into one complex sentence using a relative clause, considering whether a defining or a non-defining relative clause was required.
The researcher used the split-halves method to compute the test reliability coefficient. Because the test consisted of two halves, the researcher used the Pearson Product-Moment Correlation Coefficient (Alderson and Bachman, 2004: 86) as the formula to compute the correlation coefficient between the scores on the two halves of the test.
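$$r_{xy} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^{2} - (\sum X)^{2}\right]\left[N\sum Y^{2} - (\sum Y)^{2}\right]}}$$
Where:
$r_{xy}$ : the correlation coefficient of the scores of the two halves of the test
$N$ : the number of the students in the sample
$\sum X$ : the sum of the X scores
$\sum Y$ : the sum of the Y scores
$\sum X^{2}$ : the sum of the squares of the X scores
$\sum Y^{2}$ : the sum of the squares of the Y scores
$\sum XY$ : the sum of the products of the X and Y scores for each student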
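The scoring and correlation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_half_scores` and `pearson_r` and the small data set are hypothetical and are not taken from the appendices, and the correlation uses the standard raw-score form of the Pearson Product-Moment formula built from the sums defined above.

```python
from math import sqrt

def split_half_scores(item_scores):
    """Return (X, Y) for one student: totals on odd- and even-numbered items."""
    x = sum(item_scores[0::2])  # items 1, 3, 5, ... (list indexes 0, 2, 4, ...)
    y = sum(item_scores[1::2])  # items 2, 4, 6, ...
    return x, y

def pearson_r(xs, ys):
    """Pearson product-moment correlation computed from raw-score sums."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined when a half has no variance")
    return numerator / denominator

# Hypothetical scores: 4 students x 6 items (1 = correct, 0 = incorrect).
students = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
halves = [split_half_scores(s) for s in students]
xs = [x for x, _ in halves]  # X scores (odd-numbered items)
ys = [y for _, y in halves]  # Y scores (even-numbered items)
r_half = pearson_r(xs, ys)   # correlation between the two halves
```

The value `r_half` is the correlation between the two halves only; it is not yet the reliability of the whole test.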
Because the result of this computation was the reliability of only half of the test, the researcher needed to estimate the reliability of the entire test. That was why the researcher used the Spearman-Brown formula to measure the reliability of the entire test (Ebel, 1979: 2780).
$$r_{2} = \frac{2\,r_{s}}{1 + r_{s}}$$
Where:
$r_{2}$ : the obtained reliability coefficient of the entire test
$r_{s}$ : the obtained reliability coefficient of a half of the test
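The step from the half-test coefficient to the whole-test coefficient can be sketched as follows. The function name `spearman_brown` is hypothetical, and the code assumes the usual Spearman-Brown form for a test of doubled length, that is, twice the half-test coefficient divided by one plus the half-test coefficient. The input value in the example is illustrative and is not taken from the appendices.

```python
def spearman_brown(r_half):
    """Estimate whole-test reliability from the correlation between two halves."""
    if r_half <= -1:
        raise ValueError("r_half must be greater than -1")
    return 2 * r_half / (1 + r_half)

# Illustrative value only: a half-test correlation of 0.5
# corresponds to a whole-test reliability of about 0.67.
r_full = spearman_brown(0.5)
```

Because the correction is greater than the input for any coefficient between 0 and 1, the whole-test reliability is always higher than the correlation between the halves in that range.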
The reliability coefficient of the test was 0.81 (see Appendix 10). It meant that the correlation between X and Y was highly significant. Based on the table of “r” values of the Product Moment (see Appendix 8), the required reliability coefficient for a sample of 40 persons at the 5% level of significance is 0.312. Although the reliability coefficient of the real test is lower than that of the try-out test, the test is still reliable because its coefficient is much higher than the required reliability coefficient.
The researcher had revised and improved the test, but the students did not do the test seriously. That is why the reliability coefficient of the test is lower than that of the try-out test. However, the test the researcher conducted is still highly reliable because its reliability coefficient is much higher than the required reliability coefficient.
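As a rough cross-check of the required coefficients quoted above (0.444 for 20 persons and 0.312 for 40 persons), the sketch below derives a critical "r" from a critical t value. It rests on assumptions that the text does not state: that the table is two-tailed at the 5% level and uses N − 2 degrees of freedom. The t values 2.101 (df = 18) and 2.024 (df = 38) are taken from a standard t table, and the names `critical_r`, `T_CRIT_05`, and `exceeds_required` are hypothetical.

```python
from math import sqrt

# Two-tailed 5% critical t values from a standard t table (assumption),
# keyed by sample size N, with df = N - 2.
T_CRIT_05 = {20: 2.101, 40: 2.024}

def critical_r(t_crit, n):
    """Convert a critical t value into the matching critical correlation."""
    df = n - 2
    return t_crit / sqrt(t_crit ** 2 + df)

def exceeds_required(r, n):
    """True when the coefficient r is above the required value for sample size n."""
    return r > critical_r(T_CRIT_05[n], n)

required_tryout = round(critical_r(T_CRIT_05[20], 20), 3)  # try-out, N = 20
required_real = round(critical_r(T_CRIT_05[40], 40), 3)    # real test, N = 40

# Coefficients reported in the text: 0.86 (try-out) and 0.81 (real test).
tryout_ok = exceeds_required(0.86, 20)
real_ok = exceeds_required(0.81, 40)
```

Under these assumptions the computed values agree with the tabulated ones, and both reported coefficients are well above the required level.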
D. Data Gathering Technique
The researcher gathered the data by testing the students. She conducted the test in Academic Essay class A on Wednesday, December 2, 2009 and in Academic Essay class E on Friday, December 11, 2009. First, the researcher introduced herself and explained the aim of the research. Then she stated that she needed the students’ help to do the test. The students had not been told about the test beforehand, so they had no time to prepare for it. As a result, the researcher could observe the mastery of defining and non-defining relative clauses by students of the English Language Education Study Program without any preparation for the test.
Second, the researcher distributed the test to each of the students, although some of them were shoppers. Then the researcher explained the instructions of the test clearly in order to make sure that all the students really understood what they should do. Before starting the test, the researcher told the students about the time limit so that they could manage their time in doing the test. The researcher also reminded the students ten minutes before the time was up. After finishing the test, the students submitted their work to the researcher. In order to get valid data, the researcher separated the shoppers’ work from that of the fifth semester students. Finally, the researcher scored the fifth semester students’ work. Each student’s score became the data to be analyzed.