Research Question 1 DATA ACQUISITION
up the tasks, while the other acts as an assessor and does not join the conversation.
The overall test format was organized into three parts. The first two parts were developed from the responsive speaking task suggested by Brown (2004). In the
first task, each student interacted with the interlocutor, who asked the student some guided questions. The student who was not answering was
asked to pay attention to what the performing student was saying. Then, for the second task, each student was asked to paraphrase the information given by the
other student, who had performed earlier. The third task was developed from the interactive speaking task (Brown, 2004). In the third task,
the students were to interact with each other. The interlocutor set up the activity using a standardized rubric developed from particular topics taken from the course
material. During and at the end of the test, each of the examiners gave marks. Finally, the two examiners’ scores were compared and averaged.
The reliability of this test was very important because, according to Gall et al. (2003, p. 196), the reliability of a test refers to the measurement error present in
the scores yielded by the test. Since two examiners gave marks to every student in each test, it was necessary to make sure that the testing
instruments and procedures were valid and reliable. It was possible that the two examiners would give inconsistent marks, that is, introduce measurement error, even for the
same student. A reliable test should yield stable and consistent scores whenever it is administered (Creswell, 2011) and whoever the examiners are.
Therefore, in order to show the test’s reliability, an inter-rater reliability test was conducted to guard against any bias that an individual examiner might bring to the scoring. This
inter-rater reliability test should be conducted by “having several testers administer the test to a sample of individuals and then correlating their obtained scores with each
other” (Gall et al., 2003, p. 198), in this study using SPSS. The inter-rater reliability test was conducted by examining the
intraclass correlation coefficient between teacher 1’s scores and teacher 2’s scores given to participants during the pre-test. The results of this statistical examination
are presented below.
Table 3.2 Intraclass correlation coefficient

                    Intraclass        95% Confidence Interval        F Test with True Value 0
                    Correlation (b)   Lower Bound    Upper Bound     Value     df1    df2    Sig.
Single Measures     0.761 (a)         0.088          0.945           12.537    8      8      0.001
Average Measures    0.864 (c)         0.162          0.972           12.537    8      8      0.001
Based on the table, the scores given by teacher 1 and teacher 2 were correlated at 0.86 (average measures), which was greater than 0.7. This
signified that they were highly correlated. In addition, the significance level was 0.001 (p < 0.05), which indicated that the inter-rater reliability was high. Based on these
results, the test scales were shown to be highly reliable, so they could be utilized throughout the research.
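
The intraclass correlation coefficient reported in Table 3.2 was obtained from SPSS, and neither the raw marks nor the ICC model chosen are given here. As a rough illustration only, the Python sketch below shows how single-measures and average-measures coefficients of this kind can be computed from a table of marks. It assumes a two-way model with absolute agreement, which is an assumption rather than a documented setting of this study. The function name icc_absolute_agreement and the marks in example_scores are hypothetical and are not the study's data.

```python
def icc_absolute_agreement(scores):
    """Two-way ICC for absolute agreement: returns (single, average).

    `scores` is a list of rows, one row per student and one column per rater.
    The model choice is an assumption made for this sketch.
    """
    n = len(scores)        # number of students
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Mean squares from a two-way ANOVA without replication.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Single measures: reliability of one rater's marks.
    single = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    # Average measures: reliability of the mean of the k raters' marks.
    average = (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
    return single, average


# Hypothetical pre-test marks (teacher 1, teacher 2); NOT the study's data.
example_scores = [[70, 72], [65, 70], [80, 78], [75, 80], [60, 66]]
single, average = icc_absolute_agreement(example_scores)
print(f"Single measures ICC:  {single:.3f}")
print(f"Average measures ICC: {average:.3f}")
```

Under this model the average-measures value is always at least as high as the single-measures value when the latter is positive, because averaging two raters' marks reduces the error attributable to any one rater. Since the examiners' scores in this study were averaged, the average-measures coefficient is the one that corresponds to the scores actually used.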