Research Question 1 DATA ACQUISITION

up the tasks, while the other acts as an assessor and does not join the conversation. The overall test format was organized into three parts. The first two parts were developed from the responsive speaking task suggested by Brown (2004). In the first task, each student interacted with the interlocutor, who asked the student some guided questions. The student who was not answering was asked to pay attention to what the performing student was saying. Then, for the second task, each student was asked to paraphrase the information given by the other student, who had performed earlier. The third task was developed from the interactive speaking task (Brown, 2004). In this task, the students interacted with each other. The interlocutor set up the activity using a standardized rubric based on particular topics taken from the course material. During and at the end of the test, each examiner gave marks. Finally, the two examiners’ scores were compared and averaged.

The reliability of this test was very important because, according to Gall et al. (2003, p. 196), the reliability of a test refers to the measurement error present in the scores yielded by the test. Since two examiners gave marks to every student in each test, it was necessary to make sure that the testing instruments and procedures were valid and reliable. It was possible that the two examiners would give inconsistent marks, that is, measurement error, even for the same student. A reliable test should yield stable and consistent scores whenever it is administered (Creswell, 2011) and whoever the examiners are. Therefore, to show the test’s reliability, an inter-rater reliability test was conducted to negate any bias that an individual examiner might bring to the scoring.

This inter-rater reliability test should be conducted by “having several testers administer the test to a sample of individuals and then correlating their obtained scores with each other” (Gall et al., 2003, p. 198). The test was run in SPSS by examining the intraclass correlation coefficient between teacher 1’s scores and teacher 2’s scores given to participants during the pre-test. The results of this statistical examination are presented below.

Table 3.2 Intraclass correlation coefficient

                   Intraclass    95% Confidence Interval     F Test with True Value 0
                   Correlation   Lower Bound   Upper Bound   Value    df1   df2   Sig.
Single Measures    0.761         0.088         0.945         12.537   8     8     0.001
Average Measures   0.864         0.162         0.972         12.537   8     8     0.001

Based on the table, the scores given by teacher 1 and teacher 2 had an intraclass correlation of 0.86 (average measures), which was greater than 0.7. This signified that they were highly correlated. In addition, the significance level was 0.001 (p < 0.05), which indicated that the inter-rater reliability was high. On this basis, the test scales were considered highly reliable, so they could be used throughout the research.
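The intraclass correlation reported in Table 3.2 was obtained with SPSS. For readers who want to see how such a coefficient is built from two examiners' marks, the following is a minimal sketch only. It assumes a two-way, absolute-agreement ICC model, which the source does not state. The function name `icc_two_way` and the scores in `demo_scores` are hypothetical and are not the study's data.

```python
import numpy as np

def icc_two_way(scores):
    """Two-way, absolute-agreement ICC for an (n_students x k_raters) array.

    Returns (single_measures, average_measures).
    The choice of model is an assumption; the source does not specify it.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Sums of squares: between students (rows), between raters (columns), residual
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average

# Hypothetical marks for 9 students from 2 examiners (NOT the study's data)
demo_scores = [
    [70, 72], [65, 68], [80, 79], [75, 78], [60, 66],
    [85, 84], [72, 75], [68, 70], [78, 80],
]

single, average = icc_two_way(demo_scores)
print(f"single measures:  {single:.3f}")
print(f"average measures: {average:.3f}")
```

Under this model with two raters, the average-measures coefficient equals 2r / (1 + r), where r is the single-measures coefficient. The values in Table 3.2 agree with that relation: 2(0.761) / (1 + 0.761) ≈ 0.864.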

b. Pre- and post-program questionnaires

Questionnaires with the same structure were distributed twice: at the beginning and at the end of the course program. The aim was to see the improvement and changes in students’ collaboration skills, which formed part of the answer to the second research question: how effective was differentiated