
were also important. In order for the research results to be valid and reliable, the research needed a valid and reliable instrument. The following is a description of the instruments and of their validity and reliability.

1. Types of Instruments

In this research, the researcher used a test and an interview as the instruments to identify the participants' errors and their mastery.

a. Test

The test was used as an instrument to collect the data related to the errors in the use of the English phrasal verbs from the research participants. The test was constructed in the form of production and recognition types. It consisted of three parts, and each part had its own objective. The first part was a production test, while the next two parts were recognition tests.

The first part of the test consisted of twenty (20) gap-filling items. The objective of this part was that the participants were able to place the right particles for the main verbs in the right gaps to create meaningful sentences. This part was defined as a production test because it did not provide any choices that would allow the participants to answer by guessing; instead, the items required the participants to demonstrate their ability to use the English phrasal verbs.

The second and the third parts of the test, on the other hand, were recognition tests. The second part consisted of ten (10) gap-filling items in which the participants were asked to place the direct objects, given in parentheses, in the appropriate gaps to create meaningful sentences. The objective of these items was that the participants were able to distinguish the rules underlying the separable and the inseparable phrasal verbs. The third part consisted of ten (10) matching items, whose objective was that the participants were able to recognize the meaning of some phrasal verbs from the choices given. These parts were defined as recognition tests since the researcher provided choices to assist the participants in doing the test.

As explained above, the researcher constructed forty (40) items for the test. The distribution of the items across the types of phrasal verbs was thirteen (13) items for the first type of the separable phrasal verbs, namely the optional phrasal verbs; fifteen (15) items for the second type of the separable phrasal verbs, namely the obligatory phrasal verbs; and twelve (12) items for the inseparable phrasal verbs. Most of the items in the test were taken from trustworthy English grammar textbooks, and the researcher, referring to those textbooks as well, constructed a few of them.
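To make the structure of the test easier to see at a glance, the sketch below restates the three parts and the item distribution described above as a small data structure. It is a minimal illustration only; the field names and labels are the researcher's descriptions rephrased for the sketch, not materials taken from the study.

```python
# A minimal sketch of the test blueprint described above.
# The keys and labels are illustrative, not part of the study's materials.

test_blueprint = {
    "parts": [
        {"part": 1, "type": "production", "format": "gap filling",
         "items": 20, "objective": "place the right particle for each main verb"},
        {"part": 2, "type": "recognition", "format": "gap filling",
         "items": 10, "objective": "place the direct object in the appropriate gap"},
        {"part": 3, "type": "recognition", "format": "matching",
         "items": 10, "objective": "recognize the meaning of phrasal verbs"},
    ],
    "distribution": {
        "separable (optional)": 13,
        "separable (obligatory)": 15,
        "inseparable": 12,
    },
}

# Both breakdowns account for the same forty (40) items.
assert sum(p["items"] for p in test_blueprint["parts"]) == 40
assert sum(test_blueprint["distribution"].values()) == 40
```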
According to Fraenkel & Wallen (2009: 147), “the quality of the instruments used in research is very important, for the conclusions researchers draw are based on the information they obtain using these instruments.” Due to its essential role in this research, the test employed as the instrument should meet the requirements of measurement. These requirements are validity and reliability. According to Ary et al. (2002: 242), validity and reliability are two essential criteria of quality that any measuring instrument should possess, because if a researcher’s data are not obtained with valid and reliable instruments, one would have little faith in the results obtained. In the following, the researcher elaborates the validity and the reliability of the test.

1) The Validity of the Test

Validity, according to Lado (1977: 321), is essentially a matter of relevance: “Is the test relevant to what it claims to measure?” In other words, “a test is said to be valid if it measures what it is intended to measure” (Heaton, 1975: 153). There are four ways to consider the validity of a test. Fraenkel & Wallen (2009: 148) define those ways as types of evidence to support the interpretations researchers wish to make concerning the data they have collected. They are “content validity, construct validity, criterion-related validity and face validity” (Hughes, 1989: 22-27). However, in this research, the researcher did not examine the criterion-related validity of the test, since the researcher was not going to compare the participants’ performance on the test with their performance on some other independent criterion, as explained by Fraenkel & Wallen (2009: 148). The researcher was only going to administer the test to measure the participants’ mastery.

a) Content Validity

“A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned” (Hughes, 1989: 22). Hughes (1989: 22) also states that a test has content validity only if it includes a proper sample of the relevant structures; just what the relevant structures are depends, of course, upon the purpose of the test. Furthermore, according to Hughes (1989: 22), in order to judge whether or not a test has content validity, the researcher needs a specification of the skills or structures that the test is meant to cover. In this research, the specification is the participants’ errors in using the English phrasal verbs, which reveal the participants’ mastery of the English phrasal verbs. Therefore, the test should test those errors.

Bachman (1995: 244-245) divides content validity into two parts. The first is content relevance, which means that the content of the test should be relevant to the purpose of the test. The second is content coverage, which means that the test should cover all, or adequately sample, the elements of the language to be tested. Thus, based on Bachman’s (1995: 244-245) explanation, the test in this research was said to have content validity. First, its content was relevant to the purpose of the test, which was to measure the participants’ mastery of the English phrasal verbs, especially the separable and the inseparable phrasal verbs. This could be seen from the three parts of the test, which required the participants to demonstrate their mastery of the separable and the inseparable phrasal verbs by placing the particles and the direct objects in the appropriate gaps. Furthermore, the English phrasal verbs were also relevant to the ability of the sixth semester students, who were the participants. Second, the test covered the types of phrasal verbs, which had been limited to two, the separable and the inseparable phrasal verbs, and adequately sampled the uses of the English phrasal verbs.

b) Construct Validity

“A test is said to have construct validity if it can be demonstrated that it measures just the ability which it is supposed to measure” (Hughes, 1989: 26). In this research, the test was aimed at measuring the participants’ mastery of the English phrasal verbs. “As one part of the evidence of the validity of the test, there are three steps involved in obtaining this type of evidence” (Fraenkel & Wallen, 2009: 153). Those three steps are as follows.

1. The variable being measured is clearly defined. The variable being measured in this research was the sixth semester students’ mastery of the English phrasal verbs, especially the separable and the inseparable phrasal verbs. The participants who had mastered the separable and the inseparable phrasal verbs were those who had better knowledge of the forms and the rules of using them, the meaning of those forms, and how to use those forms correctly than those who had not mastered them. Therefore, the participants who had mastered the separable and the inseparable phrasal verbs would barely make errors in using them.

2. A hypothesis is formed based on the theory underlying the variable. Based on the theory underlying the variable above, the researcher hypothesized that the participants with high mastery would achieve better scores and make fewer errors on the test than the participants with low mastery, because they had better knowledge of the forms and the rules of using the separable and the inseparable phrasal verbs, the meaning of those forms, and how to use those forms correctly.

3. The hypothesis is tested. The hypothesis above was then tested by administering the test to the participants. This means that the participants were given the test to measure their mastery of the English phrasal verbs, especially the separable and the inseparable phrasal verbs. If the test had construct validity, more of the participants who had mastered the phrasal verbs than those who had not would achieve higher scores and make fewer errors on the test (a sketch of this comparison is given below). This is one form of evidence that could be used to support inferences about the students’ mastery of the English phrasal verbs, based on the scores they received and the number of errors they made on the test.

In this research, since the test was constructed to measure the students’ mastery, it should emphasize the production of the separable and the inseparable phrasal verbs rather than their recognition. Accordingly, the test had more production items than recognition items. If the participants received a high score on the test, it meant that they could produce the phrasal verbs well; thus, they were said to have mastered the phrasal verbs. Since the test really measured the students’ ability in producing the separable and the inseparable phrasal verbs, it could be concluded that the test had construct validity.
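To make the logic of step 3 concrete, the following minimal sketch compares the mean scores and mean error counts of two hypothetical groups of participants. All numbers and group labels are invented for illustration; they are not the study's data.

```python
# A minimal sketch of the comparison implied by step 3 above.
# The scores and error counts are hypothetical, not the study's data.

def mean(values):
    return sum(values) / len(values)

# Hypothetical results on the 40-item test for two groups.
mastered = {"scores": [35, 33, 37, 34], "errors": [5, 7, 3, 6]}
not_mastered = {"scores": [21, 18, 24, 20], "errors": [19, 22, 16, 20]}

# The construct-validity hypothesis predicts both checks hold: the mastered
# group scores higher and makes fewer errors on average.
higher_scores = mean(mastered["scores"]) > mean(not_mastered["scores"])
fewer_errors = mean(mastered["errors"]) < mean(not_mastered["errors"])

print("Higher mean score for the mastered group:", higher_scores)
print("Fewer mean errors for the mastered group:", fewer_errors)
```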
c) Face Validity

“A test is said to have face validity if it looks as if it measures what it is supposed to measure” (Hughes, 1989: 27). Hughes (1989: 27) also mentions that a test which does not have face validity may not be accepted by candidates, teachers, education authorities, or employers. Hence, as stated by Lado (1977: 321), although this kind of validity is weak and unscientific, it is still important as evidence of the validity of the test.

2) The Reliability of the Test

“Reliability refers to the consistency of the scores obtained–how consistent they are for each individual from one administration of an instrument to another and from one set of items to another” (Fraenkel & Wallen, 2009: 154). This statement is supported by Brown (1991: 98), who states that the reliability of a test is the extent to which its results can be considered consistent or stable. In fact, as Hughes (1989: 29) implies, researchers can never have complete trust in any set of test scores, because they know that the scores would have been different if the test had been administered on a different day.
Therefore, according to Hughes (1989: 29), researchers have to make, administer, and score tests in such a way that the scores actually gained on a test on a particular occasion are likely to be similar to those which would have been gained if the test had been administered to the same students, with the same ability, at a different time. “The more similar the scores would have been, the more reliable the test is said to be” (Hughes, 1989: 29).

A test must be reliable, and its reliability can be measured, as mentioned by Best (1986: 154), with a reliability coefficient. The ideal reliability coefficient is one (1). “A test with a reliability coefficient of one (1) will give precisely the same results for a particular set of candidates regardless of when it happens to be administered” (Hughes, 1989: 30). Meanwhile, according to Hughes (1989: 31), a test that has a reliability coefficient of zero (0) will give sets of results quite unconnected with each other. “It means that the score that someone actually obtains on Wednesday will be no help at all in attempting to predict the score he or she will obtain if she or he takes the test the day after” (Hughes, 1989: 31).

According to Best (1986: 154) and Fraenkel & Wallen (2009: 155), “there are a number of ways to obtain the reliability coefficient, such as test-retest, equivalent or parallel forms, internal consistency, inter-scorer reliability, and standard error of measurement.” In this research, the researcher used the Kuder-Richardson formula 20 (K-R 20), which belongs to the inter-item consistency methods (Fraenkel & Wallen, 2009: 156), to measure the reliability coefficient of the test. The researcher employed this method because it only requires a single administration of the test. Moreover, according to Ary et al. (2002: 258), “the K-R 20 formula can be applied to tests in which the items are scored either correct or incorrect.” Therefore, it could be concluded that the K-R 20 formula was suitable for the test employed in this research.

The K-R 20 formula, as presented by Fraenkel & Wallen (2009: 277), is as follows:

$$r = \frac{K}{K-1}\left(1 - \frac{\sum pq}{s^2}\right)$$

where:
r   = reliability of the whole test
K   = number of items on the test
s^2 = variance of scores on the total test (the squared standard deviation)
p   = proportion of correct responses on a single item
q   = proportion of incorrect responses on the same item

Then, Best (1983: 255) also classifies the values of the reliability coefficient. The classification is presented in Table 3.1.

Table 3.1 The Classification of the Reliability Coefficient

Coefficient (r)     Relationship
0.00 to 0.20        Negligible
0.20 to 0.40        Low
0.40 to 0.60        Moderate
0.60 to 0.80        Substantial
0.80 to 1.00        High to very high

The result of the pilot test, which was administered to the students of Translation II class A, showed that the reliability coefficient was 0.72. Therefore, the test could be said to have substantial reliability.
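For concreteness, the sketch below shows one way the K-R 20 coefficient could be computed from a matrix of dichotomously scored (1 = correct, 0 = incorrect) responses, and how the resulting value would be labelled according to Table 3.1. The function names and the small response matrix are illustrative assumptions, not the study's actual scoring script or data.

```python
# A minimal sketch of the K-R 20 computation. Each row is one participant,
# each column one dichotomously scored item (1 = correct, 0 = incorrect).
# The sample responses below are invented for illustration only.

def kr20(responses):
    """Kuder-Richardson formula 20: (K/(K-1)) * (1 - sum(pq)/s^2)."""
    n = len(responses)        # number of participants
    k = len(responses[0])     # number of items on the test (K)
    totals = [sum(row) for row in responses]

    # s^2: variance of the total scores (the squared standard deviation).
    mean_total = sum(totals) / n
    variance = sum((t - mean_total) ** 2 for t in totals) / n

    # sum(pq): p = proportion correct on an item, q = 1 - p.
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n
        sum_pq += p * (1 - p)

    return (k / (k - 1)) * (1 - sum_pq / variance)

def classify(r):
    """Label a coefficient using the bands of Table 3.1."""
    bands = [(0.20, "Negligible"), (0.40, "Low"), (0.60, "Moderate"),
             (0.80, "Substantial"), (1.00, "High to very high")]
    for upper, label in bands:
        if r <= upper:
            return label

# Hypothetical responses of five participants to six items.
responses = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1, 0],
]

r = kr20(responses)
print(f"K-R 20 = {r:.2f} ({classify(r)})")
```

Under this reading of Table 3.1, a coefficient such as the 0.72 obtained in the pilot test falls in the 0.60 to 0.80 band and is therefore labelled substantial.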

b. Interview