Reliability of the Test

As for norm-referenced tests (NRTs), Brown (1996: 2) states that an NRT is designed to measure general language abilities, for example overall English language proficiency, academic listening ability, or reading comprehension. Each student’s score is interpreted relative to the scores of the other students who took the same test, so the interpretation is essentially a comparison of one student’s score with those of the others. Brown adds that the goal of an NRT is to spread students out along a continuum of scores, so that those with “low” abilities in a general domain such as listening are at one end of the normal distribution, while those with “high” abilities are at the other end. Moreover, although students may know the format of the test (for instance multiple-choice, dictation, essay, or true-false), they will not know exactly what content or skills the questions will test.

In contrast, Brown (1996) explains that a criterion-referenced test (CRT) is constructed to measure well-defined and fairly specific objectives. Its scores are interpreted in absolute terms: each student’s score is meaningful without reference to the other students’ scores, because it indicates how much of the knowledge or skill in each objective the student has learned. If all students know 90% of the material on all objectives, they should all obtain the same score, with no variation at all. The aim of a CRT is therefore to measure the amount of learning a student has achieved on each objective. In most cases, students will know in advance what types of questions, tasks, and content to expect for each objective, because the question content is stated explicitly in the course objectives.

Based on these two categories, it can be concluded that the weekly test is a CRT. First, the weekly test measures the specific objectives stated in the course syllabus. Second, each student’s score is meaningful on its own, with no need to compare it with other students’ scores. Finally, students know what specific content or skills will be tested, since the question content is stated explicitly in the course objectives.
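To make the contrast between the two interpretations concrete, the sketch below reads the same kind of scores both ways: norm-referenced, as a percentile rank within the group, and criterion-referenced, as the percentage of one objective’s items answered correctly, checked against a mastery cut-off. Everything in it is hypothetical and invented for illustration, not data from this study: the student names, the scores, the 20 items per objective, and the 70% cut-off. The percentile rank here counts only scores strictly below the student’s own, which is one of several conventions in use. The “uniform” group mirrors Brown’s point above: when every student knows 90% of the material, the NRT reading cannot separate them, while the CRT reading still reports what each student has learned.

```python
from bisect import bisect_left

# Hypothetical data (not from the study): items answered correctly out of the
# items written for ONE course objective.
ITEMS = 20
CUTOFF = 70.0  # hypothetical mastery cut-off, in percent

varied_class = {"Ani": 19, "Budi": 15, "Citra": 12, "Dewi": 8}
uniform_class = {"Eka": 18, "Fajar": 18, "Gita": 18, "Hadi": 18}  # all know 90%


def percentile_rank(score, all_scores):
    """Norm-referenced reading: percentage of the group scoring strictly below `score`."""
    ordered = sorted(all_scores)
    return 100 * bisect_left(ordered, score) / len(ordered)


def percent_correct(score, items=ITEMS):
    """Criterion-referenced reading: percentage of the objective's items answered correctly."""
    return 100 * score / items


def report(group):
    """Return {name: (percentile rank, percent correct, meets cut-off)} for one group."""
    scores = list(group.values())
    rows = {}
    for name, score in group.items():
        pct = percent_correct(score)
        rows[name] = (percentile_rank(score, scores), pct, pct >= CUTOFF)
    return rows


for label, group in [("varied", varied_class), ("uniform", uniform_class)]:
    print(label)
    for name, (pr, pct, mastered) in report(group).items():
        print(f"  {name}: percentile rank {pr:.0f}, {pct:.0f}% correct, mastered={mastered}")
```

In the uniform group every percentile rank comes out as 0, because no one scores below anyone else, yet each student is reported as 90% correct and above the cut-off. This is why a test built around specific objectives, such as the weekly test, is read criterion-referenced.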

b. Validity of the Test

“By far the most complex criterion of a good test is validity, the degree to which the test actually measures what it is intended to measure” (Brown, 2001: 387). Similarly, Hughes (1989: 22) states that “a test is said to be valid if it measures accurately what it is intended to measure.” For example, a valid reading comprehension test measures reading ability itself, not background knowledge of a subject matter or some other variable of questionable relevance. Hughes (1989) adds that the concept of validity has several aspects that deserve attention.

The first aspect is content validity. “A test is considered to have content validity if the content represents [a] sample of language skills with which it [is] meant to be concerned” (Hughes, 1989: 22). In other words, a test has content validity only if its content is relevant to, and representative of, the language skills concerned.

The second aspect is criterion-related validity, the extent to which results on the test agree with those of an independent, dependable assessment of the candidates’ ability, which serves as the criterion. There are two kinds of criterion-related validity, namely concurrent validity and predictive validity (Hughes, 1989: 23). Concurrent validity is established when the test and the criterion are administered at about the same time, whereas predictive validity concerns the degree to which a test can predict the candidates’ future performance.

The third aspect is construct validity. Hughes (1989: 26) states that “a test or a testing technique is considered to have construct validity if it is able to show that it really measures just the ability which it is supposed to measure.” For instance, if a test constructor intends a reading test to measure reading ability and can demonstrate that the test measures just that ability, then the test has construct validity.

The fourth aspect is face validity. According to Brown (2001: 388), a test has face validity when its “face” appears to test what it claims to test. He adds that a learner needs to be convinced that the test is indeed testing what it claims to test. Face validity thus refers to the degree to which a test looks right and appears to measure the knowledge or abilities it claims to measure, judged from features such as the test instructions and the time allowed.
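Of the four aspects above, criterion-related validity is the one usually expressed as a number: agreement between the test and the criterion is commonly summarized as a validity coefficient, the Pearson correlation between the two sets of scores. The sketch below computes that coefficient for a concurrent-validity check. The six students’ scores on the “new test” and on the “criterion” measure are invented for illustration and are not data from this study; the function name `pearson_r` is likewise a hypothetical helper, written out by hand so that no particular statistics library is assumed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length score lists with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variance numerators).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one set of scores does not vary")
    return cov / sqrt(var_x * var_y)


# Hypothetical concurrent-validity check: the test and an independent criterion
# measure taken at about the same time by the same six students (invented data).
new_test = [12, 15, 9, 18, 14, 10]
criterion = [60, 72, 50, 85, 70, 58]

r = pearson_r(new_test, criterion)
print(f"validity coefficient r = {r:.2f}")
```

For predictive validity the same computation would apply, except that the criterion scores would be collected later, for example at the end of a course. How large a coefficient counts as acceptable is a judgement left to the test user; the sketch does not encode any threshold.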

B. Theoretical Framework

This section describes how the theories reviewed in the theoretical description are linked and combined. These theories are used to answer the research questions stated in Chapter I, and the section also presents a framework for answering those two questions.