
E. The Criteria of a Good Test

Making a good test requires careful arrangement. As an instrument for obtaining information, a test should be of good quality, because the quality of the test influences its results. If the test is good, the results will provide the right information for the teacher to make accurate decisions about the students' achievement. According to Harris, "all good tests possess three qualities: validity, reliability, and practicality."27 Meanwhile, Lado identifies five characteristics of a good test: validity, reliability, scorability, economy, and administrability.28

a. Validity

Henning defines validity as follows:

Validity in general refers to the appropriateness of a given test or any of its component parts as a measure of what it is purported to measure. A test is said to be valid to the extent that it measures what it is supposed to measure. It follows that the term valid, when used to describe a test, should usually be accompanied by the preposition for. Any test then may be valid for some purposes, but not for others.29

J.B. Heaton says, "The validity of a test is the extent to which it measures what it is supposed to measure and nothing else."30 In short, validity must be considered in measurement: it must be established whether the test really measures what it is supposed to measure.

The terms internal and external validity are used with the following distinction: internal validity relates to studies of the perceived content of the test and its perceived effect, while external validity relates to studies comparing students' test scores with measures of their ability gleaned from outside the test. External validity is also called "criterion validity" because the students' scores are compared with other criterion measures of their ability.31

1. Internal validity

a. Face validity

Face validity refers to the test's "surface credibility or public acceptability,"32 and is frequently dismissed by testers as being unscientific and irrelevant.33 Essentially, face validity involves an intuitive judgment about the test's content by people whose judgment is not necessarily "expert." The judgment is usually holistic, referring to the test as a whole, although attention may also be focused upon particular poor items, unclear instructions, or unrealistic time limits as a way of justifying a global judgment about the test.

b. Content validity

Content validity refers to the representativeness of the sample of items or behaviors included in relation to what the test aims to measure. In other words, from a sample of behaviors, the test user wishes to determine how an individual would perform at present in the universe of situations that the test is claimed to represent. In a language test, a useful way of looking at this universe of items is to consider it to comprise a definition of the achievement knowledge, aptitudes, and skills to be measured by the test.34 J.B. Heaton says, "Content validity depends on careful analysis of the language being tested and of the particular course objectives; the test should be so constructed as to contain a representative sample of the course."

27 David P. Harris, Testing English as a Second Language, New York: Georgetown University, 1996, p. 13.
28 Robert Lado, Language Testing: The Construction and Use of Foreign Language Tests, New York: Georgetown University Press, 1997, p. 30.
29 G. Henning, A Guide to Language Testing, Cambridge, Mass.: Newbury House, 1987, p. 89.
30 J.B. Heaton, Writing Language Test, Longman: 1998, p. 153.
31 J. Charles Alderson, Caroline Clapham, and Dianne Wall, Language Test Construction and Evaluation, Cambridge: Cambridge University Press, 1995, p. 171.
32 E. Ingram, Basic Concept in Testing, Oxford: Oxford University Press, 1977, p. 18.
33 D.K. Stevenson, "Authenticity, Validity and a Tea Party," Language Testing 21, 1985, pp. 41-7.
34 Milagros D. Ibe, 1983, Papers on Language Testing:
Language Test Analysis: Beyond the Validity and Reliability Criteria, 21:1.

Content validity is concerned with what goes into the test. Thus, the degree of content validity in a classroom test relates to how well the test measures the subject-matter content studied and the behaviors which the test tasks require. A test will have high content validity if its items are representative of the population of possible tasks. The steps that should be followed in order to obtain greater assurance of content validity are as follows:

1. List all the major topics of subject-matter content and the major types of behavioral changes to be measured by the test separately.
2. Weight the subject-matter topics and the types of behavioral changes in terms of their relative importance. The amount of time devoted during instruction or the philosophy of the school can be used as criteria to determine the relative weight of the topics and behaviors.
3. Build a table of specifications that shows the weighted lists of subject-matter topics and expected behavioral changes.
4. Base the construction of the achievement test on this table of specifications. The closer the test corresponds to the specifications indicated in the table, the greater the likelihood that the pupils' responses to the test will have a high degree of content validity.

Content validation typically takes place during test development. It is primarily a matter of preparing detailed test specifications and then constructing a test that meets those specifications. It will help teachers in determining how well test scores represent the achievement of certain learning outcomes. In constructing an achievement test, teachers should be aware of some factors that tend to influence the validity of the results:

1. Unclear directions
2. Word and sentence structure of items that is too difficult
3. An inappropriate level of difficulty of the test items
4. Poorly constructed test items
5. Ambiguity
6. Improper arrangement of the items
7. A pattern of answers which is easily identified
8. Factors in administering and scoring a test, such as unfair aid to individual pupils who ask for help, cheating during the examination, unreliable scoring of essay answers, insufficient time to complete the test, etc.35

c. Response validity

An increasingly common aspect of test validation is to gather information on how individuals respond to test items. The processes they go through and the reasoning they engage in when responding are important indications of what the test is testing, at least for those individuals. Hence there is considerable current interest in gathering accounts from learners/test takers on their test-taking behavior and thoughts.

2. External validity

a. Concurrent validity

Concurrent validation involves the comparison of the test scores with some other measure for the same candidates, taken at roughly the same time as the test. This other measure may be scores from a parallel version of the same test or from some other test; the candidates' self-assessments of their language abilities; or ratings of the candidates on relevant dimensions by teachers, subject specialists, or other informants.

b. Predictive validity

Predictive validation is most common with proficiency tests: tests which are intended to predict how well somebody will perform in the future. The simplest form of predictive validation is to give students a test, and then at some appropriate point in the future give them another test of the ability the initial test was intended to predict. If we can use a test of English as a second language to screen university applicants and then correlate test scores with grades made at the end of the first semester, we are attempting to determine the predictive validity of the test.

35 Wilmar Tinambuan, Evaluation of Student Achievement, Jakarta: 1988, pp. 12-14.
If, on the other hand, we follow up the test immediately by having an English teacher rate each student's English proficiency on the basis of the two measures, we are seeking to establish the concurrent validity of the test.36

3. Construct validity

Ebel and Frisbie give the following explanation of construct validity:

The term construct refers to a psychological construct, a theoretical conceptualization about an aspect of human behavior that cannot be measured or observed directly. Examples of constructs are intelligence, achievement motivation, anxiety, achievement, attitude, dominance, and reading comprehension. Construct validation is the process of gathering evidence to support the contention that a given test indeed measures the psychological construct the makers intend it to measure. The goal is to determine the meaning of scores from the test, to assure that the scores mean what we expect them to mean.37

A shorter explanation is provided by Gronlund, who describes construct validation as measuring "how well test performance can be interpreted as a meaningful measure of some characteristic or quality."38 Construct validity deals with the constructs and underlying theory of language learning and testing. J.B. Heaton states, "If the test has construct validity, it is capable of measuring certain specific characteristics in accordance with a theory of language behavior and learning."

b. Reliability

Reliability refers to the consistency of measurement, that is, to how consistent test scores or other evaluation results are from one measurement to another.39 If a test administered to the same students on a different occasion yields no difference in results, the test can be said to be reliable. In general, the more consistent the test results are from one measurement to another, the less error is present and, consequently, the greater the reliability.

c. Practicality

The third characteristic of a good test is practicality, or usability in the preparation of a new test. The practicality criterion covers three aspects: economy, scorability, and administrability.

1. Economy

The test must consider time and cost. If the time is reasonable for the testing situation, and so is the cost, the test is economical.

2. Scorability

Scorability concerns how the test is scored. The form of the test influences scoring: a subjective test is more difficult to score than an objective test and often makes the examiner hesitate in scoring. An objective test has good scorability because it is easy to score.

3. Administrability

If clear directions are provided, the tasks can be performed quickly and efficiently, and any mechanical devices needed are available at the moment of evaluation, the test can be said to be well administrable.

36 J.B. Heaton, Writing Language Test, Longman: 1998, pp. 154-155.
37 R.L. Ebel and D.A. Frisbie, Essentials of Educational Measurement, Englewood Cliffs, NJ: Prentice-Hall, 1991, p. 108.
38 N.E. Gronlund, Measurement and Evaluation in Teaching, New York: Macmillan, 1985, p. 58.
39 Norman E. Gronlund, Measurement and Evaluation in Teaching, New York: Macmillan Publishing Co., Inc., 1981, 4th ed., p. 93.

Besides having good criteria, another characteristic of the test that is more important and specific is the quality of the test items. To know the quality of the test items, teachers should use a method called item analysis.
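In their simplest forms, both reliability (comparing two sittings of the same test) and external validation (comparing test scores against a criterion such as first-semester grades) come down to correlating two sets of scores, conventionally with the Pearson product-moment coefficient. The sketch below is purely illustrative and is not from the sources cited above; all student scores in it are invented:

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Test-retest reliability: the same test given to the same six students twice.
first_sitting = [55, 62, 70, 48, 81, 66]   # invented scores
second_sitting = [58, 60, 73, 50, 79, 68]  # invented scores
print(round(pearson_r(first_sitting, second_sitting), 2))   # high r = consistent results

# Predictive validity: screening-test scores vs. end-of-semester grades.
semester_grades = [60, 65, 72, 55, 85, 70]  # invented grades
print(round(pearson_r(first_sitting, semester_grades), 2))  # high r = good predictor
```

A coefficient near 1.0 in the first computation suggests the test yields consistent results across administrations; in the second, it suggests the screening test predicts later achievement well. Real validation studies would of course use far larger samples.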

F. Item Analysis