The Type of Test

of an instrument before they use the resulting information. A classroom test should be both consistent and relevant.¹⁷

Validity refers to the extent to which the results of an evaluation procedure serve the particular uses for which they are intended. If the results are to be used to describe pupil achievement, teachers want them to represent the specific achievement they wish to describe, or all aspects of that achievement. If the results are to be used to predict pupil success in some future activity, teachers want them to give as truthful an indication of future success as possible.¹⁸

Test validity is the most critical factor to be judged in the total program of foreign language testing. A test is valid when it effectively measures what it is intended to measure. For example, a test designed to measure aural comprehension must do exactly this and not attempt to measure another skill, such as reading comprehension. Validity can be established in several ways, and writers differ in how they approach and classify it. Validity is discussed in more detail in the next subchapter.

The second characteristic of a good test is reliability. Reliability, or stability, of a language test concerns the degree to which the test can be trusted to produce the same result upon repeated administration to the same individual, or to give consistent information about the value of the learning variable being measured.¹⁹ Therefore, to be considered reliable, a language test must yield consistent results and give consistent information.

The third characteristic of a good test is practicality: the test should be practical to administer. The criteria for practicality are normally based on such factors as economy, scorability, and administrability. Economy means that the test should be as economical as possible in cost. Scorability means that the test can be scored easily and effectively, without causing confusion or taking excessive time. Administrability means that the test should be easy to administer. First, there should be a training session on test administration, because it will facilitate the operation and save time and effort later on. Second, test instructions should be clear and concise, yet fully comprehensible and complete.²⁰

¹⁷ Wilmar Tinambunan, Evaluation of Student Achievement, p. 11
¹⁸ Ibid.
¹⁹ Mary Finocchiaro and Sydney Sako, Foreign Language Testing, p. 28

B. Validity

As explained previously, one of the characteristics of a good test is validity. Test validity is the most critical factor to be judged in the total program of foreign language testing. A test is valid when it effectively measures what it is intended to measure. Validity is not a simple concept, however; it has a number of aspects, each of which deserves attention. Hughes classifies validity into four types: content validity, face validity, construct validity, and criterion-related validity.²¹

1. Content Validity

Content validity is concerned with the extent to which the test is representative of a defined body of content consisting of topics and processes.²² Moreover, the test should reflect the instructional objectives or subject matter. It is not expected, however, that every item of knowledge or skill will appear in the test; there may simply be too many for all of them to appear in a single test.

Wiersma divides content validity into two parts: the content validity of teacher-constructed tests and the content validity of published tests.²³ The content validity of a teacher-constructed test depends essentially on the sampling of items. If the test items adequately represent the domain of possible items, the test has adequate content validity. When a test is not content valid, there are two consequences. First, students cannot demonstrate skills they possess if those skills are not tested. Second, irrelevant items are presented that students will likely answer incorrectly only because the content was not taught. Both of these consequences tend to lower test scores; as a result, the test score is not an adequate measure of student performance relative to the content covered by instruction.

²⁰ Ibid., pp. 30-31
²¹ Arthur Hughes, Testing for Language Teachers, p. 22
²² William Wiersma, Educational Measurement and Testing, p. 184
Most teachers are quite familiar with the content they cover during instruction, and, to a large extent, teacher-constructed tests have an inherent content validity. However, in planning a test, teachers can use a straightforward procedure that tends to improve content validity.

The second part is the content validity of published tests. Teachers may, at least on occasion, use published tests, some of which accompany curriculum materials. Tests constructed for a specific textbook or set of materials usually have high content validity if the materials are used for instruction as intended. Sometimes materials are used as supplements and are only partially covered, in which case any accompanying tests would at least need to be reviewed for content validity. Many school systems use standardized achievement tests prepared by commercial publishers; for the most part, these are norm-referenced tests. The content of such tests is fixed and is designed to have broad coverage. Therefore, although such tests are usually very well constructed technically, they may lack adequate content validity when used in a specific situation.

²³ Ibid., pp. 185-186

When curriculum committees or test selection committees in a school system