Characteristics of a Good Test

Predictive validity is concerned with the test score as a predictor of performance at some future time. Construct validity, in turn, serves as an additional way of establishing validity when the three types of measurement above are not sufficient.

b. Reliability

Reliability refers to the consistency of test scores, that is, to the extent to which a test measures the same thing every time. In other words, the reliability of a test refers to its consistency in yielding the same rank for an individual who takes the test several times. To have confidence in a measuring instrument, we would need to be assured, for example, that approximately the same result would be obtained:
1. if we tested a group on Tuesday instead of Monday;
2. if we gave two parallel forms of the test to the same group on Monday and Tuesday;
3. if we scored a particular test on Tuesday and on Monday;
4. if two or more scorers scored the test independently.

It is clear from the foregoing that two somewhat different types of consistency, or reliability, are involved: the reliability of the test itself, and the reliability of the scoring of the test. The writer concludes that a test is reliable if it consistently yields the same, or nearly the same, rating over repeated administrations. According to Wilmar Tinambunan, "there are several ways of estimating the reliability of a test. The three basic methods and the type of information each provides are as follows: (1) the test-retest method, which indicates the stability of test scores over some given period of time; (2) the equivalent-forms method, which indicates the consistency of test scores over different forms of the test; and (3) the internal-consistency method, which indicates the consistency of test scores over different parts of the test."36

1. Test-retest method. The simplest technique is to retest the same individuals with the same test. If the results of the two administrations are highly correlated, we may assume that the test has temporal stability. If the time interval between the two testings is relatively short, the examinees' memories of their previous responses will make their two performances spuriously consistent and thus lead to an overestimate of test reliability. On the other hand, if the interval is long enough to minimize the memory factor, the examinees' proficiency may have undergone a genuine change, producing different responses to the same items, so that the reliability of the test may be underestimated.

2. Equivalent-forms method. The second method computes reliability from alternate or parallel forms, that is, from different versions of the same test which are equivalent in length, difficulty, time limit, format, and other such aspects. Where equivalent forms of a test exist, reliability can be increased by lengthening the test, provided always that the additional material is similar in quality and difficulty to the original. But if satisfactory reliability could be obtained only by lengthening the test beyond all reasonable limits, it would obviously be wiser to revise the material or to choose another test type. This method is probably the best, but even here the practice effect, though reduced, is not entirely eliminated.

3. Internal-consistency method. A third method of estimating the reliability of a test consists in giving a single administration of one form of the test and then, by dividing the items into two halves (usually by separating the odd- and even-numbered items), obtaining two scores for each individual. Such "split-half" procedures yield two forms whose results may be compared to provide a measure of the adequacy of the sampling.

36 Tinambunan, op. cit., p. 15.
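To illustrate the split-half procedure just described, the following minimal Python sketch uses invented 0/1 item scores. The Spearman-Brown correction applied at the end is the standard adjustment for estimating full-length reliability from two half-tests; it is an addition of this sketch and is not named in the passage above.

```python
# Minimal sketch of split-half reliability (invented data).
# Each row is one examinee; each column is one item scored 0 or 1.
from statistics import correlation  # Pearson's r; requires Python 3.10+

responses = [
    [1, 1, 0, 1, 1, 0, 1, 1],  # examinee 1
    [1, 0, 0, 1, 0, 0, 1, 0],  # examinee 2
    [0, 1, 1, 1, 1, 1, 1, 1],  # examinee 3
    [1, 0, 0, 0, 1, 0, 0, 1],  # examinee 4
    [1, 1, 1, 1, 0, 1, 1, 1],  # examinee 5
]

# Split each examinee's items into odd- and even-numbered halves.
odd_scores = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, 7
even_scores = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, 8

# Correlate the two half-test scores across examinees.
r_half = correlation(odd_scores, even_scores)

# Spearman-Brown correction: estimated reliability of the full-length test.
r_full = (2 * r_half) / (1 + r_half)

print(f"half-test correlation: {r_half:.2f}")
print(f"estimated full-test reliability: {r_full:.2f}")
```

The correction step reflects the fact that each half is only half as long as the real test, and shorter tests are, other things being equal, less reliable.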
If test scoring is done by two or more raters, the reliability of their evaluations can easily be checked by comparing the scores they give for the same student's responses.

Both validity and reliability serve the same broad aims: "to ensure that a point has been understood so that further teaching and learning can proceed, to review material studied over several previous weeks in order to prepare students for a formal examination, and to familiarise students with particular types of test format."37

Finally, it can be concluded that reliability refers purely and simply to the precision with which the test measures. No matter how high the reliability coefficient, it is by no means a guarantee that the test measures what it is intended to measure. Data concerning what the test measures must be sought from some source outside the statistics of the test itself.

37 Allison, op. cit., p. 85.

c. Practicality

After validity and reliability, the writer turns to a third requirement of a good instrument. A test may possess the two qualities just discussed, validity and reliability, but the teacher or test constructor should also consider its practical side. In the implementation of a classroom test, some basic practical questions arise: How much paper is needed? Is the test time-consuming? How much does it cost? Where will it be given?38 The context for the implementation of the test must be effective and efficient; in short, the test must be practicable.

38 Arifin, op. cit., p. 264.

A third characteristic of a good test, then, is its practicality. A test may be a highly reliable and valid instrument but still be beyond our means or facilities. Thus, in preparing a new test or adapting an existing one, we must keep in mind a number of very practical considerations, such as economy and ease of administering and scoring.

1. Economy

As most educational administrators are very well aware, testing can be expensive. If a standard test is used, one must take into account the cost per copy and whether or not the test booklets are reusable. It should also be determined how many administrators and scorers will be needed, for the more personnel who must be involved in scoring a test, the more costly the process becomes.

2. Ease of administration and scoring

Other practical considerations involve the ease with which the test can be administered. Full, clear directions ensure that the test administrator can perform his task quickly and effectively. This is supported by Suharsimi: "a test is categorized as highly practical if the test is simple and easy to administer."39 A test is practicable if its implementation presents no problems.

39 Arikunto, op. cit., p. 62.

Steps in Test Construction and Administration

According to Louis in Measurement and Evaluation in the Schools, "there are some steps in constructing and administering a test. They are the following:

1. The 'get ready' stage:
a. Put away safely the instructional material that is related to the test.
b. Make guidelines related to the aims the test is to measure.

2. The 'get set' stage:
a. Think over the type of test that will be used and the format of the test.
b. Arrange the preliminary study related to the items that will be used.
c. Consider the time that will be needed to do the test.
d. Examine and check the test items more than three times after writing the preliminary draft.
e. Organize the test items, taking into account the difficulty of each item.
Give the easiest questions first to build the poorer students' confidence, interest, and motivation in doing the test.
f. Give clear instructions in the test.
g. Determine how to score the test.

3. The 'go' stage:
a. Be consistent about the time allowed for the test.
b. Maintain good classroom management while the students are taking the test.
c. Administer the test in an orderly way."40

40 Louis J. Karmel, Measurement and Evaluation in the Schools, New York: The Macmillan Company, 1970, p. 380.

To wrap up the ideas above: practicality is one of the factors that must be considered in administering a test. Based on the explanation above, validity, reliability, and practicality are the characteristics of a good test. Meanwhile, there is another particular matter that must be considered by the teacher and the test constructor, namely the quality of the test items. The method used to examine the quality of test items is called item analysis.

B. Item Analysis

1. The Definition of Item Analysis

Selecting an appropriate language-testing technique is not in itself enough to ensure a good test. Each question needs to function properly; otherwise, it can weaken the exam. Fortunately, there are some rather simple statistical ways of checking individual items. This procedure is called "item analysis," and it is most often used with multiple-choice questions. An item analysis tells us basically three things: how difficult each item is, whether or not the question "discriminates," that is, tells the difference between high and low students, and which distractors are working as they should. An analysis of this kind is used with any important exam, for example review tests and tests given at the end of a school term or course.

Item analysis is usually done to select which items will remain on future revised and improved versions of a test. There are several descriptions of item analysis. An item analysis can be used to classify an item as good or bad and to show why it is so classified. By doing item analysis, items can be classified according to:
- the difficulty level of the item;
- its discriminating power;
- the alternative options for answering the question.

Before discriminating power is calculated, the scores are divided into three groups (a short illustrative sketch of this grouping appears at the end of this subsection):
- the upper group;
- the middle group;
- the lower group.

According to Ngalim Purwanto, item analysis has two purposes:
1. It provides diagnostic information; that is, it can be used to identify progress, failure, and remedies in learning.
2. It serves to review the results of the test and to make decisions about the next one.41

41 M. Ngalim Purwanto, Prinsip-prinsip dan Teknik Evaluasi Pengajaran, Bandung: PT. Remaja Rosdakarya, 2004, p. 118.

This is supported by Arikunto: the aims of item analysis are to identify poor, satisfactory, good, and excellent items, and item analysis gives information about which test items need review and further attention.42 It means that items are listed according to their degree of difficulty (easy, medium, or hard) and their discrimination (good, fair, or poor); these distributions provide a quick overview of the test. In addition, according to Brown and Hudson in Criterion-referenced Language Testing, item analysis is a procedure for evaluating the effectiveness of a test; its aim is to determine which items should be revised, and it is used in evaluating the test for the next administration.43 It means that item analysis can identify items which are not performing well and which can perhaps be improved or discarded.

42 Suharsimi Arikunto, Dasar-dasar Evaluasi Pendidikan, revised edition, Jakarta: PT. Bumi Aksara, 2006, p. 206.
43 James Dean Brown and Thom Hudson, Criterion-referenced Language Testing, UK: Cambridge University Press, 2002, p. 113.

From those definitions, it can be concluded that item analysis is the process of collecting information about pupils' responses to the items and judging the quality of the test items. More specifically, item-analysis information can tell us whether an item was too easy or too hard, how well it discriminated between high and low scorers on the test, and whether all of the alternatives functioned as intended. Item-analysis data also aid in detecting specific technical flaws, and thus provide further information for improving the test items.
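To make the grouping step concrete, here is a minimal Python sketch using invented scores. The 27% rule of thumb for the sizes of the upper and lower groups is a common convention assumed by this sketch, not a figure taken from the sources cited above.

```python
# Minimal sketch: forming upper, middle, and lower groups for item analysis.
# Total test scores for 10 examinees (invented data).
scores = {
    "A": 18, "B": 15, "C": 22, "D": 9, "E": 14,
    "F": 20, "G": 11, "H": 17, "I": 8, "J": 13,
}

# Rank examinees from highest to lowest total score.
ranked = sorted(scores, key=scores.get, reverse=True)

# A common convention takes the top and bottom 27% as the two groups.
n_group = max(1, round(len(ranked) * 0.27))

upper_group = ranked[:n_group]       # highest scorers
lower_group = ranked[-n_group:]      # lowest scorers
middle_group = ranked[n_group:-n_group]

print("upper :", upper_group)   # ['C', 'F', 'A']
print("middle:", middle_group)
print("lower :", lower_group)
```

Only the upper and lower groups enter the difficulty and discrimination calculations discussed in the next subsection; the middle group is set aside.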

2. Difficulty Level

The difficulty level of a test item shows how hard the item is for students to answer. If the test is to be used again with another class, items that are too difficult or too easy should not be reused as they are: rewrite them or discard them. Two or three very easy items can be placed at the beginning of the test to encourage students. Questions should be arranged in order of difficulty. Not only is this good psychology; it also helps those who do not have a chance to finish the test, since at least they have had a chance to try the items they are most likely to get right. It is obvious that our sample item would come near the end of the test, since only a third of the students got it right.

Item difficulty may be defined as the proportion of the examinees who marked the item correctly. In other words, item difficulty is the percentage of students who correctly answered the item, also referred to as the p-value.44 The following formula is used to find the difficulty level:

DL = (Ru + Rl) / (Nu + Nl)

where Ru is the number of examinees in the upper group who answered the item correctly, Rl is the number in the lower group who answered it correctly, and Nu and Nl are the numbers of examinees in the upper and lower groups, respectively.

44 C. Boopathiraj and K. Chellamani, "Analysis of Test Items on Difficulty Level and Discrimination Index in the Test for Research in Education," International Journal of Social Science & Interdisciplinary Research, Vol. 2 (2), February 2013, ISSN 2277-3630, p. 90, online at indianresearchjournals.com, accessed on 05 April 2016.
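As a worked illustration of the formula above, the following minimal Python sketch computes the difficulty level of one item from upper- and lower-group results; the counts and the function name are invented for the example.

```python
# Minimal sketch of the difficulty-level formula DL = (Ru + Rl) / (Nu + Nl).
def difficulty_level(ru: int, rl: int, nu: int, nl: int) -> float:
    """Proportion of upper- and lower-group examinees answering correctly."""
    return (ru + rl) / (nu + nl)

# Invented example: 10 examinees in each group; 8 of the upper group
# and 3 of the lower group answered the item correctly.
dl = difficulty_level(ru=8, rl=3, nu=10, nl=10)
print(f"DL = {dl:.2f}")  # (8 + 3) / (10 + 10) = 0.55
```

Here DL = 11/20 = 0.55, which most authors would class as an item of medium difficulty; values near 1.0 indicate very easy items and values near 0.0 very hard ones, though the exact bands vary from author to author.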