THEORETICAL FRAMEWORK

… the testees than an untidy test paper, full of misspellings, omissions, and corrections. "If it happens, it will be easy for the students or testees to interpret the test items."38 Besides having good criteria, another characteristic of a test that is more important and specific is the quality of its test items. To know the quality of the test items, teachers should use a method called item analysis.

D. Item Analysis

1. The Definition of Item Analysis

An item analysis is a systematic procedure by which the teacher can obtain information about the quality of the test items. According to J. Stanley Ahmann and Marvin D. Glock, "Item analysis is reexamining each test item to discover its strengths and flaws."39 Anthony J. Nitko, in his book, states that "Item analysis refers to the process of collecting, summarizing and using information about individual test items, especially information about pupils' responses to items."40 Meanwhile, Harold S. Madsen explains that the selection of appropriate language items is not enough by itself to ensure a good test. Each question needs to function properly; otherwise, it can weaken the exam. Fortunately, there are some rather simple statistical ways of checking individual items. This procedure is called item analysis.41

38 J. Charles Alderson, Caroline Clapham and Dianne Wall, Language Test Construction and Evaluation, Cambridge: Cambridge University Press, 1995, p. 161.
39 J. Charles Alderson, Caroline Clapham and Dianne Wall, Language Test Construction and Evaluation, Cambridge: Cambridge University Press, 1995, p. 184.
40 Anthony J. Nitko, Educational Tests and Measurement: An Introduction, New York: Harcourt Brace Jovanovich, Inc., 1983, p. 284.
41 Harold S. Madsen, Techniques in Testing, New York: Oxford University Press, 1983.

2. Kinds of Item Analysis

There are three characteristics usually considered in the field of tests and measurement: the level of difficulty, the discriminating power, and the effectiveness of the distracter.

a. Level of Difficulty

According to Lyle F. Bachman, "Item difficulty is defined as the proportion of test takers who answered the item correctly, and the item difficulty index, p, values can be calculated on the basis of test takers' responses to the item."42 The percentage is inversely related to the difficulty: the larger the percentage of correct answers, the easier the item, and the more difficult the item is, the fewer students will select the correct option. A good test item should have a certain degree of difficulty; tests that are too easy or too difficult will yield score distributions that make it hard to identify reliable differences in achievement between the pupils who have done well and those who have done poorly.

Lyle F. Bachman, in his book Statistical Analyses for Language Assessment, explains that to analyze the level of difficulty in a large group, one should "first score the entire test. Then arrange the papers in order from the one with the highest score to the one with the lowest. Next, divide the papers into three groups, those with the highest scores in one stack and the lowest in another. The middle group can be put aside for a while."43

42 Lyle F. Bachman, Statistical Analyses for Language Assessment, Cambridge: Cambridge University Press, 2004, p. 151.
43 Lyle F. Bachman, Statistical Analyses for Language Assessment, Cambridge: Cambridge University Press, 2004, p. 125.

The index can then be stated as below:

P = (R_U + R_L) / 2n = (P_U + P_L) / 2

In which:
P  : the index of difficulty
R_U: the number of students in the upper group who got the item right
R_L: the number of students in the lower group who got the item right
P_U: the proportion of students in the upper group who got the item right
P_L: the proportion of students in the lower group who got the item right
n  : the number of students in the upper or lower group, assuming that the two groups are equal in size
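To make the procedure and the formula above concrete, here is a minimal sketch in Python with hypothetical score data; the function name and the data are illustrative assumptions only and are not taken from Bachman or from the test analysed in this study.

```python
# Illustrative sketch only: item difficulty from upper and lower groups,
# following P = (R_U + R_L) / 2n, with hypothetical answer data.

def upper_lower_difficulty(papers, item):
    """papers: list of (total_score, answers) tuples, where answers maps an
    item number to True/False (answered correctly or not)."""
    # Arrange the papers from the highest total score to the lowest.
    ranked = sorted(papers, key=lambda paper: paper[0], reverse=True)

    # Top third = upper group, bottom third = lower group; the middle
    # group is set aside for a while.
    n = len(ranked) // 3
    upper, lower = ranked[:n], ranked[-n:]

    r_u = sum(1 for _, answers in upper if answers[item])  # R_U
    r_l = sum(1 for _, answers in lower if answers[item])  # R_L
    return (r_u + r_l) / (2 * n)                           # P, the index of difficulty


# Hypothetical data: six papers, each with a total score and a record of
# whether item 1 was answered correctly.
papers = [
    (90, {1: True}), (85, {1: True}), (70, {1: True}),
    (60, {1: False}), (50, {1: True}), (40, {1: False}),
]
print(upper_lower_difficulty(papers, 1))  # (2 + 1) / (2 * 2) = 0.75
```

Since the two groups are of equal size, this is the same as averaging the two proportions P_U and P_L, as the second form of the formula shows.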
In addition, Lyle F. Bachman, in the same book, explains that "In a small group, the researcher can easily calculate the item difficulty using all the test papers."44 It can be stated as below:

P = R / N

In which:
P: the index of difficulty
R: the total number of persons who got the item correct
N: the number of students who took the test

44 Lyle F. Bachman, Statistical Analyses for Language Assessment, Cambridge: Cambridge University Press, 2004, p. 125.

Based on the technique above, the writer is going to find out the difficulty level of all the items in the English summative test by using this formula:

P = R / N

where P, R, and N are as defined above. The score of P can range from 0 to 1. If P is 0.00, it means that no student answered the item correctly; such an item belongs to the very difficult ones. If P is 1.00, it means that all the students answered the item correctly; such an item belongs to the very easy ones. To make this clear, the writer gives the range of the difficulty level in the following table:

Table 2.1 The Range Scale of the Level of Difficulty

Difficulty Level    P
Difficult           0.00 - 0.14
Moderate            0.15 - 0.85
Easy                0.86 - 1.00

The level of difficulty shows how easy or difficult an item is for that particular group, so it is influenced by the students' competence; it will be different if the test is given to another group.
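The whole-group formula and the ranges of Table 2.1 can be combined in a short sketch, given below with hypothetical response data; the names and the data are illustrative assumptions only.

```python
# Illustrative sketch only: whole-group difficulty index P = R / N for each
# item, classified with the ranges of Table 2.1 (hypothetical response data).

def difficulty_index(responses, item):
    """responses: list of dicts mapping an item number to True/False."""
    r = sum(1 for answers in responses if answers[item])  # R: correct answers
    return r / len(responses)                             # N: students who took the test

def classify(p):
    # Ranges taken from Table 2.1.
    if p <= 0.14:
        return "Difficult"
    if p <= 0.85:
        return "Moderate"
    return "Easy"

# Hypothetical answers of five students to two items.
responses = [
    {1: True, 2: False},
    {1: True, 2: False},
    {1: True, 2: True},
    {1: False, 2: False},
    {1: True, 2: False},
]
for item in (1, 2):
    p = difficulty_index(responses, item)
    print(item, round(p, 2), classify(p))  # e.g. item 1: P = 0.8, Moderate
```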
b. The Discriminating Power

A good test item should have discriminating power. The discriminating power of a test item is an index that shows its ability to differentiate between the pupils who have achieved well (the upper group) and those who have achieved poorly (the lower group).45 If the test items are given to students who have studied well, the scores will be high, and if they are given to those who have not, the scores will be low; an item that behaves this way is a good test item. Tests that do not have discriminating power will not yield a proper description of the students' ability, as stated by Nana Sudjana in his book: "… tes yang tidak memiliki daya pembeda tidak akan menghasilkan gambaran yang sesuai dengan kemampuan siswa yang sebenarnya."46 ("… The tests that do not have discriminating power will not yield the proper description of the students' ability.") Therefore, it is very important to measure the discriminating power of the test items in order to produce good test items.

45 J. Stanley Ahmann and Marvin D. Glock, Evaluating Student Progress: Principles of Tests and Measurements, Boston: Allyn and Bacon, Inc., 1981, p. 187.
46 Nana Sudjana, Penilaian Hasil …, p. 141.

c. The Effectiveness of Distracter

One important aspect affecting the difficulty of multiple-choice test items is the quality of the distracters. Some distracters, in fact, might not be distracting at all, and therefore serve no purpose.47 A good distracter will attract more students who have not studied well (the lower group) than students from the upper group. On the contrary, a weak distracter will not be selected by any of the lower-achieving students.

47 Kathleen M. Bailey, Learning about Language Assessment: Dilemmas, Decisions, and Directions, London: Heinle & Heinle Publishers, 1998, p. 134.

There are three common causes of weak distracters. First, sometimes an item was drilled heavily in class, so almost everyone has mastered it and the answer is obvious. Second, sometimes a well-recognized pair is used, such as this/these or is/are; even though not everyone has mastered these forms yet, students know that one of the two is the right answer, and no other option seems likely. Third is the use of obviously impossible distracters. For example:

Did he do the work?

A. Yes, he did.
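The point that a good distracter attracts more lower-group than upper-group students can be checked with a simple tally of option choices, sketched below with hypothetical data; the function name and the data are illustrative assumptions, not part of the cited sources.

```python
# Illustrative sketch only: count how often each option of a multiple-choice
# item is chosen by the upper and lower groups (hypothetical choice data).
# A distracter chosen mainly by the lower group is working; one chosen by
# nobody in the lower group is a weak distracter.
from collections import Counter

def distracter_tally(upper_choices, lower_choices):
    """upper_choices / lower_choices: lists of the options (e.g. 'A'-'D')
    selected by the upper-group and lower-group students for one item."""
    upper = Counter(upper_choices)
    lower = Counter(lower_choices)
    options = sorted(set(upper) | set(lower))
    return {option: (upper[option], lower[option]) for option in options}

# Hypothetical choices for one item whose correct answer is 'A'.
upper_group = ["A", "A", "A", "B", "A"]
lower_group = ["A", "B", "B", "C", "D"]
for option, (u, l) in distracter_tally(upper_group, lower_group).items():
    print(option, "upper:", u, "lower:", l)
# Here 'B', 'C' and 'D' attract more lower-group than upper-group students,
# which is the expected behaviour of effective distracters.
```

The same upper and lower groups also give a quick view of an item's discriminating power, since the correct option should be chosen far more often by the upper group than by the lower group.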