the testee than an untidy test paper full of misspellings, omissions, and corrections. “If it happens, it will be easy for the students or testees
to interpret the test items.”
38
Besides meeting good criteria, another characteristic of a test, more important and more specific, is the quality of its test items. To find out
the quality of the test items, teachers should use a method called item analysis.
D. Item Analysis
1. The Definition of Item Analysis
An item analysis is a systematic procedure by which the teacher can get some information about the quality of the test item. According to J.
Stanley Ahmann and Marvin D. Glock, “Item analysis is reexamining each test item to discover its strength and flaws.”
39
Anthony J. Nitko states in his book that “Item analysis refers to the process of collecting, summarizing and using information about
individual test items, especially information about pupils’ responses to items.”
40
Meanwhile, Harold S. Madsen explains that the selection of appropriate language items is not enough by itself to ensure a good test.
Each question needs to function properly; otherwise, it can weaken the exam. Fortunately, there are some rather simple statistical ways of
checking individual items. This procedure is called item analysis.
41
38
J. Charles Alderson, Caroline Clapham and Dianne Wall, Language Test Construction and Evaluation, Cambridge: Cambridge University Press, 1995, p. 161
39
J. Charles Alderson, Caroline Clapham and Dianne Wall, Language Test Construction and Evaluation, Cambridge: Cambridge University Press, 1995, p. 184
40
Anthony J. Nitko, Educational Tests and Measurement: An Introduction, New York: Harcourt Brace Jovanovich, Inc., 1983, p. 284
41
Harold S. Madsen, Techniques in Testing, New York: Oxford University Press, 1983,
2. Kinds of Item Analysis
There are three characteristics usually considered in the field of test and measurement: level of difficulty, discriminating power, and
effectiveness of distracters.
a. Level of Difficulty
According to Lyle F. Bachman, “Item difficulty is defined as the proportion of test takers who answered the item correctly, and the item
difficulty index, p, values can be calculated on the basis of test takers’ responses to the item”.
42
The percentage is inversely related to the difficulty: the larger the percentage of correct answers, the easier the item, and the
more difficult the item is, the fewer students will select the correct option.
A good test item should have a certain degree of difficulty. It should be neither too easy nor too difficult, because tests that are too easy or too difficult
yield score distributions that make it hard to identify reliable differences in achievement between the pupils who have done well and those who
have done poorly.
According to Lyle F. Bachman in his book Statistical Analyses
for Language Assessment, “To analyze the level of difficulty in a large group, the writer has to prepare for the item analysis. First,
score the entire test. Then arrange the papers in order from the one with the highest score to the one with the lowest. Next, divide the papers into three
groups: those with the highest scores in one stack and those with the lowest in another. The middle group can be put aside for a while. It can be
stated as below”
43
42
Lyle F. Bachman, Statistical Analyses for Language Assessment, Cambridge: Cambridge University Press, 2004, p. 151
43
Lyle F. Bachman, Statistical Analyses for Language Assessment, Cambridge: Cambridge University Press, 2004, p. 125
\[
P = \frac{R_U + R_L}{2n} = \frac{P_U + P_L}{2}
\]
In which:
$P$: Index of difficulty
$R_U$: The number of students in the upper group who got the item right
$R_L$: The number of students in the lower group who got the item right
$P_U$: The proportion of students in the upper group who got the item right
$P_L$: The proportion of students in the lower group who got the item right
$n$: The number of students in each of the upper and lower groups, assuming that the two groups are equal in size
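The grouping procedure and the formula above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the choice of equal thirds for the upper and lower groups, and the sample scores are assumptions made for the example, not data from this study.

```python
def split_upper_lower(scores):
    """Sort total scores from highest to lowest and split them into thirds.

    Returns (upper, lower) as lists of student indices.
    The middle group is set aside, as in the procedure described above.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(scores) // 3  # assumed size of each extreme group (equal thirds)
    return order[:n], order[len(order) - n:]


def difficulty_upper_lower(item_correct, upper, lower):
    """Compute P = (R_U + R_L) / (2n) for one item.

    item_correct[i] is True if student i answered the item correctly.
    """
    n = len(upper)
    if n == 0 or n != len(lower):
        raise ValueError("upper and lower groups must be non-empty and equal in size")
    r_u = sum(1 for i in upper if item_correct[i])  # R_U
    r_l = sum(1 for i in lower if item_correct[i])  # R_L
    return (r_u + r_l) / (2 * n)


# Hypothetical data: total scores of nine students and their results on one item.
total_scores = [48, 45, 44, 39, 37, 35, 30, 28, 25]
item_correct = [True, True, True, True, False, True, True, False, False]

upper, lower = split_upper_lower(total_scores)
p = difficulty_upper_lower(item_correct, upper, lower)
print(round(p, 2))
```

With these made-up numbers all three upper-group students and one of the three lower-group students answer correctly, so P = (3 + 1) / 6.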
In addition, Lyle F. Bachman in his book Statistical Analyses for Language Assessment explains that “In a small group, the researcher
can easily calculate the item difficulty by using all the test papers. It can be stated as below”.
44
\[
P = \frac{R}{N}
\]
In which:
$P$: Index of difficulty
$R$: The total number of persons who got the item correct
$N$: The number of students who took the test
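As a minimal sketch of the small-group formula, the following Python function counts the correct responses to one item and divides by the number of test takers. The function name and the ten sample responses are hypothetical and serve only to illustrate the arithmetic.

```python
def difficulty_index(responses):
    """Return P = R / N for one item.

    responses is a list of booleans, one per student who took the test;
    True means the student answered the item correctly.
    """
    n = len(responses)  # N: number of students who took the test
    if n == 0:
        raise ValueError("at least one response is required")
    r = sum(1 for correct in responses if correct)  # R: number correct
    return r / n


# Hypothetical item: 7 of 10 students answered correctly.
item_responses = [True] * 7 + [False] * 3
p_small = difficulty_index(item_responses)
print(p_small)  # 0.7
```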
Based on the technique described above, the writer is going to find the difficulty level of all items in the English summative test by using
this formula:
44
Lyle F. Bachman, Statistical Analyses for Language Assessment, Cambridge: Cambridge University Press, 2004, p. 125
\[
P = \frac{R}{N}
\]
In which:
$P$: Index of difficulty
$R$: The total number of persons who got the item correct
$N$: The number of students who took the test
The value of P ranges from 0 to 1. If P is 0.00, no students can answer the item correctly, and the item is
very difficult. If P is 1.00, all the students
can answer the item correctly, and the item is very easy.
To make this clear, the writer gives the table of difficulty level
ranges as follows:
Table 2.1 The Range Scale of Level of Difficulty

Difficulty Level    P
Difficult           0.00-0.14
Moderate            0.15-0.85
Easy                0.86-1.00
The level of difficulty shows how easy or difficult a test item is for a particular group. The level of difficulty is therefore influenced by the students’
competence, and it will be different if the test is given to another group.
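The ranges in Table 2.1 can be applied mechanically once P is known. The sketch below is an illustration only: the function name and the sample item values are hypothetical, and rounding P to two decimals before comparing is an assumption made so that every value falls into exactly one row of the table.

```python
def classify_difficulty(p):
    """Map a difficulty index P to the label given in Table 2.1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P must lie between 0 and 1")
    p = round(p, 2)  # assumption: the table is read at two decimal places
    if p <= 0.14:
        return "Difficult"
    if p <= 0.85:
        return "Moderate"
    return "Easy"


# Hypothetical difficulty indices for three items.
item_indices = {1: 0.05, 2: 0.60, 3: 0.93}
labels = {item: classify_difficulty(p) for item, p in item_indices.items()}
print(labels)
```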
b. The Discriminating Power
A good test item should have discriminating power. The
discriminating power of a test item is an index that shows its ability to differentiate between pupils who have achieved well (the upper group)
and those who have achieved poorly (the lower group).
45
If test items are given to students who have studied well, the scores will be high, and if they are given to those
who have not, the scores will be low; this is what a good test item does. Tests that do not have discriminating power will not yield a
proper description of the students’ ability, as stated by Nana Sudjana in his book: “… tes yang tidak memiliki daya pembeda tidak akan
menghasilkan gambaran yang sesuai dengan kemampuan siswa yang sebenarnya.”
46
“… Tests that do not have discriminating power will not yield a picture that matches the students’ actual ability.”
Therefore, it is very important to measure the discriminating power of test items in order to produce good test items.
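This section does not state a formula for discriminating power. One index commonly used in the testing literature, assumed here rather than taken from this chapter, is the difference between the upper-group and lower-group proportions of correct answers, D = (R_U - R_L) / n, with R_U, R_L and n as in the level-of-difficulty formula. A minimal sketch with hypothetical numbers:

```python
def discrimination_index(r_upper, r_lower, n):
    """Return D = (R_U - R_L) / n for one item.

    r_upper: number of upper-group students who got the item right
    r_lower: number of lower-group students who got the item right
    n: number of students in each group (groups assumed equal in size)
    """
    if n <= 0:
        raise ValueError("group size must be positive")
    if not (0 <= r_upper <= n and 0 <= r_lower <= n):
        raise ValueError("counts must lie between 0 and the group size")
    return (r_upper - r_lower) / n


# Hypothetical item: 8 of 10 upper-group and 3 of 10 lower-group students correct.
d = discrimination_index(8, 3, 10)
print(d)  # 0.5
```

Under this assumed index, a value near 1 means the item separates the two groups well, a value near 0 means it does not separate them, and a negative value means the lower group outperformed the upper group on that item.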
c. The Effectiveness of Distracter
One important aspect affecting the difficulty of multiple-choice
test items is the quality of the distracters. Some distracters, in fact, might not be distracting at all, and therefore serve no purpose.
47
A good distracter will attract more students who have not studied well (the lower group) than students from the upper group. On the contrary, a weak
distracter will not be selected by any of the lower-achieving students.
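The comparison described above can be sketched by counting how often each wrong option is chosen in the upper and lower groups. The function name, the verdict labels ("working", "review", "not chosen") and the sample answers are hypothetical and only illustrate the idea; they are not part of the cited sources.

```python
from collections import Counter


def distracter_report(options, key, upper_choices, lower_choices):
    """Count, for each distracter, how many upper- and lower-group students chose it.

    options: all answer options of the item, e.g. "ABCD"
    key: the correct option, which is skipped in the report
    Returns {option: (upper_count, lower_count, verdict)}.
    """
    upper = Counter(upper_choices)
    lower = Counter(lower_choices)
    report = {}
    for option in options:
        if option == key:
            continue
        u, l = upper[option], lower[option]
        if u + l == 0:
            verdict = "not chosen"  # attracts nobody, so it serves no purpose
        elif l > u:
            verdict = "working"     # attracts more lower-group students
        else:
            verdict = "review"      # attracts the upper group at least as much
        report[option] = (u, l, verdict)
    return report


# Hypothetical answers to one item whose key is "A".
upper_answers = ["A", "A", "A", "B", "A"]
lower_answers = ["B", "C", "B", "A", "C"]
result = distracter_report("ABCD", "A", upper_answers, lower_answers)
print(result)
```

In this made-up example, options B and C draw more lower-group than upper-group students, while option D is chosen by no one.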
45
J. Stanley Ahmann and Marvin D. Glock, Evaluating Student Progress: Principles of Tests and Measurements, Boston: Allyn and Bacon, Inc., 1981, p. 187
46
Nana Sudjana, Penelitian Hasil, p. 141
47
Kathleen M. Bailey, Learning about Language Assessment: Dilemmas, Decisions, and Directions, London: Heinle & Heinle Publishers, 1998, p. 134
There are three common causes of weak distracters. First, sometimes an item has been drilled so heavily in class that almost everyone
has mastered it, so the answer is obvious. Second, sometimes a well-recognized pair is used, such as this/these, etc.; even though
not everyone has control of these yet, students know that one of the two is the right answer, and no other option seems likely. The third cause
is the use of obviously impossible distracters. For example:
Did he do the work?
A. Yes, he did.