In this study, the test covers some micro- and macroskills. For the microskills, the test includes recognizing grammatical word classes, recognizing grammatical systems, and recognizing that a particular meaning may be expressed in different grammatical forms. These microskills are implemented in the grammar questions. For the macroskills, the test includes applying reading comprehension strategies; detecting the main idea, supporting ideas, information, generalization, and exemplification; inferring links and connections between pieces of information; distinguishing between literal and implied meanings; and developing reading strategies for interpreting texts. These macroskills are implemented in the reading comprehension questions.
5. Designing Multiple-choice Items
A multiple-choice item appears simple to construct, but it is difficult to design correctly. Even so, multiple-choice tests are commonly used in elementary, junior high, and senior high schools to measure students' achievement, especially in summative tests, because the format is practical and reliable (Brown, 2004: 55). Ease of administration and scoring is the reason multiple-choice tests are favored for final examinations.
On the other hand, according to Hughes (2003: 76-78), multiple-choice items have a number of weaknesses. First, the technique tests only recognition knowledge, so it is limited in measuring some skills and abilities; for example, it cannot assess speaking ability. Second, guessing may have a considerable effect on test scores: a student who does not know the answer can simply guess, and the guess may be right by luck. Third, it is very difficult to write successful items, because the teacher has to ensure that every distractor works. Fourth, the washback may be harmful to students, because the technique restricts what can be tested. Finally, cheating may be facilitated, since students only need to pass on a letter: A, B, C, D, or E.
Brown (2004: 56) gives a primer on multiple-choice terminology. First, multiple-choice items are all receptive, or selective, response items: test-takers choose from a set of responses (which Brown calls a supply type of response). Second, each item has a stem, which presents a stimulus, and several options, or alternatives, to choose from. Third, one of those options, the key, is the correct response, and the others are distractors.
To write appropriate multiple-choice items, teachers should follow design guidelines. Four such guidelines are given below (Gronlund, 1998: 60).
a. Design each item to measure a specific objective.
In this step, the designer should check, and if necessary revise, each item so that it matches the objective of the comprehension question.
b. State both the stem and the options as simply and directly as possible.
This helps students identify what the item is asking without being unnecessarily distracted. A good item gets directly to the point.
c. Make certain that the intended answer is clearly the only correct one.
The designer has to eliminate any unintended possible answers so that test-takers will not be confused when answering the question.
d. Use item indices to accept, discard, or revise items.
To judge whether items are appropriate and suitable, teachers can use three indicators: item facility, item discrimination, and distractor analysis (Brown, 2004: 58).
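Distractor analysis, the third indicator in guideline d, can be illustrated with a minimal sketch. It assumes the common approach of splitting test-takers into a high-scoring and a low-scoring group and counting how often each option is chosen in each group; the function names `distractor_table` and `flag_weak_distractors`, the A to E option set, the flagging rule, and the response data are all hypothetical illustrations, not the procedure or data of this study.

```python
from collections import Counter


def distractor_table(high_answers, low_answers, options="ABCDE"):
    """Count how many high- and low-group test-takers chose each option.

    high_answers / low_answers: option letters chosen on one item by the
    high-scoring and low-scoring groups (hypothetical data format).
    Returns {option: (high_count, low_count)}.
    """
    high = Counter(high_answers)
    low = Counter(low_answers)
    return {opt: (high[opt], low[opt]) for opt in options}


def flag_weak_distractors(table, key):
    """Flag distractors that attract nobody, or that attract more
    high-group than low-group test-takers (an assumed rule of thumb)."""
    weak = []
    for opt, (h, l) in table.items():
        if opt == key:
            continue  # the key is not a distractor
        if h + l == 0 or h > l:
            weak.append(opt)
    return weak


# Hypothetical responses to one item whose key is "C"
high = ["C", "C", "C", "C", "B", "C", "C", "A", "C", "C"]
low = ["B", "C", "A", "B", "D", "C", "B", "A", "D", "C"]
table = distractor_table(high, low)
print(table)  # {'A': (1, 2), 'B': (1, 3), 'C': (8, 3), 'D': (0, 2), 'E': (0, 0)}
print(flag_weak_distractors(table, key="C"))  # ['E']
```

In this made-up item, option E attracts no one in either group, so it does no work as a distractor and would be a candidate for revision.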
By contrast, item analysis in criterion-referenced test (CRT) development and improvement projects is quite different. Because the purpose of a CRT is to assess how much of the objectives each student has learned, CRT assessment has to occur before and after instruction in order to determine whether there was any gain in scores (Brown, 2005: 79). CRT item analysis involves calculating item facility (IF), the difference index (DI), and the B-index (Brown, 2005: 80). However, obtaining DI is time-consuming because it requires administering the test both before and after instruction, so the writer replaced DI with item discrimination (ID) to analyze the multiple-choice items.
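The reason DI needs two test administrations can be shown with a short sketch. It assumes Brown's definition of DI as the item facility on the posttest minus the item facility on the pretest for the same item; the function names and the 0/1 response lists (1 = correct) are hypothetical.

```python
def item_facility(responses):
    """IF: proportion of test-takers answering the item correctly (1 = correct, 0 = wrong)."""
    return sum(responses) / len(responses)


def difference_index(pretest, posttest):
    """DI: IF on the posttest minus IF on the pretest for the same item."""
    return item_facility(posttest) - item_facility(pretest)


# Hypothetical results for one item, same 10 students before and after instruction
pre = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]   # 2 of 10 correct
post = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]  # 8 of 10 correct
print(round(difference_index(pre, post), 2))  # 0.6
```

Without the pretest column there is nothing to subtract, which is the practical cost the paragraph above describes.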
IF is the extent to which an item is easy or difficult for the intended group of test-takers; appropriate test items generally have IFs ranging between 0.15 and 0.85 (Brown, 2004: 58). ID indicates the discriminating power of each item. According to Brown (2004: 59), an item with high discriminating power would approach a perfect 1.0, and an item with no discriminating power would be 0.0. The B-index is an item statistic that compares the IFs of students who passed a test with the IFs of those who failed it (Brown, 2005: 82). It indicates the degree to which the masters outperformed the non-masters on each item.
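The three statistics in the paragraph above can be computed from 0/1 item scores. This sketch assumes Brown's (2004) upper/lower-group formula for ID, namely (high-group correct minus low-group correct) divided by half the total number of students in the two groups, and computes the B-index as the IF of students who passed minus the IF of those who failed. The group sizes, the pass/fail split, and the helper names are hypothetical.

```python
def item_facility(item_scores):
    """IF: proportion of test-takers answering the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_scores) / len(item_scores)


def item_discrimination(high_scores, low_scores):
    """ID = (high-group correct - low-group correct) / (1/2 * total in both groups)."""
    total = len(high_scores) + len(low_scores)
    return (sum(high_scores) - sum(low_scores)) / (0.5 * total)


def b_index(pass_scores, fail_scores):
    """B-index: IF of students who passed the test minus IF of those who failed."""
    return item_facility(pass_scores) - item_facility(fail_scores)


def acceptable_if(value, low=0.15, high=0.85):
    """Check an IF against the 0.15-0.85 range cited from Brown (2004: 58)."""
    return low <= value <= high


# Hypothetical item results for a high- and a low-scoring group of 10 each
high_group = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # 7 of 10 correct
low_group = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # 2 of 10 correct
facility = item_facility(high_group + low_group)
print(facility, acceptable_if(facility))          # 0.45 True
print(item_discrimination(high_group, low_group))  # 0.5

# Hypothetical masters (passed) and non-masters (failed) on the same item
pass_group = [1, 1, 1, 0]  # IF = 0.75
fail_group = [1, 0, 0, 0]  # IF = 0.25
print(b_index(pass_group, fail_group))  # 0.5
```

The sketch takes the high and low groups as already chosen; how they are formed (for example, top and bottom portions of the total-score ranking) and where the pass/fail cut score lies are decisions the test designer makes separately.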
6. Web-based Test