applied here. Below is an explanation of the three types of triangulation that this research employed.
a. Time triangulation. It was employed by collecting the data over a certain period of time. Using three data collection instruments, this study collected data in the planning, action, and observation stages of the research.
b. Investigator triangulation. It was applied by having more than one observer involved in the study in order to avoid biased observation. In this study, the researcher did not observe the research process alone; the English teacher serving as the research collaborator was also engaged.
c. Theoretical triangulation. It was applied by analyzing the data collected during the research from more than one theoretical perspective.
2. Validity and Reliability of the Quantitative Data
Regarding the quantitative data, the only instrument employed to collect the data was tests. As with the qualitative data, this research ensured the validity and reliability of the quantitative data by ensuring those of the instrument used to collect them. Figure 7 below shows the stages that the research followed to develop the tests.
[Figure 7 is a flowchart of the following stages:]
1. Test-Prototype Development
2. Validity of Test Prototypes: Logical Validity through Content Validity
3. Expert Judgment for Content Validity
4. Test Revision I
5. Test Try-Outs
6. Data Analysis using ITEMAN 3.00
7. Reliability of Test Prototypes: Cronbach’s Alpha
8. Item Analysis using Three Item Indices
9. Test Revision II: Final Draft of the Instrument
Figure 7: Stages of Test Development
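To make the reliability stage of Figure 7 concrete, the sketch below computes Cronbach’s alpha for dichotomously scored items. This is a minimal illustration, not the ITEMAN 3.00 implementation, and the response matrix is invented for the example.

```python
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
def cronbach_alpha(scores):
    """scores: one list per test-taker, each a list of item scores (1 = correct, 0 = wrong)."""
    k = len(scores[0])  # number of items
    n = len(scores)     # number of test-takers

    def variance(xs):   # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Invented 4-item test answered by 5 test-takers:
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 2))  # 0.8
```

For this toy data the items order the test-takers very consistently, so alpha comes out high; real test data would, of course, be fed in from the scored try-out responses.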
The validity of the quantitative data was established through logical validity, specifically content validity. According to Brown (2004: 22), content validity is defined as “the extent to which the assessment requires students to perform tasks that were included in the previous classroom lessons and that directly represent the objectives of the unit on which the assessment is based”. Thus, the reading materials covered in the test prototypes were taken from the Standard of Competence (SK) and the Basic Competence (KD) of the School-Based Curriculum (KTSP), which regulates English instruction at schools in Indonesia, for Grade VIII students in the first semester on the reading skill. After that, the test prototypes were submitted for expert judgment, in this case to the researcher’s thesis supervisor and the English teacher with whom the researcher conducted the research. Revisions after the expert-judgment consultation included (1) consistency between the option labels A, B, C, D written in the instructions and those written in the alternatives of the items in each test prototype; and (2) corrections of spelling errors as well as errors in sentence structure and grammar in the test prototypes.
Then, the test prototypes were tried out with other students having the same
characteristics as the students who were the research subjects. The results of the test try-outs were analyzed in terms of the item indices and the reliability of the test prototypes using ITEMAN 3.00. According to Brown (2004), there are three item indices that should be taken into account before accepting, discarding, or revising items, namely item facility, item discrimination, and distractor efficiency. The three item indices are explained as follows.
a. Item Facility (IF). Information about the IF of a test item in the ITEMAN 3.00 analysis results is given in the Prop. Correct column of the Item Statistics. IF is the extent to which a test item is easy or difficult for the intended group of test-takers, reflected by the proportion of students answering the item correctly. According to Henning (in Fulcher and Davidson, 2007), an ideal facility value ranges from 0.3 to 0.7.
b. Item Discrimination (ID). There are two types of ID, namely the ID of a test item and the ID of a test item’s alternatives. For a test item, information about ID in the ITEMAN 3.00 analysis results is given in the Point Biserial and Biserial columns of the Item Statistics. Likewise, the ID of each item’s alternatives is given in the Point Biserial and Biserial columns of the Alternative Statistics. However, Fulcher and Davidson (2007: 103) state that ‘The most commonly used method of calculating item discrimination is the point biserial correlation’. Therefore, this study referred to the point-biserial correlation for information about the ID of each test item.
ID refers to the extent to which a test item differentiates between test-takers who do well on the test and those who do not. A positive value indicates that it is mostly the students with higher total scores who answer the item correctly; meanwhile, a negative value suggests that it is the students with lower scores who do so. A test item with good discriminating power garners correct responses from most of the high-ability group, and thus its value is positive. According to Henning (in Fulcher and Davidson, 2007), items with an r_pbi of ≥ 0.25 are considered acceptable, while those with a lower value should be rewritten or excluded from the test. Also, regarding the alternatives, a positive value is preferred for the key, while a negative value is preferred for all distractors. In addition, the positive ID value of the key must be higher than the ID value of any distractor.
c. Distractor Efficiency (DE). “In multiple choice testing, the intended correct option is called the key and each incorrect option is called a distractor” (Fulcher and Davidson, 2007: 107). Information about DE in the ITEMAN 3.00 analysis results is given in the Prop. Endorsing column of the Alternative Statistics. DE refers to the extent to which (a) the distractors of a test item lure a sufficient number of test-takers, where more lower-ability test-takers answer the item incorrectly than higher-ability ones do, and (b) those responses are somewhat evenly distributed across all distractors (Brown, 2004). A distractor is considered good when it is chosen by at least 5% of the total test-takers (BNSP, 2010). Since the numbers of test-takers were 28, 25, and 26 for the pre-test, post-test I, and post-test II respectively, each distractor of those tests should be chosen by at least 2 students.
The analysis results obtained using the three above-mentioned item indices for items in the pre-test, post-test I, and post-test II are presented in Table 4 below.
Table 4: Analysis Results using Three Item Indices through ITEMAN 3.00 for Items in the Pre-Test, Post-Test I and Post-Test II