Content validity Reading Comprehension test
where:
$r_k$ : the reliability of the whole test
$r_{xy}$ : the reliability of the half test
(Hatch and Farhady, 1982:247)
The criteria of reliability are as follows:
0.701 – 1.000 = high (good)
0.401 – 0.700 = average (sufficient)
0.000 – 0.400 = low (not sufficient)
(Suparman, 2011)
In practice, the reliability of the test in this research was analyzed using ITEMAN. The result can be seen in Appendix 9: the reliability was 0.781. Since this value falls within the 0.701 – 1.000 range, it can be concluded that the reliability of the test was high (good).
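The classification above can be checked with a short sketch. It is illustrative only: the function names are hypothetical, and the step from half-test to whole-test reliability assumes the standard Spearman-Brown formula, $r_k = 2r_{xy} / (1 + r_{xy})$, which is not restated in this section. It does not reproduce ITEMAN's own computation.

```python
def spearman_brown(r_xy: float) -> float:
    """Estimate whole-test reliability from the half-test reliability.

    Assumption: the standard Spearman-Brown formula,
    r_k = 2 * r_xy / (1 + r_xy).
    """
    return 2 * r_xy / (1 + r_xy)


def classify_reliability(r: float) -> str:
    """Map a reliability coefficient onto the criteria cited from Suparman (2011)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    if r >= 0.701:
        return "high (good)"
    if r >= 0.401:
        return "average (sufficient)"
    # Assumption: values between 0.400 and 0.401 are treated as low.
    return "low (not sufficient)"


# The coefficient reported in Appendix 9.
print(classify_reliability(0.781))
```

With the reported value of 0.781, the function returns the "high (good)" category, which matches the conclusion drawn in the text.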
3 Level of Difficulty
The level of difficulty (LD) of an item shows how easy or difficult that item is for the participants (Heaton, 1991). It can also be defined as
the percentage of the students taking the test who answered the item correctly. In short, the larger the percentage getting
an item right, the easier the item: the higher the difficulty index, the easier the item is understood to be. Matlock-Hetzel (1997) states that, to compute the item
difficulty, the examiner divides the number of students answering the item correctly by the total number of students answering the item. The proportion for the
item is usually denoted as p and is called the item difficulty. An item answered correctly by 85% of the examinees would have an item difficulty, or p value, of .85, whereas
an item answered correctly by 50% of the examinees would have a lower item difficulty, or p value, of .50.
The easiest way to measure the level of difficulty of an item is to use the proportion correct (p), that is, the number of test takers answering
the item under analysis correctly divided by the total number of test takers. The equation is as follows:
$$P = \frac{\text{number of test takers who answer the item correctly}}{N}$$
Where:
P : the proportion of test takers who answer correctly a certain item under analysis
N : the total number of test takers
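The proportion-correct computation described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the research: the function name `item_difficulty` and the score matrix are hypothetical, and each response is assumed to be coded 1 (correct) or 0 (incorrect).

```python
def item_difficulty(responses: list[list[int]]) -> list[float]:
    """Return the proportion correct (P) for every item.

    responses[s][i] is 1 if test taker s answered item i correctly, else 0.
    """
    if not responses:
        raise ValueError("need at least one test taker")
    n = len(responses)            # N: total number of test takers
    n_items = len(responses[0])
    # For each item, count the correct answers and divide by N.
    return [sum(row[i] for row in responses) / n for i in range(n_items)]


# Hypothetical data: 4 test takers (rows) and 3 items (columns).
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
print(item_difficulty(scores))  # [1.0, 0.5, 0.25]
```

In this made-up data set every test taker answers the first item correctly (P = 1.0), while only one in four answers the third item correctly (P = 0.25), so the third item is the hardest.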
The criteria were as follows:
0.000 – 0.099 = very difficult (needs total revising)
0.099 – 0.299 = difficult (needs revising)
0.300 – 0.700 = average (good)
0.700 – 1.000 = easy (needs dropping or total revising)
(Suparman, 2011)
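As a rough sketch, the criteria can be applied to a p value with a simple lookup. The function name is hypothetical. Because the listed ranges share the boundary values 0.099 and 0.700, the code has to assign each of them to one band; the choice made here (both go to the lower band) is an assumption, not something stated in the source.

```python
def classify_difficulty(p: float) -> str:
    """Map a proportion-correct value onto the criteria cited from Suparman (2011)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    # Assumption: the shared boundaries 0.099 and 0.700 go to the lower band.
    if p <= 0.099:
        return "very difficult (needs total revising)"
    if p < 0.300:
        return "difficult (needs revising)"
    if p <= 0.700:
        return "average (good)"
    return "easy (needs dropping or total revising)"


# The two p values used as examples in the text.
for p in (0.85, 0.50):
    print(p, classify_difficulty(p))
```

Under these criteria, an item with a p value of .85 falls in the easy band, while an item with a p value of .50 falls in the average band.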