The try-out was conducted twice. The first try-out tested the pre-test items against the aspects mentioned above, whereas the second try-out tested the post-test items.
4.2.1 Item Validity
Validity refers to the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment. In the computation of item validity for the pre-test try-out, the validity index of item number 10 was 0.48. I then consulted the table of r, which gives 0.338788 for N = 34 at the 5% significance level. Since the computed value was higher than the r in the table, item number 10 was considered valid. In the computation of item validity for the post-test try-out, the validity index of item number 1 was 0.39. Since this value is also higher than the r in the table, item number 1 was considered valid. See Appendix.

Based on the computation for all items, 8 items in the pre-test try-out were invalid: items number 1, 2, 8, 13, 15, 18, 27, and 29. The remaining 42 items were valid. Of these, I used only 40 items for the test, chosen on the basis of level of difficulty and the aspects of reading comprehension. In the post-test try-out, 9 items were invalid: items number 2, 13, 15, 18, 21, 27, 29, 46, and 50. The remaining 41 items were valid, and I again used 40 of them for the test, chosen on the same basis.
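The validity decision described above can be illustrated with a short sketch. It assumes that the validity index is the Pearson product-moment correlation between each item's scores and the students' total scores, and that items are scored 0 or 1. Neither assumption is stated in this section, and the function names `pearson_r` and `item_validity` are hypothetical. Only the critical value 0.338788 is taken from the text.

```python
from math import sqrt

# Critical r quoted in the text for N = 34 at the 5% significance level.
R_TABLE = 0.338788


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A constant item (or constant totals) carries no information;
        # treating its correlation as 0 is an assumption of this sketch.
        return 0.0
    return sxy / sqrt(sxx * syy)


def item_validity(responses, r_table=R_TABLE):
    """Return (item_number, r, is_valid) for every item.

    responses: one list per student, holding that student's 0/1 item scores
    (assumed scoring scheme).
    """
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    results = []
    for j in range(n_items):
        item_scores = [row[j] for row in responses]
        r = pearson_r(item_scores, totals)
        # An item is kept only if its index exceeds the table value.
        results.append((j + 1, r, r > r_table))
    return results
```

Under these assumptions, an index of 0.48 (item number 10 in the pre-test try-out) or 0.39 (item number 1 in the post-test try-out) exceeds 0.338788 and the item is marked valid.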
4.2.2 Reliability
For a test to be categorized as good, its reliability needs to be calculated. The reliability of the test was calculated from the number of items, the mean of the scores, and the total variance. A test is considered reliable if the calculated $r_{11}$ is higher than the r in the table, 0.338788. In the pre-test try-out, with a 5% significance level (α) and 34 students, the result was 0.34. Since $r_{11}$ was higher than $r_{table}$, the test was considered reliable. In the post-test try-out, the result was 0.37. Therefore, it was also considered reliable.
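The reliability computation can be sketched as follows. The three inputs named above (number of items, mean of scores, and total variance) match the Kuder-Richardson 21 (KR-21) formula, so the sketch assumes KR-21. The section does not name the formula, and the use of the population variance is a further assumption. The function names `kr21` and `is_reliable` are hypothetical.

```python
# Critical r quoted in the text for N = 34 at the 5% significance level.
R_TABLE = 0.338788


def kr21(total_scores, n_items):
    """KR-21 reliability coefficient (assumed formula):

        r11 = (k / (k - 1)) * (1 - M * (k - M) / (k * Vt))

    k  = number of items
    M  = mean of the students' total scores
    Vt = variance of the total scores (population variance, an assumption)
    """
    n = len(total_scores)
    mean = sum(total_scores) / n
    variance = sum((s - mean) ** 2 for s in total_scores) / n
    if variance == 0:
        raise ValueError("total scores have zero variance; r11 is undefined")
    k = n_items
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))


def is_reliable(r11, r_table=R_TABLE):
    """The test is considered reliable when r11 exceeds the table value."""
    return r11 > r_table
```

With the values reported above, `is_reliable(0.34)` and `is_reliable(0.37)` both hold, which agrees with the conclusion for the two try-outs.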
4.2.3 Level of Difficulty