P = The class percentage
F = Total percentage score
N = Number of students
Finally, after obtaining the mean of the students' scores for each action, the writer identifies whether the students improved their understanding of narrative text from the pre-test to the post-tests in cycle 1 and cycle 2. She uses the formulas below:
P  = Percentage of students' improvement
y  = Pre-test result
y1 = Post-test 1

P  = Percentage of students' improvement
y  = Pre-test result
y2 = Post-test 2
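The improvement calculation described above can be sketched in Python. This is a minimal sketch, assuming the usual relative-gain form P = (y1 - y) / y x 100% (and the same with y2), which is inferred from the variable definitions rather than quoted from the source. The function name and the sample means are hypothetical.

```python
def improvement_percentage(pre_mean: float, post_mean: float) -> float:
    """Relative gain from the pre-test mean (y) to a post-test mean (y1 or y2), in percent.

    Assumed form: P = (post - pre) / pre * 100.
    """
    if pre_mean == 0:
        raise ValueError("pre-test mean must be non-zero")
    return (post_mean - pre_mean) / pre_mean * 100


# Hypothetical class means, not taken from the study.
y, y1, y2 = 60.0, 66.0, 75.0
print(f"Pre-test to post-test 1: {improvement_percentage(y, y1):.1f}%")
print(f"Pre-test to post-test 2: {improvement_percentage(y, y2):.1f}%")
```

Both calls use the pre-test mean as the baseline, so the two percentages are directly comparable across cycles.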
F. The Trustworthiness of the Test
To analyze the test items, the writer examines the trustworthiness of the test in several ways, including:
1. Test Validity
Validity is a criterion for evaluating a test as a measuring instrument. It concerns how well the test represents the material that has been given to the students. Before administering the pre-test, the writer analyzes the validity and reliability of the pre-test instrument in order to find out whether the test is valid and suitable for use. According to Arikunto, information is valid if it corresponds to the facts, and a test is valid if it measures what it is supposed to measure.⁴²
Before administering the test, the writer used auditing, asking the advisor to review and evaluate the study to ensure the validity of the instruments. Then, after the students took the pre-test, she used the Anatest software to calculate the instruments' validity and reliability scores.
Table 3.3 The Criteria of the Correlation Coefficient ("koefisien korelasi")⁴³
Scale         Remark
0.80 - 1.0    Very high
0.60 - 0.80   High
0.40 - 0.60   Enough
0.20 - 0.40   Low
0.0 - 0.20    Very low
After the calculation using ANATEST, the validity value or XY
correlation of the pre-test instrument used in this study was 0.86, which means the test is valid and falls into the very high category. The data came from 40 (forty) multiple-choice questions that were tried out beforehand, of which 23 (twenty-three) were found valid by the ANATEST software. The valid items are numbers 4, 5, 8, 9, 11, 12, 13, 14, 15, 16, 18, 20, 23, 24, 25, 26, 28, 31, 33, 35, 36, 38, and 40. Meanwhile, the reliability of the instrument was 0.92, which means the test is reliable and falls into the very high reliability category. Then, the validity value of post-test 2 used in this study was 0.55, which means the test is valid and falls into the enough category. The reliability of that instrument was 0.71, which means the test is reliable and falls into the high reliability category.
42 Suharsimi Arikunto, Dasar-Dasar Evaluasi Pendidikan, Jakarta: Bumi Aksara, 2010, pp. 58-59.
43 Suharsimi Arikunto, op. cit., p. 75.
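The way the reported coefficients map onto the bands of Table 3.3 can be sketched as follows. Because adjacent bands in the table share their boundary values (for example, 0.80 appears in both "High" and "Very high"), this sketch assumes a boundary value belongs to the higher band; that tie-breaking rule and the function name are assumptions, not part of the source.

```python
def correlation_category(r: float) -> str:
    """Map a coefficient in [0, 1] to the remark in Table 3.3.

    Assumption: a value on a shared boundary goes to the higher band.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("coefficient is expected to lie between 0 and 1")
    if r >= 0.80:
        return "Very high"
    if r >= 0.60:
        return "High"
    if r >= 0.40:
        return "Enough"
    if r >= 0.20:
        return "Low"
    return "Very low"


# Coefficients reported in the text for the pre-test and post-test 2.
reported = [
    ("pre-test validity", 0.86),
    ("pre-test reliability", 0.92),
    ("post-test 2 validity", 0.55),
    ("post-test 2 reliability", 0.71),
]
for label, value in reported:
    print(f"{label}: {value} -> {correlation_category(value)}")
```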
2. Discrimination Power
The analysis of item discrimination shows how well the test items distinguish between students with high achievement and students with low achievement. Item discrimination provides a more detailed analysis of the test items' difficulty, because it shows how the top scorers and the low scorers performed on each item. The formula is as follows:⁴⁴
D : The index of discriminating power
U : The number of correct answers in the upper group
L : The number of correct answers in the lower group
N : The total number of people in the top group
The discriminating scale used is as follows:
Table 3.4 Discriminating Scale
DP REMARK
0.6-1.0 Very good
0.4-0.6 Good
0.1-0.3 Enough
-1-0.0 Bad
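The discrimination analysis can be illustrated with a short Python sketch. It assumes the common form D = (U - L) / N, with the upper and lower groups being the same size N; this form is inferred from the variable definitions above and is not quoted from the source. Table 3.4 does not cover values between 0.0 and 0.1 or between 0.3 and 0.4, so the sketch leaves those values unclassified rather than guessing. All names and sample counts are hypothetical.

```python
from typing import Optional


def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Assumed form D = (U - L) / N, where N is the size of the top group."""
    if group_size <= 0:
        raise ValueError("group size must be positive")
    return (upper_correct - lower_correct) / group_size


def discrimination_remark(d: float) -> Optional[str]:
    """Return the Table 3.4 remark, or None where the table defines no band."""
    if 0.6 <= d <= 1.0:
        return "Very good"
    if 0.4 <= d < 0.6:
        return "Good"
    if 0.1 <= d <= 0.3:
        return "Enough"
    if -1.0 <= d <= 0.0:
        return "Bad"
    return None  # gaps in the table: (0.0, 0.1) and (0.3, 0.4)


# Hypothetical item: 9 of 10 top scorers and 3 of 10 low scorers answered correctly.
d = discrimination_index(upper_correct=9, lower_correct=3, group_size=10)
print(f"D = {d:.2f}, remark: {discrimination_remark(d)}")
```

A negative D means the lower group outperformed the upper group on that item, which is why the whole range from -1 to 0.0 is marked "Bad".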
44 Charles Alderson, Caroline Clapham and Dianne Wall, Language Test Construction and Evaluation, Cambridge University Press, 1995, p. 274.
3. Item Facility
Item facility (item difficulty) concerns the proportion of students who answer an item correctly compared with all the students who take the test. It indicates how easy or difficult an item is for the group of students. The formula is as follows:⁴⁵
IF        : Item facility
N_correct : Number of students who selected the correct answer
N_total   : Total number of students taking the test
The criterion used is as follows:
Table 3.5 Criterion Scale
ID REMARK
0-0.14 Difficult
0.15-0.85 Moderate
0.86-1.00 Easy
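As an illustration, item facility and its remark from Table 3.5 can be computed as below. The sketch assumes IF = N_correct / N_total, which follows from the definitions above. Since the table leaves small gaps between bands (for example, between 0.14 and 0.15), it also assumes that values below 0.15 count as "Difficult" and values above 0.85 count as "Easy"; these cut-offs, the function names, and the sample counts are assumptions.

```python
def item_facility(n_correct: int, n_total: int) -> float:
    """Proportion of test takers who answered the item correctly."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_correct <= n_total:
        raise ValueError("n_correct must lie between 0 and n_total")
    return n_correct / n_total


def facility_remark(value: float) -> str:
    """Map an IF value to the remark in Table 3.5 (assumed cut-offs at 0.15 and 0.85)."""
    if value < 0.15:
        return "Difficult"
    if value <= 0.85:
        return "Moderate"
    return "Easy"


# Hypothetical item answered correctly by 24 of 30 students.
value = item_facility(n_correct=24, n_total=30)
print(f"IF = {value:.2f}, remark: {facility_remark(value)}")
```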
G. The Criterion of the Action Success
Classroom action research (CAR) can be called successful if it fulfills the criteria that have been determined, and it fails if it does not. The writer and the teacher discussed the criteria for the action's success. This study is regarded as successful if 75% of the students pass the KKM (Kriteria Ketuntasan Minimal, the minimum mastery criterion), which at the school is 75 (seventy-five). If the study meets this criterion, it is called successful; if not, improvement is needed and the study continues to the next cycle.
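The success check can be sketched as follows. This is a minimal sketch, assuming the criterion is read as at least 75% of the students scoring at or above the KKM of 75; treating a score equal to the KKM as a pass is an assumption, and the names and sample scores are hypothetical.

```python
from typing import Sequence

KKM = 75                   # minimum mastery score stated in the text
REQUIRED_PASS_RATE = 0.75  # assumed reading: 75% of students must pass


def pass_rate(scores: Sequence[float], kkm: float = KKM) -> float:
    """Fraction of students whose score is at or above the KKM."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(1 for score in scores if score >= kkm) / len(scores)


def action_is_successful(scores: Sequence[float],
                         kkm: float = KKM,
                         required: float = REQUIRED_PASS_RATE) -> bool:
    """True when the share of passing students meets the required rate."""
    return pass_rate(scores, kkm) >= required


# Hypothetical post-test scores for a small class.
cycle_scores = [80, 75, 70, 90]
print(f"Pass rate: {pass_rate(cycle_scores):.0%}")
print("Continue to next cycle:", not action_is_successful(cycle_scores))
```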
45 James Dean Brown, Testing in Language Programs, New York: McGraw-Hill, 2005, p. 66.