An Item Analysis of English Summative Test on Difficulty Level and Discriminating Power


(A Case Study of the First Grade Students of 3 State Junior High School of Tangerang Selatan)

Written By:

Dwi Ciptaningrum

109014000051

DEPARTMENT OF ENGLISH EDUCATION

FACULTY OF TARBIYAH AND TEACHERS’ TRAINING

SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY

JAKARTA 2014



of Junior High School of 3 Tangerang Selatan), Skripsi, English Education Department, Faculty of Tarbiyah and Teachers’ Training, Syarif Hidayatullah State Islamic University Jakarta.

Key words: Item Difficulty Level, Item Discriminating Power, Summative Test

This study aims to measure the difficulty level and the discriminating power of the English summative test for the first grade of SMPN 3 Tangerang Selatan in the odd semester of the 2013/2014 academic year. It also analyzes whether each test item is easy, moderate, or difficult, and whether the items are able to discriminate between the upper and the lower groups of students.

This study is categorized as qualitative research using descriptive analysis, because the writer describes the difficulty level and discriminating power by analyzing the items of the English summative test. The research is also supported by numerical data, which are analyzed statistically using the Anates program and manual counting. The writer takes 92 students’ answer sheets as the data and divides the students into three groups (upper, middle, and lower) by arranging their scores from the highest to the lowest. However, the writer analyzes only 27% of the students (25 students) from each of the upper and the lower groups.
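The selection of the upper and lower groups described above can be illustrated with a short Python sketch. The function name `split_upper_lower` and the use of rounding to turn 27% of 92 students into 25 are assumptions made for illustration; the source states only the percentage and the resulting group size, and the Anates program may handle rounding and tied scores differently.

```python
def split_upper_lower(scores, fraction=0.27):
    """Return the top and bottom `fraction` of scores as two lists.

    Assumption: the group size is the rounded product of the number of
    students and the fraction (92 * 0.27 = 24.84, rounded to 25).
    """
    ranked = sorted(scores, reverse=True)  # highest score first
    group_size = round(len(ranked) * fraction)
    if group_size == 0:
        raise ValueError("too few scores to form the groups")
    upper = ranked[:group_size]    # highest-scoring students
    lower = ranked[-group_size:]   # lowest-scoring students
    return upper, lower


# Hypothetical scores for 92 students (not the data of this study).
example_scores = list(range(92))
upper_group, lower_group = split_upper_lower(example_scores)
print(len(upper_group), len(lower_group))
```

The students between the two groups form the middle group, which is left out of the item analysis.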

The findings show that the test has a moderate level of difficulty, with a difficulty index of 0.69, and a discriminating power index of 0.38, which is categorized as good. Thus, the English summative test for the first grade of SMPN 3 Tangerang Selatan in the odd semester of the 2013/2014 academic year has good quality in terms of both its moderate difficulty level and its good discriminating power.



of Junior High School of 3 Tangerang Selatan), Skripsi, English Education Department, Faculty of Tarbiyah and Teachers’ Training, Syarif Hidayatullah State Islamic University Jakarta.

Key words: Item Difficulty Level, Discriminating Power, Summative Test

This study aims to measure the item difficulty level and the discriminating power of the English summative test for the first grade at SMPN 3 Tangerang Selatan in the odd semester of the 2013/2014 academic year. In addition, it analyzes whether the items are classified as easy, moderate, or difficult and whether they can distinguish high-ability students from low-ability students.

This study is categorized as qualitative research using descriptive analysis: the writer describes the item difficulty level and the discriminating power by analyzing each item of the English summative test. The study is also supported by numerical data that are analyzed statistically. The writer uses only 92 students’ answer sheets as data, divided into three groups of students (high-achieving, average, and low-achieving) by ordering the students’ scores from the highest to the lowest. However, the writer analyzes only 27% (25 students) of the high-achieving and low-achieving groups.

The result of this study is a moderate difficulty level, with an index of 0.69. The test also has a discriminating power index of 0.38, which is categorized as good quality. Thus, the English summative test for the first grade of SMPN 3 Tangerang Selatan in the odd semester of the 2013/2014 academic year has good quality in terms of its moderate difficulty level and good discriminating power.



First of all, praise and gratitude be to Allah, Lord of the world, Who has given His blessing to the writer so that this “Skripsi” could be completed. Peace and salutation be upon our prophet Muhammad, his family, his companions, and his followers.

The writer is also grateful to her beloved family, especially her parents, Mr. Tirsan and Mrs. Semirah, and her two brothers, Eko Cahyo Prihartono, S. Sos. and Tri Ahmad Nur, who have prayed for and supported the writer in finishing this “Skripsi”. The writer would also like to express her greatest thanks and gratitude to:

1. Mrs. Nida Husna, M.Pd., M.A. TESOL, as the first advisor and Mr. Dadan Nugraha, M.Pd., as the second advisor, who kindly gave their time and knowledge to provide valuable advice, guidance, corrections, and suggestions for finishing this “Skripsi”.

2. Mr. Drs. Syauki, M.Pd. and Mr. Zaharil Anasy, M. Hum., as the Head and the Secretary of English Education Department.

3. Mr. H. Maryono, S.E, M.MPd., as the Headmaster, and all of the teachers at SMPN 3 Tangerang Selatan, who allowed the writer to take the sample and supported her in finishing her “Skripsi”.

4. Mrs. Dr. Farida Hamid, M.Pd., as the academic advisor for class B of English Education Department, 2009/2010 academic year.

5. All lecturers of English Education Department, who gave their knowledge, experience, and guidance to the writer during her study in the Faculty of Tarbiyah and Teachers’ Training, State Islamic University Syarif Hidayatullah Jakarta.



their time for sharing and supports in accomplishing this “Skripsi”.

8. Finally, the writer thanks the staff of the Main Library of State Islamic University Syarif Hidayatullah Jakarta and the Library of the Faculty of Tarbiyah and Teachers’ Training for providing the sources needed for the references in this writing.

May Allah the Almighty bless them all. So be it.

Finally, the writer realizes that this “Skripsi” is still far from perfect. Constructive criticism and suggestions are therefore welcome to make it better.

Jakarta, September 30th, 2014



ABSTRAK

ACKNOWLEDGEMENT

TABLE OF CONTENTS

LIST OF TABLES

LIST OF APPENDIXES

CHAPTER I: INTRODUCTION

A. Background of Study

B. Identification of Problem

C. Limitation of Problem

D. Formulation of Problem

E. Objective of Study

F. Significance of Study

CHAPTER II: THEORETICAL FRAMEWORK

A. Summative Test

1. Definition of Summative Test

2. Criteria of Good Test in Summative Test

a. Validity

b. Reliability

c. Practicality

3. Types of Test Item in Summative Test

a. Objective Test

1) True-False



b) Types of the Essay Test

1. Extended Response Type

2. Restricted Response Type

B. Item Analysis

1. Definition of Item Analysis

2. Kinds of Item Analysis

a. Level of Difficulty

b. Discriminating Power

c. The Effectiveness of Distractor

3. The Difficulty Level and Discriminating Power

4. The Importance of Item Analysis

C. Previous Studies

D. Thinking Framework

CHAPTER III: RESEARCH METHODOLOGY

1. Place and Time of the Research

2. Method of the Research

3. Population and Sample

4. Technique of Data Collecting

5. The Instrument of the Study

6. Technique of Data Analysis

CHAPTER IV: RESEARCH FINDINGS

A. Data Description

B. Data Analysis



BIBLIOGRAPHY

APPENDIXES



Table 2.2 The classifications of the index of Discriminating Power

Table 3.1 The rank scale of Level of Difficulty

Table 3.2 The classifications of the index of Discriminating Power

Table 4.1 The Group Position Based on the Test Result

Table 4.2 Format of Item Analysis of the English Summative Test

Table 4.3 Classification of Items Based on the Proportion of Difficulty Level

Table 4.4 Classification of Items Based on the Proportion of Discriminating Power



Appendix 2: Format of Item Analysis of the English Summative Test

Appendix 3: Classification of Items Based on the Proportion of Difficulty Level

Appendix 4: Classification of Items Based on the Proportion of Discriminating Power

Appendix 5: The English Summative Test Paper

Appendix 6: The Answer Key of the English Summative Test

Appendix 7: Table of Students’ Answer Sheet of the Upper Group

Appendix 8: Table of Students’ Answer Sheet of the Lower Group

Appendix 9: Students’ Answer Sheet Paper



A. Background of the Study

Evaluation is essential in the process of teaching and learning because it shows the condition and the results of that process over time. Article 58, paragraph (1) of the Law of the Republic of Indonesia Number 20 of 2003 on the National Education System (Sisdiknas) states that the evaluation of students’ learning outcomes is conducted by educators to monitor the process, progress, and improvement of students’ learning outcomes continuously.1

In line with the statement above, the teacher conducts evaluation periodically during the process of teaching and learning. Its purpose is to give information on whether the teaching and learning have succeeded. In addition, evaluation reveals the weaknesses of the teaching in the classroom, so the teacher can revise his or her teaching, lesson planning, assignments, and classroom management. As Paulina Rea and Kevin Germaine state, “Evaluation is an intrinsic part of teaching and learning. It is important for the teacher because it can provide a wealth of information to use for the future direction of classroom practice, for the planning of courses, and for the management of learning tasks and students.”2 So, the teacher can get information about the success of the teaching and learning process, because a test is one of the ways of

1 M. Sukardi, Evaluasi Pendidikan Prinsip & Operasionalnya, (Jakarta: PT Bumi Aksara, 2011), p. 12.

2 Paulina Rea and Kevin Germaine, Evaluation, (New York: Oxford University Press, 1992), p. 3.



evaluation. This success can be seen from the number of correct answers on the test: if the students answer many test items correctly, it can be concluded that the teacher has succeeded in the teaching and learning process.

Rebecca M. Valette states, “Through tests the teacher can evaluate the effectiveness of a new teaching method, of a different approach to a difficult pattern, or of new materials.”3 Therefore, “the classroom test is concerned with evaluation for the purpose of enabling teachers to increase their own effectiveness by making adjustments in their teaching to enable certain groups of students or individuals in the class to benefit more.”4

Several kinds of test can determine the students’ level of competence in past learning activities in the classroom; one of them is the achievement test. There are four types of achievement test which are very commonly used by teachers in the classroom: placement, formative, diagnostic, and summative tests.5 The type of achievement test that teachers often use to evaluate the success of their teaching and learning in the classroom is the summative test.

The summative test is used at the end of a course of instruction to find out whether the students have mastered all of the materials covered during the teaching and learning process. The summative test therefore has to represent all of the topics which have been taught by the teacher.

3 Rebecca M. Valette, Modern Language Testing, (New York: Harcourt Brace Jovanovich, Inc., 1977), p. 5.

4 J.B. Heaton, Writing English Language Tests, (London: Longman Group UK Limited, 1988), p. 6.

5 Wilmar Tinambunan, Evaluation of Student Achievement, (Jakarta: Departemen Pendidikan dan Kebudayaan, 1988), p. 7.



Therefore, in making a test the teacher should apply the criteria of a good test. All good tests must have three qualities: validity, reliability, and practicality.6

Besides these three qualities, the test must also have a good difficulty level and effective discriminating power. The difficulty level gives information about the percentages of easy, moderate, and difficult items, whereas the discriminating power gives information about whether each test item is able to differentiate between high-performing and low-performing students. Both can be examined by using item analysis.

According to J. Stanley Ahmann and Marvin D. Glock in their book about item analysis:

“Item analysis usually concentrates on two vital features; level of difficulty and discriminating power. The former means the percentage of pupils who answer correctly each item; the latter the ability of the test item to differentiate between pupils who have done well and those who have done poorly.”7
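The two features named in the quotation can be made concrete with a minimal Python sketch. It assumes the common upper-lower group formulation, in which the difficulty index is the proportion of pupils in both groups who answer the item correctly and the discriminating power is the difference between the two groups' proportions of correct answers. The function names and the example counts are hypothetical, and this is not necessarily the exact computation performed by the writer or by Anates.

```python
def difficulty_index(upper_correct, lower_correct, group_size):
    """Proportion of pupils in both groups who answered the item correctly."""
    return (upper_correct + lower_correct) / (2 * group_size)


def discrimination_index(upper_correct, lower_correct, group_size):
    """Difference between the upper and lower groups' proportions correct.

    Assumes both groups have the same size.
    """
    return (upper_correct - lower_correct) / group_size


# Hypothetical item: 22 of 25 upper-group pupils and 13 of 25
# lower-group pupils answered correctly (not data from this study).
p = difficulty_index(22, 13, 25)
d = discrimination_index(22, 13, 25)
print(round(p, 2), round(d, 2))
```

An item answered correctly by everyone in both groups has a difficulty index of 1 and a discriminating power of 0, which shows why the two indices have to be read together.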

Based on the statements above, the writer would like to explain the problem she encountered during her teaching practice (PPKT) in the first grade of SMP Negeri 3 Tangerang Selatan. She found that the English summative test, the final test given on Wednesday, December 11th, 2013, in the odd semester of the 2013/2014 academic year, contained some items on materials which had not been taught by the teacher.

According to one of the English teachers at SMP Negeri 3 Tangerang Selatan, it is not important for the test items to cover the same materials that were given to the students, because he believed that this was similar to the way items are set in the English National Examination.

6 David P. Harris, Testing English as a Second Language, (New Delhi: Tata McGraw-Hill Publishing Company, Ltd), p. 13.

7 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth: Principles of Tests and Measurements, (Boston: Allyn and Bacon, Inc., 1967), p. 184.



As a result, the students cannot always answer well even the items on materials that were taught, and the items on materials that have not been taught are more difficult still.

Based on the facts above, the writer would like to analyze the test by using item analysis, focusing only on the difficulty level and the discriminating power of the English summative test of the first grade at SMP Negeri 3 Tangerang Selatan. The writer therefore conducts the study under the title “AN ITEM ANALYSIS OF ENGLISH SUMMATIVE TEST ON DIFFICULTY LEVEL AND DISCRIMINATING POWER (A Case Study of the First Grade Students of 3 State Junior High School of Tangerang Selatan)”.

B. Identification of the Problem

Based on the background of the study, the writer identifies the following problems:

1. The teacher of the first grade at SMPN 3 Tangerang Selatan believed that it is not important for test items to cover the same materials given to the students.

2. Some items in the English summative test, the final test of the odd semester of the 2013/2014 academic year, cover materials which have not been taught by the teacher.

3. The upper group of students cannot answer some items well because they have not learned the material, and the lower group of students can answer some items only by guessing.

C. Limitation of the Problem

To make the study clear, the writer limits the problem, based on the statements in the identification, to the difficulty level and the discriminating power of the English summative test of the first grade students at SMPN 3 Tangerang Selatan: the difficulty level, to find out which items of the test are easy, moderate, or difficult, and the discriminating power, to analyze whether the test items can differentiate between the upper and lower groups of students.

D. Formulation of the Problem

Based on the identification and limitation of the problems, the writer conducts an item analysis to find out the percentages of items at each difficulty level and whether the test items can differentiate between the upper and lower groups of students. The writer formulates the problem as follows: “Does the English summative test for the first grade students at SMP Negeri 3 Tangerang Selatan have a good quality of the difficulty level and the discriminating power?”

E. Objective of the Study

The objective of the study is to find out the difficulty level (easy, moderate, or difficult) of the items and the discriminating power of the English summative test, that is, whether each item can differentiate between the upper and lower groups of the first grade students tested at SMPN 3 Tangerang Selatan in the first semester of the 2013/2014 academic year.

F. Significance of the Study

First, the result of the study gives the reader a clear description of the quality of the English summative test of the first grade at SMPN 3 Tangerang Selatan, especially the percentages of easy, moderate, and difficult test items and whether the test items can differentiate between the upper and the lower groups of students.

Second, the writer hopes that the result of the item analysis can benefit English teachers and test makers by helping them improve their competence in making good test items and also improve their teaching and learning in the classroom.

Finally, the study can serve as a previous study on the analysis of difficulty level and discriminating power, making the work of other researchers easier.



In this chapter, the writer gives a clear description of the theoretical framework, which covers the definition of the summative test, the criteria of a good summative test, the types of test item in a summative test, the definition of item analysis, the kinds of item analysis, and the importance of item analysis.

A. Summative Test

1. The Definition of Summative Test

Before discussing the summative test, the writer first elaborates the meaning of a test. Many experts have defined it. Anthony J. Nitko, in his book Educational Tests and Measurement: An Introduction, defines a test as “a systematic procedure for observing and describing one or more characteristics of a person with the aid of either a numerical scale or category system.”1

In another definition, a test is a technique or method consisting of questions, statements, or tasks that are delivered to students in order to measure their performance or behavior.2 In support of this, Wilmar Tinambunan says that “a test is a set of questions, each of which has a correct answer, that examinees usually answer orally or in writing.”3

Based on the definitions above, it can be concluded that a test is a tool for collecting information about students’ performance through a set of items

1 Anthony J. Nitko, Educational Tests and Measurement: An Introduction, (New York: Harcourt Brace Jovanovich, Inc, 1983), p. 6.

2 Zainal Arifin, Evaluasi Pembelajaran, (Bandung: PT. Remaja Rosdakarya, 2009), p. 118.

3 Wilmar Tinambunan, Evaluation of Student Achievement, (Jakarta: Departemen Pendidikan dan Kebudayaan, 1988), p. 3.


such as questions or statements, answered orally or in writing and scored by using a numerical scale or category system.

The summative test is one of the types of achievement test. The achievement test itself is a test which can determine the students’ level of competence in past learning activities in the classroom. The type of achievement test that teachers often use to evaluate the success of their teaching and learning in the classroom is the summative test.

According to Wilmar Tinambunan, “the summative test is intended to show the standard which the students have now reached in relation to other students at the same stage. Therefore it typically comes at the end of a course or unit of instruction.”4

In support of the statement above, the summative test is given at the end of a marking period and measures the “sum” total of the material covered. On this type of test, students are usually ranked and graded.5 Because it comes at the end of a course (or unit) of instruction, it is designed to determine the extent to which the instructional objectives have been achieved and is used primarily for assigning course grades or certifying pupil mastery of the intended learning outcomes.6

It means that the summative test is administered at the end of a course to find out the students’ competence in all the materials which have been taught by the teacher.

4 Ibid, p. 9.

5 Rebecca M. Valette, Modern Language Testing, (New York: Harcourt Brace Jovanovich, Inc., 1977), p. 11.

6 Norman E. Gronlund, Measurement and Evaluation in Teaching, (New York: Macmillan Publishing Company, 1985), p. 12.


2. Criteria of Good Test in Summative Test

The summative test has to represent all the materials which have been taught by the teacher. The teacher should therefore apply the criteria of a good summative test: validity, reliability, and practicality.

a. Validity

According to Wilmar Tinambunan, “Validity refers to the extent to which the results of an evaluation procedure serve the particular uses for which they are intended. Thus, the validity of the test is the extent to which the test measure what is intended to measure.”7

It can be concluded that a test is useful if it measures what it is intended to measure. For a summative test, this means that the test maker must draw the test items from the materials that have been covered.

b. Reliability

The second criterion of a good test is reliability. It is measured by a correlation between the scores of the same set of students on two consecutive administrations of the test.8 This is supported by Klausmeier and Goodwin, who state that “reliability refers to the degree to which the measurements yielded by a test are consistent or stable.”9

It means that if the test is administered more than once to the same students at different times and the scores do not differ or change drastically from the earlier scores, the test can be called reliable. In another view, the scores of a reliable test are not only stable but also dependable, meaning that the test can be relied upon, and predictable, meaning that the test is able to predict later results.
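The idea of measuring reliability as a correlation between two administrations of the same test can be sketched in Python. The sketch assumes the Pearson product-moment coefficient, which is the usual choice for test-retest reliability but is not named in the source; the function name `pearson_r` and the example scores are hypothetical.

```python
import math


def pearson_r(first_scores, second_scores):
    """Pearson correlation between two administrations of the same test.

    Both lists must hold the scores of the same students in the same order.
    """
    if len(first_scores) != len(second_scores) or len(first_scores) < 2:
        raise ValueError("need two equally long lists of at least two scores")
    n = len(first_scores)
    mean_x = sum(first_scores) / n
    mean_y = sum(second_scores) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(first_scores, second_scores))
    var_x = sum((x - mean_x) ** 2 for x in first_scores)
    var_y = sum((y - mean_y) ** 2 for y in second_scores)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary in both administrations")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical scores of five students on two administrations.
first = [60, 70, 80, 90, 100]
second = [62, 69, 83, 88, 99]
print(round(pearson_r(first, second), 3))
```

A coefficient close to 1 indicates that the students keep nearly the same relative standing on both administrations, which is what the text calls stable scores.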

Heaton states that there are five factors affecting reliability of the test. The first is the extent of the sample of material selected for testing, the second is the

7 Wilmar Tinambunan, op. cit., p. 11.

8 Robert Lado, Language Testing, (New York: McGraw-Hill Book Company, 1964), p. 31.

9 Herbert J. Klausmeier and William Goodwin, Learning and Human Abilities, 2nd Ed., (New York: Harper & Row, Publisher, 1961), p. 585.


administration of the test, the third is the instructions, the fourth is personal factors such as motivation and illness, and the last is the scoring of the test.10

c. Practicality

Practicality, the third criterion of a good test, can also be called usability. In making a test, the teacher or test maker should consider practical matters such as economy, scoring, and interpretation. As Douglas Brown says in his book, “a good test is practical. It is within the means of financial limitations, time constraints, ease of administration, and scoring and interpretation.”11 It means that there are some factors to be considered before making a test.

According to Wilmar Tinambunan, “before administering a test, some factors about the administration and the test itself must be carefully considered.”12 It can be concluded that before the test is used, the test maker has to think about the consequences for the usability of the test itself, such as a scoring procedure that makes it easier for the teacher to calculate the results, the way the test items are presented, and so on.

3. Types of Test Item in Summative Test

Besides the criteria of a good summative test, there are several types of test item in a summative test. Those are:

a. Objective Test

A test can be called objective if each test item has only one correct answer as its key. In this test the students have to choose the one correct answer from the choices provided by the teacher. Scoring objective items requires much less time than scoring essay items.

10 J.B. Heaton, Writing English Language Tests, (New York: Longman, 1988), pp. 162-163.

11 H. Douglas Brown, Teaching by Principles, (New York: Addison Wesley Longman, Inc., 2001), p. 386.

12



With its objectivity, objective items can be accurately scored with little if any dispute about the correctness of response.

According to Zainal Arifin’s book, “There are many varieties of these new tests, but four kinds are in most common use, true-false, multiple-choice, completion, matching.”13

1) True-False

According to Jum C. Nunnally, “The popularity of the true-false item is probably due to the ease with which such items can be composed. It is usually easy to make up many such items in a relatively short period of time.”14

In addition, James Dean Brown and Thom Hudson write in their book that a true-false item “requires student to respond to the language by selecting one of two choices, for instance, between true and false or between correct and incorrect.”15

To sum up, in a true-false item the students answer the statement with true or false in a short time. The function of this item is to measure the students’ ability to differentiate between fact and opinion. In addition, the teacher may provide not only a question or statement in this item but also a picture, diagram, or table.

2) Multiple-Choice

The multiple-choice item is the most popular type of test item because it is used in many kinds of objective test. In support of this, William Wiersma and Stephen G. Jurs state in their book that “by far the most popular type of

13 Zainal Arifin, op. cit., p. 135.

14 Jum C. Nunnally, Educational Measurement and Evaluation, (New York: McGraw-Hill, Inc., 1964), p. 160.

15 James Dean Brown and Thom Hudson, Criterion-Referenced Language Testing,



objective item is that in which the student is required to choose one alternative response to a problem or question.”16

In addition, “A multiple-choice item is an item that presents a statement (called the stem) and the student is required to select one of two or more (usually more) options that correctly completes the statement or correctly answers the problem posed in the statement.”17

Similarly, a multiple-choice item consists of one or more introductory sentences followed by a list of two or more suggested responses from which the examinee chooses one as the correct answer.18

From the above, it can be concluded that a multiple-choice item consists of two parts: the question or statement, which is called the stem, and the choices, which are called the options and comprise the correct (or most correct) answer and the distractors. The options may consist of words, numbers, or statements.

For instance, there is a stem:

Who is the boy in the text above?

and a number of options – one of which is correct, the others being distracters:

A. John B. Peter C. Smith D. George
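The parts of a multiple-choice item named above (stem, options, key, and distractors) can be represented as a small data structure. The following Python sketch is illustrative only: the class name `MultipleChoiceItem` and its methods are hypothetical, and because the source does not say which option is correct, the key "B" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultipleChoiceItem:
    stem: str      # the question or statement
    options: dict  # option label -> option text
    key: str       # label of the correct option

    def distractors(self):
        """Return every option that is not the key."""
        return {label: text for label, text in self.options.items()
                if label != self.key}

    def score(self, response):
        """Objective scoring: 1 for the key, 0 for anything else."""
        return 1 if response == self.key else 0


item = MultipleChoiceItem(
    stem="Who is the boy in the text above?",
    options={"A": "John", "B": "Peter", "C": "Smith", "D": "George"},
    key="B",  # assumed for illustration; not given in the source
)
print(sorted(item.distractors()), item.score("B"), item.score("D"))
```

Because each response is compared with a single key, two scorers always reach the same score, which is the sense in which such items are called objective.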

3) Completion

The completion item is a written statement which requires the examinee to supply the correct word or short phrase in response to an incomplete sentence, a question, or a word association.

16 William Wiersma and Stephen G. Jurs, Evaluation of Instruction in Individually Guided Education, (California: Addison-Wesley Publishing Company, 1976), p. 169.

17 Ibid, p. 200.

18


This item is effective for testing the students’ knowledge of things such as definitions, names of countries, and so on.19 In support of this, “Usually, completion items require the testees to supply a word or a short phrase.”20

For example, in the reading section of a summative test the teacher provides an incomplete sentence or statement, and the students have to fill in the blank with the correct answer. For instance:

The author was surprised to meet Dr Short ………..

That item contains a blank, to be completed with one word or a short phrase based on the text.

4) Matching

The matching item commonly appears in a two-column-format although variations on this format can be used. The two columns of a matching item are commonly called the premises and responses. Matching items lend themselves well to testing a knowledge of relationships or definitions.21

It means that there are many forms of matching item. The premises may be a list of definitions, antonyms, or synonyms, and the responses are a list of choices from which the best or most appropriate answer is taken. Usually, there are more responses than premises.

In addition, Jum C. Nunnally states in his book that “Students are asked to write in the blank space the letter corresponding to the option on the right which matches the item on the left. The major advantage of the matching item is that a considerable amount of material can be presented in a short space.”

19 Wilmar Tinambunan, op. cit., p. 61.

20 J.B. Heaton, op. cit., p. 124.

21


b. Subjective Test

In a subjective test, the students have to answer the questions in writing, based on the knowledge they have acquired and using their own words. In scoring a subjective test, the teacher scores each answer according to whether it is simple or complex, and the score of course depends on the teacher’s subjectivity. According to Arthur Hughes, “If judgment is called for, the scoring is said to be subjective.”22

Also, the scoring of the students’ answers focuses not only on whether an answer is true or false but also on whether it is complete or incomplete. In this way, the teacher also learns how far the students have mastered the materials which have been given by the teacher.

a) Essay Test

In an essay item, the students supply their answer rather than choose the correct answer. In support of this, “the essay-type question requires the examinee to read the question, formulate his response and express the response in his own words.”23 It means that the students are given freedom to express their ideas in answering the question.

In addition, J. Stanley Ahmann and Marvin D. Glock state, “an essay test item demands response composed by the pupil, usually in one or more sentences, of a nature that no single response or pattern of responses can be judged subjectively only by one skilled and informed in the subject, customarily the classroom teacher.”24

22 Arthur Hughes, Testing for Language Teachers…, p. 19.

23 Ibid, p. 56.

24



Furthermore, the essay test usually consists of questions beginning with or including such directions as discuss, explain, outline, evaluate, define, compare, contrast, and describe.25

For instance:

Explain the differences between narration and recount text!

The essay question has several advantages:

1. The essay item is the most effective in assessing complex learning outcomes.

2. Constructing essay questions is relatively easy.

3. The possibility of guessing is minimized.

4. Constructing essay questions requires less time.26

Besides that, the essay test has some limitations:

1. It would probably be discarded entirely as a measuring instrument did it not measure significant learning outcomes that cannot be measured by other means.

2. Unreliability of the scoring.

3. The amount of time required for scoring the answers.

4. The limited sampling they provide.27

25

Victor H. Noll, Introduction to Educational Measurement, (Boston: Houghton Mifflin Company, 1965), p. 131.

26

Ibid, p. 87-88.

27

Norman E. Gronlund & Robert L. Linn, Measurement and Evaluation in Teaching 6th Ed.,



b) Types of the Essay Test

Based on the amount of freedom of response, the essay test can be divided into two forms:28

1. Extended Response Type

In this type, the students are free to express their arguments based on their competence. They may begin with a definition and then give examples, or proceed the other way round; that is, the answer may be organized deductively or inductively. The teacher nevertheless scores the answers against criteria based on the question.

According to Wilmar Tinambunan, “In the Extended Response Type Test, the pupil is given almost complete freedom in making his response.”

For example:

“Describe what you think should be the contribution of ‘Psycholinguistics’ to the improvement of language testing. Include those examples discussed in class”.29

2. Restricted Response Type

In the restricted response type, the test item poses a more specific problem and provides more structure. Although the students are still free to express their ideas in writing, they have to answer systematically according to the question.

In addition, “The restricted response question usually limits both the content and the response. The content is usually restricted by the scope of the topic to be discussed. Limitations on the form of response are generally indicated in the question”.30

28

Wilmar Tinambunan, op. cit., p. 56.

29

Ibid.

30



B. Item Analysis

1. Definition of Item Analysis

According to Anthony J. Nitko, “Item analysis refers to the process of collecting, summarizing and using information about individual test items, especially information about pupils’ responses to items.”31

It can be concluded that analyzing test items is important because it informs both teachers and students about how well teaching and learning are proceeding. In addition, the teacher learns the quality of the test items and the effectiveness of the instruction.

In support of this, “the feedback on individual items can help the instructor to identify points or concepts that are in need of review and further instruction.”32 This means that the teacher receives feedback for revising both the teaching and learning process and the test items themselves, so that better test items can be provided in the next exam.

Furthermore, “Item analysis usually concentrates three vital features: level of difficulty, discriminating power, and the effectiveness of each alternatives. Thus, item analysis information can tell us if an item was too difficult or too easy, how well it discriminated between high and low scores on the test, and whether all the alternatives functioned as intended.”33

In the same way, item analysis tells us basically three things:

1. how difficult each item is,

2. whether or not the question “discriminates” or tells the difference between high and low students, and

3. which distracters are working as they should.34

31

Anthony J. Nitko, op. cit., p.284.

32

Kenneth D. Hopkins, Educational and Psychological Measurement and Evaluation (eight edition), (University of Colorado, Boulder: Allyn & Bacon, 1998), p. 254.

33

In addition, Lyle F. Bachman states that “to conduct an item analysis (IA) by hand, we first arrange the scored test papers or answer sheets in order from the highest score to the lowest score. Next, we separate the papers into upper and lower groups, according to their total test scores. In order to optimize these two objectives, for large group (i.e. 100 or larger) we would choose the upper and lower 27 percent, while for small groups, we would typically choose the upper and lower one-third.”35
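The ranking and grouping procedure described in the quotation can be sketched in a few lines of Python. This is only an illustration: the function name `split_groups`, the rounding of the group size to the nearest whole student, and the sample scores are assumptions, not part of Bachman's text. The number of test takers (92) is the number of answer sheets analyzed in this study.

```python
def split_groups(scores, fraction=0.27):
    """Rank scores from highest to lowest and return (upper, middle, lower)."""
    ranked = sorted(scores, reverse=True)
    # Size of each extreme group: the chosen fraction of all test takers,
    # rounded to the nearest whole student (assumed rounding rule).
    n = round(len(ranked) * fraction)
    upper = ranked[:n]
    lower = ranked[len(ranked) - n:]
    middle = ranked[n:len(ranked) - n]
    return upper, middle, lower


# Hypothetical scores for 92 test takers, used only to show the group sizes.
scores = [100 - i for i in range(92)]
upper, middle, lower = split_groups(scores)
print(len(upper), len(middle), len(lower))
```

With 92 test takers and a fraction of 0.27, each extreme group holds 25 students and the middle group holds the remaining 42. Passing `fraction=1/3` gives the one-third split suggested for small groups.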

2. Kinds of Item Analysis

a. Level of Difficulty

The level of difficulty goes by several other names: item facility, item difficulty, item easiness, and p-value. It is often abbreviated as IF. According to Heaton, “The index of difficulty (or facility value) of an item simply shows how easy or difficult the particular item proved in the test.”36

Items fall into three levels of difficulty: easy, moderate, and difficult. A good item in a summative test should be neither too difficult nor too easy. If many students answer an item correctly, the item is easy; if few students do, the item is difficult.

Knowing the levels of difficulty in a test can also benefit the students psychologically. Once they know that a test contains easy, moderate, and difficult items, they are motivated by the presence of the easy items and they prepare more intensively for the difficult ones.

34

Harold S. Madsen, Techniques in testing, (New York: Oxford University Press, 1983), p.180.

35

Lyle F. Bachman, Statistical Analyses for Language Assessment, (Cambridge: Cambridge University Press, 2004), p. 123.

36



According to Kathleen M. Bailey, difficulty level is “an index of how easy an individual item was for the people who took it. I.F. is a number, typically printed as a decimal, ranging from 0.0 to 1.0. It represents the proportion of people who got the item right (out of all the people who took the test).”37

The formula for the difficulty level of each item in a large group is stated below:38

$$FV = \frac{R}{N} \quad \text{or} \quad FV = \frac{\text{Correct } U + \text{Correct } L}{2n}$$

In which:

FV : The index of difficulty

R : The number of correct answers

N : The number of students taking the test

U : Upper half

L : Lower half

n : Number of candidates in one group
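As a minimal sketch, the index of difficulty can be computed from the upper and lower groups as FV = (Correct U + Correct L) / 2n, where n is the size of one group. The function name `facility_value` and the counts in the example are hypothetical and serve only to illustrate the arithmetic.

```python
def facility_value(correct_upper, correct_lower, n):
    """Index of difficulty: FV = (Correct U + Correct L) / 2n.

    n is the number of candidates in ONE group, so 2n is the number of
    candidates in the upper and lower groups together.
    """
    if n <= 0:
        raise ValueError("n must be a positive group size")
    return (correct_upper + correct_lower) / (2 * n)


# Hypothetical item: 20 of 25 upper-group students and 10 of 25
# lower-group students answered correctly.
fv = facility_value(20, 10, 25)
print(round(fv, 2))
```

In this hypothetical case, 30 of the 50 students answered correctly, so the index is 0.60.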

To calculate the difficulty level of a short-answer item, Zainal Arifin gives the following formula:39

37

Kathleen M. Bailey, Learning about Language Assessment: Dilemmas, Decisions, and Directions, (New York: Heinle & Heinle Publishers, 1998), p. 132.

38

J.B. Heaton, loc. cit.

39

Zainal Arifin, loc. cit.


Next, the writer determined the difficulty level of all items in the English Summative Test for the first grade of SMPN 3 Tangerang Selatan at odd semester of the 2013/2014 academic year by using the following formula:40

$$P = \frac{\sum B}{N}$$

In which:

P : The difficulty level of all items

B : The difficulty level of each item

∑ : Sigma (Total)

N : The total number of test items

The difficulty level of all test items ranges from 0.00 to 1.00. It can be interpreted with the rank scale of difficulty level, as follows:

40

Zaenal Arifin, loc. cit., p. 272.



Table 2.1

The rank scale of Level of Difficulty41

P               Interpretation

< 0.30          Difficult

0.30 – 0.70     Moderate

> 0.70          Easy

The rank scale above shows how easy or difficult a test item is, so the teacher can determine the difficulty level of each item in the summative test.
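The rank scale and the whole-test index can be sketched as follows. The function names and the four item indices are hypothetical. The whole-test index is taken as the mean of the item indices, and the boundaries follow the rank scale: below 0.30 is difficult, 0.30 to 0.70 is moderate, and above 0.70 is easy.

```python
def classify_difficulty(p):
    """Map a difficulty index (0.00 to 1.00) to its rank-scale label."""
    if p < 0.30:
        return "Difficult"
    if p <= 0.70:
        return "Moderate"
    return "Easy"


def overall_difficulty(item_indices):
    """Difficulty level of all items: the sum of the item indices divided by the number of items."""
    return sum(item_indices) / len(item_indices)


# Hypothetical indices for a four-item test.
item_indices = [0.20, 0.50, 0.90, 1.00]
p = overall_difficulty(item_indices)
print(round(p, 2), classify_difficulty(p))
print([classify_difficulty(x) for x in item_indices])
```

An index of 0.69, the whole-test value reported for this study, falls in the moderate range under the same rule.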

b. Discriminating Power

According to J. Stanley Ahmann and Marvin D. Glock, “the discriminating power of a test item is its ability to differentiate between pupils who have achieved well (the upper group) and those who have achieved poorly (the lower group).”42

Normally, the upper group answers an item correctly more often than the lower group. The index of discriminating power is expressed as a proportion: the higher the index, the better the item discriminates between the upper and the lower group.

As stated by Kenneth D. Hopkins, items that yield a discrimination index of .30 or more are relatively good in distinguishing between knowledgeable and less knowledgeable examinees.43

The maximum size of the index is +1.00 and the minimum size is –1.00. Any negative value means that the test item discriminates, to some degree, in the wrong direction. Hence, the discriminating power of the test item is unsatisfactory. Positive values show that the test item discriminates in the desired direction, even though it may not be completely satisfactory.44

41

Suharsimi Arikunto, Dasar-dasar Evaluasi Pendidikan, (Jakarta: PT. Bumi Aksara, 2006), p. 210.

42

J. Stanley Ahmann and Marvin D. Glock, op. cit., p. 187.

43

In other words, a good test item is one that the upper group answers correctly and the lower group answers incorrectly.

To analyze the index of discriminating power of an item, Wilmar Tinambunan states the formula as follows:45

$$D = \frac{U - L}{N}$$

In which:

D : The index of item discriminating power

U : The number of pupils in the upper group who answered the item correctly

L : The number of pupils in the lower group who answered the item correctly

N : Number of pupils in each of the groups
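A minimal sketch of the index of discriminating power, computed as D = (U − L) / N with the symbols defined above. The function name `discrimination_index` and the counts in the two examples are hypothetical.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """D = (U - L) / N, where N is the number of pupils in EACH group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (upper_correct - lower_correct) / group_size


# Hypothetical item answered correctly by 20 of 25 upper-group pupils
# and 10 of 25 lower-group pupils.
d_positive = discrimination_index(20, 10, 25)

# Hypothetical item that the lower group answered better than the upper
# group, so it discriminates in the wrong direction.
d_negative = discrimination_index(5, 10, 25)

print(round(d_positive, 2), round(d_negative, 2))
```

The first item yields a positive index (0.40) and the second a negative one (−0.20), which matches the interpretation of positive and negative values given earlier.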

Then, the discriminating power of a short-answer item can be calculated with the following formula:46

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum x_1^2 + \sum x_2^2}{n(n-1)}}}$$

In which:

$\bar{X}_1$ : The mean of upper group

$\bar{X}_2$ : The mean of lower group

$\sum x_1^2$ : The total quadrate of individual deviation from upper group

$\sum x_2^2$ : The total quadrate of individual deviation from lower group

n : 27% x N (upper and lower group)

44

J. Stanley Ahmann and Marvin D. Glock, op. cit., p. 189.

45

Wilmar Tinambunan, op. cit., p. 139.

46

Zainal Arifin, op. cit., p. 278.

Table 2.2

The classifications of the index of Discriminating Power (D) are:47

Index of Discriminating Power     Classifications

0.70 – 1.00                       Excellent

0.40 – 0.70                       Good

0.20 – 0.40                       Satisfactory

≤ 0.20                            Poor

Negative value on D               Very poor
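The classification in Table 2.2 can be sketched as a lookup function. The table lists 0.20, 0.40, and 0.70 in two adjacent ranges, so the treatment of those boundary values below is an assumption: 0.20 is read as poor (following the "≤ 0.20" row), while 0.40 and 0.70 are assigned to the higher class. The function name is hypothetical.

```python
def classify_discrimination(d):
    """Map an index of discriminating power to its Table 2.2 label.

    Boundary handling is assumed: d == 0.20 -> "Poor",
    d == 0.40 -> "Good", d == 0.70 -> "Excellent".
    """
    if d < 0:
        return "Very poor"
    if d <= 0.20:
        return "Poor"
    if d < 0.40:
        return "Satisfactory"
    if d < 0.70:
        return "Good"
    return "Excellent"


for d in (-0.10, 0.00, 0.30, 0.40, 0.85):
    print(d, classify_discrimination(d))
```

An index of 0.00 is not negative, so under this reading it is classified as poor rather than very poor.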

c. The Effectiveness of Distracter

According to Mozaffer Rahim Hingorjo and Farhan Jaleel, “another important technique is analysis of distractors, that provides information regarding the individual distractors and the key of a test item.”48

This means that the teacher can gauge the students’ ability from the options they choose.

47

Anas Sudijono, Pengantar Evaluasi Pendidikan, (Jakarta: PT. Raja Grafindo Persada, 2006) p. 389.

48

Mozaffer Rahim Hingorjo & Farhan Jaleel, Analysis of One-Best MCQs: the Difficulty Index, Discrimination Index and Distractor Efficiency, (Karachi: Journal of Pakistan Medical Association, 2012), p. 1.



A good-quality test item is one whose distractors are chosen about equally often by the students who answer incorrectly; in a poor item, the distractors are chosen unequally. An item is also of good quality if many students in the upper group answer it correctly and only a few students in the lower group do.
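One simple way to inspect the distractors of a multiple-choice item is to count how often each wrong option was chosen. The sketch below assumes a four-option item with options A to D; the function name, the answer key, and the responses are hypothetical.

```python
from collections import Counter


def distractor_counts(responses, key, options="ABCD"):
    """Count how many students chose each wrong option (distractor)."""
    counts = Counter(r for r in responses if r != key and r in options)
    # Report every distractor, including those nobody chose.
    return {opt: counts.get(opt, 0) for opt in options if opt != key}


# Hypothetical responses of 16 students to an item whose key is "A".
responses = list("A" * 10 + "B" * 2 + "C" * 2 + "D" * 2)
print(distractor_counts(responses, key="A"))
```

In this hypothetical case, the six students who answered incorrectly are spread evenly over the three distractors, which is the pattern described above for a good item. A distractor with a count of zero would be a candidate for revision.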

3. The Difficulty Level and the Discriminating Power

According to J. Stanley Ahmann and Marvin D. Glock, “It has been long known that the discriminating power of an item is influenced by its difficulty.”49

In support of this, item analysis also tells how difficult or easy the questions were (the difficulty index) and whether the questions were able to discriminate between students who performed well on the test and those who did not (the discrimination index).50

Logically, after analyzing the level of difficulty, the teacher knows whether each test item is easy, moderate, or difficult. From these difficulty levels, the teacher can then distinguish whether the students belong to the upper or the lower group.

For example, according to Zainal Arifin, the upper group is better able to answer a question than the lower group.51 It can therefore be concluded that difficult items tend to be answered correctly by the upper group of students and not by the lower group.

So, after analyzing the test items for difficulty level, it is better for the teacher to analyze the discriminating power at the same time and by the same procedure. In this way the teacher learns not only the difficulty level of the items but also how well the items discriminate between upper- and lower-group students.

49

J. Stanley Ahmann and Marvin D. Glock, loc. cit. p. 189.

50

Mozaffer Rahim Hingorjo & Farhan Jaleel, loc. cit.

51



According to Heaton, “Facility values and the discrimination indices are usually recorded together in tabular form and calculated by similar procedures.”52

Above all, by analyzing the test items in terms of difficulty level and discriminating power, the teacher learns the quality of the test items: whether they are easy, moderate, or difficult, and whether their discriminating power is high or low.

To calculate them, the following formulas can be used:53

$$FV = \frac{R}{N} \quad \text{or} \quad FV = \frac{\text{Correct } U + \text{Correct } L}{2n}$$

$$D = \frac{\text{Correct } U - \text{Correct } L}{n}$$

In which:

FV : The index of difficulty

R : The number of correct answers

N : The number of students taking the test

D : Discriminating index

U : Upper half

L : Lower half

n : Number of candidates in one group

52

J.B Heaton, op. cit., p. 182.

53

Ibid.



In addition, difficulty has a bearing on the discrimination value of a test: a test which is too hard or too easy will not discriminate between individuals of different levels of achievement as well as one which is more appropriate for the range of abilities in the group.54

4. The Importance of Item Analysis

Item analysis is important because it gives the teacher a great deal of information. From the results of the analysis, the teacher learns how successful the teaching and learning process in the classroom has been and how the students have performed. Item analysis therefore has many benefits.

According to Anthony J. Nitko, item analysis is important for six reasons:55

1. “Determining whether an item functions as the teacher intends.” In deciding whether an item functions, the teacher needs to consider five points: (a) whether it seems to be testing the intended instructional objective, (b) whether it is of the appropriate level of difficulty, (c) whether it is able to distinguish those who have command of the learning objectives from those who do not, (d) whether the keyed answer is correct; and (e) (for response-choice items) whether the distractors are functioning.

2. “Feedback to students about their performance and as a basis for class discussion.” The teacher obtains information about the students’ performance. For example, in reviewing the test, the teacher identifies the students’ errors.

3. “Feedback to the teacher about pupil’s difficulties.” Through item analysis the teacher learns the students’ problems, which are visible in the errors in their answers. The teacher is advised to base this judgment on more than one item, for example on both group and individual tests covering all the topics that have been taught.

4. “Areas for curriculum improvement.” Item analysis provides information for revising the curriculum.

5. “Revising the items.” Using the students’ responses to each item, the teacher can revise the items. Revising an item takes less time than writing a new one, and the revised items can be used in the next test.

6. “Improving item-writing skills.” The teacher needs item-writing skill to construct test items, and one way to improve that skill is to analyze the items on the basis of the students’ responses.

54

Victor H. Noll, Introduction to Educational Measurement, (Boston: Houghton Mifflin Company, 1965), p. 180.

55

In addition, Widdowson, in his book Language Testing, notes that item analysis usually provides two kinds of information on items:56

1. Item facility, which helps us decide if the test items are at the right level for the target group, and

2. Item discrimination, which allows us to see if the individual items are providing information on candidates’ abilities consistent with that provided by the other items on the test.

56



C. Previous Studies

Several researchers have conducted studies on the difficulty level and the discriminating power of test items.

First, Andrian Dwi Prayoga analyzed the difficulty level of the English summative test for the second grade of Junior High School at odd semester 2010/2011 at SMPN 13 South Tangerang. The study was quantitative because the researcher used numerical data which were analyzed statistically. It was also categorized as descriptive analysis because it was intended to describe the objective condition of the difficulty level of the English summative test. The researcher took 93 students as a sample. The findings showed that moderate items had the highest percentage, 66.7%, followed by difficult items with 20% and easy items with 13.3%. Overall, the test was at a moderate level, with a difficulty index of 0.50, which means that the test had a good difficulty level.57

Second, Hikmah Lestari analyzed the discriminating power of the English summative test at the second year of SMPN 87 Pondok Pinang. The study was categorized as descriptive analysis because it was intended to describe the objective condition of the discriminating power by analyzing the quality of the English summative test items in discriminating students’ achievement. It was considered quantitative research because the researcher used numerical data which were analyzed statistically. The researcher took 60 students as a sample by ordinal sampling. The findings showed that the English summative test administered to the second grade of SMPN 87 Pondok Pinang had good discriminating power because 35 items (70% of the test items), with indices ranging from 0.25 to 0.75, fulfilled the criteria of a positive discriminating power.58

57

Andrian Dwi Prayoga, An Analysis on the Difficulty Level of English Summative Test for Second Grade of Junior High School at Odd Semester 2010/2011 of SMPN 13 South Tangerang (A Case Study at the Second Grade of SMPN 13 South Tangerang), Research Paper at State Islamic University Syarif Hidayatullah Jakarta. Jakarta, 2011, p. i, unpublished.

Last, Taufan Maulana Firdaus analyzed the difficulty level of the English Try out Test of National Examination for Junior High School level at MTs Pembangunan UIN Jakarta. The study was categorized as descriptive analysis because it was intended to describe the difficulty level of the English National Examination Try out test, and it was also considered quantitative research because the writer used numerical data which were analyzed statistically. He took 110 students as a sample. The findings showed that 24 items (48%) were classified as moderate, 22 items (44%) as easy, and 4 items (8%) as difficult. The index of difficulty of the whole test was 0.696, so the difficulty level of the English Try out Test of National Examination for the third grade students of MTs Pembangunan UIN Jakarta was moderate.59

Comparing these studies with her own research, the writer finds a similarity in the instrument, namely the use of a summative test. Almost all of the studies are categorized as descriptive analysis and considered quantitative research because they use numerical data. The writer’s study differs in that it is qualitative research using descriptive analysis, supported by numerical data which are analyzed statistically. In sampling, the previous studies used between 60 and 110 students’ answer sheets. In terms of difficulty level, the studies of Andrian and Taufan reached the same result as the writer’s research, namely a moderate level, and in terms of discriminating power, Hikmah’s study reached the same result as the writer’s research, namely good discriminating power.

58

Hikmah Lestari, An Item Analysis on Discriminating Power of English Summative Test (A Case Study of second year of SMPN 87 Pondok Pinang ), Research Paper at State Islamic University Syarif Hidayatullah Jakarta. Jakarta, 2011, p. i, unpublished.

59

Taufan Maulana Firdaus, An Analysis on the Difficulty Level of English Try out Test of National Examination for Junior High School Level (A Case Study at MTs Pembangunan UIN Jakarta), Research Paper at State Islamic University Syarif Hidayatullah Jakarta. Jakarta, 2013, p. i, unpublished.

D. Thinking framework

Evaluation is important in the teaching and learning process because it shows how successful the teacher has been in teaching, planning lessons, giving assignments, and managing the classroom. One way to obtain this information is through a test.

Teachers often use the summative test because it is administered at the end of a course period to find out whether the students have mastered all of the materials which have been taught. This can be determined by analyzing the test items.

The analysis can be conducted by the teacher or the test maker after the test has been administered to the students. It is better to analyze the difficulty level and the discriminating power at the same time and by the same procedure, because the teacher then learns not only whether the items are easy, moderate, or difficult but also how well the items discriminate between upper- and lower-group students and whether the items can be used again in the next exam.

Thus, analyzing the test is very useful for teachers, students, and all other elements of education in improving the teaching and learning process, starting from small things.



1. Place and Time of the Research

This research was conducted at SMPN 3 Tangerang Selatan which is located on Jl. Ir. H. Juanda Ciputat 15412, Tangerang Selatan. The writer conducted this research by taking the English Summative Test and students’ answer sheets of odd semester 2013/2014 academic year.

2. Method of the Research

This study is categorized as qualitative research using descriptive analysis because the writer described the difficulty level and discriminating power by analyzing the test items of the English summative test of the first grade students of SMPN 3 Tangerang Selatan. The research is also supported by numerical data, which were analyzed statistically by using the Anates program and manual counting.

3. Population and Sample

The writer took the data from the first grade students of SMPN 3 Tangerang Selatan, selecting 92 students’ answer sheets by purposive sampling.1 She ranked the students’ scores from the highest to the lowest and divided the students into three groups: upper, middle, and lower. Only the upper 27% and the lower 27% of the students were analyzed, because the writer was working with a large group (more than 40 students).2

1

Suharsimi Arikunto, Manajemen Penelitian, (Jakarta: Rineka Cipta, 2007), p. 97.

2

Anthony J. Nitko, Educational Tests and Measurement: An Introduction, (New York: Harcourt Brace Jovanovich, Inc, 1983), p.287.



4. Technique of Data Collecting

The writer used a documentary study in this research, collecting the English summative test paper of the first grade students of SMPN 3 Tangerang Selatan in the 2013/2014 academic year, together with the answer key and the students’ answer sheets, to be analyzed.

5. The Instrument of the Study

a. English summative test paper

The writer used the English summative test paper that had been administered on Wednesday, December 11th, 2013, in the odd semester of the 2013/2014 academic year.

b. Students’ answer sheets

The answer sheets on which the students wrote their answers to the questions.

6. Technique of Data Analysis

In this research, the writer used qualitative and quantitative analysis techniques. Qualitatively, she analyzed the difficulty level and discriminating power of each item; quantitatively, she calculated the data by using formulas. Both analyses were conducted at the same time and by the same procedure.3

The steps were as follows:4

a) Took the English summative test paper and the students’ answer sheets while following the practical teaching program (PPKT).

b) Ranked the scores from the highest to the lowest.

c) Divided the students into three groups (upper, middle, and lower) according to the ranked scores.

d) Took 27% of the students for each of the upper and lower groups because the writer used a large group (more than 40 students).5 From the 92 answer sheets, 27% (25 answer sheets) formed the upper group, ranked 1 to 25, and 27% (25 answer sheets) formed the lower group, ranked 68 to 92.

e) Analyzed the test based on the difficulty level and discriminating power.

f) Categorized and drew conclusions from the results of the analysis of the English Summative Test for first grade students at SMPN 3 Tangerang Selatan in the 2013/2014 academic year.

3

J.B. Heaton, Writing English Language Tests, (New York: Longman Inc., 1988), p. 182.

4

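To calculate the level of difficulty and the discriminating power, the following formulas can be used:6

$$FV = \frac{R}{N} \quad \text{or} \quad FV = \frac{\text{Correct } U + \text{Correct } L}{2n}$$

$$D = \frac{\text{Correct } U - \text{Correct } L}{n}$$

5

Loc. cit.

6

J.B. Heaton, loc. cit.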


The level of difficulty of each item must be interpreted with the rank scale of difficulty level, as follows:

Table 3.1

The rank scale of Level of Difficulty7

P               Interpretation

< 0.30          Difficult

0.30 – 0.70     Moderate

> 0.70          Easy

Table 3.2

The classifications of the index of Discriminating Power (D) are:8

Index of Discriminating Power     Classifications

0.70 – 1.00                       Excellent

0.40 – 0.70                       Good

0.20 – 0.40                       Satisfactory

≤ 0.20                            Poor

Negative value on D               Very poor

7

Suharsimi Arikunto, Dasar-dasar Evaluasi Pendidikan, (Jakarta: PT. Bumi Aksara, 2006), p. 210.

8

Anas Sudijono, Pengantar Evaluasi Pendidikan, (Jakarta: PT. Raja Grafindo Persada, 2006) p. 389.



A. Data Description

In this research, the writer used the English summative test of the first grade students of SMPN 3 Tangerang Selatan at odd semester of the 2013/2014 academic year and the students’ answer sheets as the data sources. The test consists of 45 items: 40 multiple-choice items and 5 short-answer items.

The writer ranked the 92 students’ answer sheets and divided them into upper, middle, and lower groups, but used only the upper and the lower groups to analyze the data. Each of these two groups comprises 27% of the 92 answer sheets, that is, 25 students.

Based on Table 4.1 in the appendix, the upper group consists of the 25 students ranked 1 to 25, whose scores range from 95.5 down to 85.5. The lower group consists of the 25 students ranked 68 to 92, whose scores range from 58.5 down to 31.5. The remaining 42 students, ranked 26 to 67, form the middle group, whose scores range from 85.5 down to 60.

B. Data Analysis

After dividing the data into three groups, the writer used only the upper and lower groups to analyze the difficulty level and the discriminating power of each item by using the Anates program and manual counting. Then, she determined the difficulty level and the discriminating power of the test as a whole.



First, the writer calculated the difficulty level and the discriminating power by using the Anates program and then checked the results by manual counting with the formulas. The difficulty level is calculated by adding the number of correct answers in the upper group to the number of correct answers in the lower group and dividing the sum by the total number of students in the two groups. The result is then interpreted according to the classification: an index of less than 0.30 indicates a difficult item, an index from 0.30 to 0.70 a moderate item, and an index of more than 0.70 an easy item.

The discriminating power is calculated by subtracting the number of correct answers in the lower group from the number of correct answers in the upper group and dividing the difference by half of the total number of students in the two groups. The result is then interpreted according to the classification: excellent (0.70 to 1.00), good (0.40 to 0.70), satisfactory (0.20 to 0.40), poor (less than 0.20), or very poor (a negative value).

For each item, the writer combined the two classifications, for example easy level and poor quality, or difficult level and good quality. Based on Table 4.2 in the appendix, the writer found ten classifications of the test items:

1. There are 6 items which are included in easy level and satisfactory quality,

those are items number 1, 9, 11, 25, 39, and 43.

There are the differences of index of each item in this level but still included in the same range that is more than 0.70 as easy level in term difficulty level and the range 0.20 to 0.40 as satisfactory quality in term discriminating power.

2. There are 9 items which are included in easy level and poor quality, those are items number 2, 5, 6, 7, 8, 10, 15, 17, and 44.


(49)

There are the differences of index of each item in this level but still included in the same range that is more than 0.70 as easy level in term difficulty level and the range less than 0.20 as poor quality in term discriminating power. 3. There are 8 items which are included in easy level and good quality, those are

items number 3, 14, 20, 22, 23, 27, 29, and 36.

There are the differences of index of each item in this level but still included in the same range that is more than 0.70 as easy level in term difficulty level and the range 0.40 to 0.70 as good quality in term discriminating power. 4. There are 5 items which are included in moderate level and poor quality,

those are items number 4, 12, 16, 33, and 35.

There are the differences of index of each item in this level but still included in the same range that is 0.30 to 0.70 as moderate level in term difficulty level and the range less than 0.20 as poor quality in term discriminating power. 5. There are 4 items which are included in moderate level and excellent quality,

those are items number 13, 41, 42, and 45.

There are the differences of index of each item in this level but still included in the same range that is 0.30 to 0.70 as moderate level in term difficulty level and the range 0.70 to 1.00 as excellent quality in term discriminating power. 6. There are 3 items which are included in difficult level and poor quality, those

are items number 18, 30, and 32.

The indices of the individual items in this category differ, but they all fall within the same ranges: less than 0.30 (difficult) in terms of difficulty level and less than 0.20 (poor quality) in terms of discriminating power.

7. There are 6 items classified as moderate and of good quality: items number 19, 21, 24, 26, 28, and 40.

The indices of the individual items in this category differ, but they all fall within the same ranges: 0.30 to 0.70 (moderate) in terms of difficulty level and 0.40 to 0.70 (good quality) in terms of discriminating power.



8. There is 1 item classified as difficult and of good quality: item number 31.

The indices of this item fall within the ranges of less than 0.30 (difficult) in terms of difficulty level and 0.40 to 0.70 (good quality) in terms of discriminating power.

9. There is 1 item classified as difficult and of very poor quality: item number 34.

The difficulty index of this item is less than 0.30 (difficult), and its discriminating power index is negative, so this item should be discarded.

10. There are 2 items classified as difficult and of satisfactory quality: items number 37 and 38.

The indices of the individual items in this category differ, but they all fall within the same ranges: less than 0.30 (difficult) in terms of difficulty level and 0.20 to 0.40 (satisfactory quality) in terms of discriminating power.

It can be concluded from the description above that, although items may share the same difficulty level and the same quality of discriminating power, their indices vary within each range because each item has a different number of correct answers in the upper and lower groups. In addition, the indices depend on how difficult each item is for the students to answer correctly.
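The ranges listed above can be sketched as two small lookup functions. This is only an illustration: the function names are hypothetical, and because the stated ranges share their end points (0.40 and 0.70), the treatment of values that fall exactly on a boundary is an assumption made here, not a rule taken from the analysis.

```python
def difficulty_level(p):
    """Map a difficulty index (0.00 to 1.00) to the level names used above."""
    if p > 0.70:           # "more than 0.70" is easy
        return "easy"
    if p >= 0.30:          # 0.30 to 0.70 is moderate (boundaries assumed inclusive)
        return "moderate"
    return "difficult"     # "less than 0.30" is difficult


def discrimination_quality(d):
    """Map a discriminating power index (-1.00 to 1.00) to the quality names used above."""
    if d < 0:
        return "very poor"     # negative value
    if d < 0.20:
        return "poor"
    if d < 0.40:               # assumption: exactly 0.40 counts as good
        return "satisfactory"
    if d < 0.70:               # assumption: exactly 0.70 counts as excellent
        return "good"
    return "excellent"


# Indices reported in this chapter for items 8, 13, 34, and 36.
reported = {8: (1.00, 0.00), 13: (0.54, 0.76), 34: (0.12, -0.16), 36: (0.72, 0.56)}
labels = {
    item: (difficulty_level(p), discrimination_quality(d))
    for item, (p, d) in reported.items()
}
print(labels)
```

With these assumed boundaries, the four reported index pairs receive the same labels that the chapter assigns to them.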

In addition, based on the data analysis above, five items are described in detail: items number 8, 13, 31, 34, and 36. The analysis is as follows:



Number 8 has a difficulty index of 1.00, so it belongs to the easy level, and a discriminating power index of 0.00, so it is classified as poor quality. This item asks for detailed information from the text, so students in all groups are able to answer the question.

Number 13 has a difficulty index of 0.54, so it is included in the moderate level, and a discriminating power index of 0.76, which means that the item has excellent quality. This is a completion item, so the students find it quite difficult even though they only have to find the appropriate word for the sentence; the distractors are good because all of the options are verbs.

I have a friend. Her name is Sarah. She is a new student in my class. She lives at number 27 Hang Tuah raya street, Plumbungan Indah, Sragen, Central Java. She is slim and tall. She has short and straight hair. She is twelve years old. Her nose is pointed. Her eyes are brown. She is charming and smart. Everybody likes her.

8. Where does she live?

a. Bandung
b. Solo
c. Medan
d. Sragen

It is Thursday. Fazar is in the school ____11____. He wants to find a book. The book is about flora and fauna. First, he goes to the catalogue to find the title and the name of ____12____. Then, he ____13____ one piece of paper. He meets the librarian and tells that he wants to _____14_____ book.

13. a. writes c. speaks



Number 31 has a difficulty index of 0.6, so it is classified as a difficult item, and a discriminating power index of 0.52, so it is included among the good-quality items. To answer this item, the students have to rely on their own knowledge, so only the students with high competence are able to answer the question.

Number 34 has a difficulty index of 0.12, so it is classified as a difficult item, and a discriminating power index of -0.16, so it is classified as a very poor item. This is because the students have to infer the purpose of the text in the letter, and to answer this item they have to recognize the clues in the text.

31. When we celebrate Heroes’ day?

a. October 1st
b. August 17th
c. April 21st
d. November 10th

Hello Felix, How’s going?

We’re waiting for your coming … Please, go home soon.

Miss you so much,

Hana, your beloved’s sister.

34. What is the purpose of the text?

a. to greet Felix
b. to warn Felix
c. to give an advice
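The indices reported for these items are consistent with the common upper-lower group formulas, in which the difficulty index is the proportion of correct answers in both groups combined and the discriminating power index is the difference between the two groups' correct answers divided by the group size. The sketch below assumes those formulas and a group size of 25 (27% of the 92 answer sheets); the function name is hypothetical, and the lower-group count used for item 34 is inferred from its reported indices rather than taken from the text.

```python
def item_indices(upper_correct, lower_correct, group_size):
    """Return (difficulty index, discriminating power index), rounded to 2 decimals.

    Assumed formulas: P = (U + L) / (2n) and D = (U - L) / n, where U and L are
    the numbers of correct answers in the upper and lower groups and n is the
    size of each group.
    """
    p = (upper_correct + lower_correct) / (2 * group_size)
    d = (upper_correct - lower_correct) / group_size
    return round(p, 2), round(d, 2)


GROUP_SIZE = 25  # 27% of 92 answer sheets in each of the upper and lower groups

# Item 8: every student in both groups answers correctly.
item_8 = item_indices(25, 25, GROUP_SIZE)

# Item 34: 1 correct answer in the upper group; 5 in the lower group is inferred.
item_34 = item_indices(1, 5, GROUP_SIZE)

print(item_8)   # (1.0, 0.0)
print(item_34)  # (0.12, -0.16)
```

A negative discriminating power index, as for item 34, arises whenever more students in the lower group than in the upper group answer the item correctly.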



Number 36 has a difficulty index of 0.72, so it is classified as easy, and a discriminating power index of 0.56, so it is included among the good items. Almost all of the upper group are able to answer this question, which means that this item is able to discriminate between the upper and lower groups.

36. door (1) – after (2) – the (3) – Lock (4) – out (5) – you (6) – go (7)

The best arrangement is ……….

a. 4312675
b. 4316257
c. 4316752
d. 4316752

C. Data Interpretation

Based on the results of the item analysis in terms of difficulty level and discriminating power, it can be found that, of the 45 items, 6 items (1, 9, 11, 25, 39, and 43) are regarded as easy and of satisfactory quality. This means that those items were answered correctly by students in both the upper and lower groups and that they differentiate between the two groups reasonably well.

Next, 9 items (2, 5, 6, 7, 8, 10, 15, 17, and 44) are regarded as easy and of poor quality. This means that many students answered them correctly and that those items have low discriminating power because they cannot differentiate between the upper and lower groups. Then, 8 items (3, 14, 20, 22, 23, 27, 29, and 36) are regarded as easy and of good quality. This means that almost all of the students can answer the questions correctly and that those items have high quality because they can differentiate between the upper and the lower groups.

There are 5 items (4, 12, 16, 33, and 35) regarded as moderate and of poor quality. This means that almost all of the students in the upper group are able to answer correctly and only some of the students in the lower group are able to answer the question. Then, 4 items (13, 41, 42, and 45) are regarded as moderate and of excellent quality. This means that almost all of the students in the upper group are able to answer the question and only a few students in the lower group are able to answer it, so those items have high discriminating power because they can differentiate between the upper and the lower groups.

There are 3 items (18, 30, and 32) regarded as difficult and of poor quality. This means that only a few students in the upper and lower groups are able to answer the question and that those items have low discriminating power, so they are not able to distinguish between the upper and the lower groups. Then, there are 6 items (19, 21, 24, 26, 28, and 40) regarded as moderate and of good quality. This means that almost all of the students in the upper group are able to answer the question and only a few of the students in the lower group are able to answer it.

Next, there is 1 item (31) regarded as difficult and of good quality. This means that the item can be answered by almost all of the upper group and by only a few of the lower group, so this item has a good quality of discriminating power and can distinguish between the upper and the lower groups. This item also has good distracters because all of the options are chosen by the lower group.

Then, there is 1 item (34) regarded as difficult and of very poor quality. The data analysis shows that this question was answered correctly by only 1 student in the upper group, and the item has very poor quality because its discriminating power index is negative, so this item should be discarded. However, this item has one good point: all of the options are chosen by all groups.

Lastly, there are 2 items (37 and 38) regarded as difficult and of satisfactory quality. This means that almost all of the upper group are able to answer the questions and that those items discriminate reasonably well between the upper and the lower groups.
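The ten categories described above can be cross-checked with a short tally. The sketch below simply re-enters the item numbers listed in this section and counts them per difficulty level and per quality of discriminating power; the variable and function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Item numbers per (difficulty level, discriminating power quality), as listed above.
CATEGORIES = {
    ("easy", "satisfactory"): [1, 9, 11, 25, 39, 43],
    ("easy", "poor"): [2, 5, 6, 7, 8, 10, 15, 17, 44],
    ("easy", "good"): [3, 14, 20, 22, 23, 27, 29, 36],
    ("moderate", "poor"): [4, 12, 16, 33, 35],
    ("moderate", "excellent"): [13, 41, 42, 45],
    ("difficult", "poor"): [18, 30, 32],
    ("moderate", "good"): [19, 21, 24, 26, 28, 40],
    ("difficult", "good"): [31],
    ("difficult", "very poor"): [34],
    ("difficult", "satisfactory"): [37, 38],
}


def tally(categories):
    """Count items per difficulty level and per discriminating power quality."""
    by_level, by_quality = Counter(), Counter()
    for (level, quality), items in categories.items():
        by_level[level] += len(items)
        by_quality[quality] += len(items)
    return by_level, by_quality


by_level, by_quality = tally(CATEGORIES)
all_items = sorted(n for items in CATEGORIES.values() for n in items)

print(dict(by_level))
print(dict(by_quality))
print(all_items == list(range(1, 46)))  # every item from 1 to 45 appears exactly once
```

The tally gives 23 easy, 15 moderate, and 7 difficult items, and 17 poor, 15 good, 8 satisfactory, 4 excellent, and 1 very poor item in terms of discriminating power, which together account for all 45 items.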

Above all, the writer can interpret the difficulty level and the discriminating power of the English summative test of the first grade of SMPN 3 Tangerang Selatan in the odd semester of the 2013/2014 academic year. Based on the calculation in Table 4.2 in the appendix, the overall index of difficulty is 0.69, which means that the test has a moderate level of difficulty, and the overall index of discriminating power is 0.38, so the test has good quality.



A. Conclusion

Based on the analysis and the interpretation of the data in the previous chapter, the test has a difficulty index of 0.69, which means that it is at a moderate level of difficulty. In addition, the test has a discriminating power index of 0.38, so it has good quality.

So, the writer concludes that the English summative test of the first grade of SMPN 3 Tangerang Selatan in the odd semester of the 2013/2014 academic year has good qualities in terms of both aspects: a moderate level of difficulty and a good quality of discriminating power. This means that the test items are able to discriminate between the upper and lower groups of students and can be used for the next examination.

B. Suggestions

The writer would like to give some suggestions to the test maker or the teacher as feedback on the research results:

1. The test maker should be more creative in writing the test items, basing them especially on the material that has been taught.

2. The teacher should analyze the test after it has been administered to the students in order to know whether or not the test is good enough to use for the next exam.






No. Students Scores Groups

1. S1 95.5 Upper

2. S2 95.5 Upper

3. S3 95.5 Upper

4. S4 94.5 Upper

5. S5 94 Upper

6. S6 92.5 Upper

7. S7 92 Upper

8. S8 91.5 Upper

9. S9 91.5 Upper

10. S10 91.5 Upper

11. S11 91.5 Upper

12. S12 91 Upper

13. S13 89.5 Upper

14. S14 88.5 Upper

15. S15 88.5 Upper

16. S16 88 Upper

17. S17 87.5 Upper

18. S18 87.5 Upper

19. S19 87 Upper

20. S20 87 Upper

21. S21 87 Upper

22. S22 86.5 Upper

23. S23 86.5 Upper

24. S24 86 Upper

25. S25 85.5 Upper



29. S29 83 Middle

30. S30 83 Middle

31. S31 82.5 Middle

32. S32 81.5 Middle

33. S33 81.5 Middle

34. S34 81 Middle

35. S35 81 Middle

36. S36 81 Middle

37. S37 81 Middle

38. S38 79 Middle

39. S39 77 Middle

40. S40 77 Middle

41. S41 76.5 Middle

42. S42 75.5 Middle

43. S43 75 Middle

44. S44 74 Middle

45. S45 73.5 Middle

46. S46 73.5 Middle

47. S47 71.5 Middle

48. S48 71 Middle

49. S49 71 Middle

50. S50 70.5 Middle

51. S51 70.5 Middle

52. S52 70.5 Middle

53. S53 70.5 Middle

54. S54 70 Middle

55. S55 70 Middle



59. S59 66.5 Middle

60. S60 65.5 Middle

61. S61 64 Middle

62. S62 63.5 Middle

63. S63 63.5 Middle

64. S64 61 Middle

65. S65 61 Middle

66. S66 60.5 Middle

67. S67 60 Middle

68. S68 58.5 Lower

69. S69 58 Lower

70. S70 58 Lower

71. S71 57.5 Lower

72. S72 56 Lower

73. S73 55 Lower

74. S74 54 Lower

75. S75 53.5 Lower

76. S76 52.5 Lower

77. S77 52 Lower

78. S78 51 Lower

79. S79 51 Lower

80. S80 50 Lower

81. S81 49 Lower

82. S82 48.5 Lower

83. S83 48 Lower

84. S84 47 Lower

85. S85 46 Lower

