An Item Analysis On Discriminating Power Of English Summative Test (An Observation at the Seventh Grade of SMP Al Wathoniyah 09 East Jakarta)

AN ITEM ANALYSIS ON DISCRIMINATING POWER OF

ENGLISH SUMMATIVE TEST

(An Observation at the Seventh Grade of SMP Al Wathoniyah 09 East Jakarta)

A “Skripsi”

Presented to the Faculty of Tarbiyah and Teachers’ Training in Partial Fulfillment of the Requirement for the Degree of S.Pd (S-1) in English Language Education

By:

SALAMAH FAJRIAH 109014000043

DEPARTMENT OF ENGLISH EDUCATION

FACULTY OF TARBIYAH

AND TEACHERS’ TRAINING

SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY

JAKARTA


AN ITEM ANALYSIS ON DISCRIMINATING POWER OF

ENGLISH SUMMATIVE TEST

(An Observation at the Seventh Grade of SMP Al Wathoniyah 09 East Jakarta)

A “Skripsi”

Presented to the Faculty of Tarbiyah and Teachers’ Training in Partial Fulfillment of the Requirement for the Degree of S.Pd in English Education

By: Salamah Fajriah

109014000043

Approved by:

Advisor I Advisor II

DEPARTMENT OF ENGLISH EDUCATION

FACULTY OF TARBIYAH AND TEACHERS’ TRAINING

SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY

JAKARTA

2016


ABSTRACT

Salamah Fajriah. 2016. An Item Analysis on Discriminating Power of English Summative Test at the Seventh Grade of SMP Al Wathoniyah 09 East Jakarta. “Skripsi”, Department of English Education, Faculty of Tarbiyah and Teachers’ Training, Syarif Hidayatullah State Islamic University Jakarta.

Advisors: Nida Husna, M.Pd., M.A. TESOL; Dadan Nugraha, M.Pd.

Key words: Item Analysis, Discriminating Power, SMP Al Wathoniyah 09 East Jakarta.

The purpose of this study is to analyze the discriminating power of the English summative test at the seventh grade of SMP Al Wathoniyah 09 East Jakarta. The study is intended to give the teacher a clear description of the quality of the test's discriminating power, so that the teacher can help low-achieving students.

The population of the research is 146 students, from which the writer took 48 students as the sample. The instruments of this study are the English summative test and the students’ scores, which were used to collect the data. The data were then analyzed with a quantitative method. The study is considered quantitative research because the writer used numerical data that were analyzed statistically.

The study finds that the English summative test administered at the seventh grade of SMP Al Wathoniyah 09 East Jakarta has good discriminating power, because 23 items, ranging from 0.23 to 0.46 (46%) of the test items, fulfilled the criteria of positive discriminating power.


ABSTRAK

Salamah Fajriah. 2016. An Item Analysis on Discriminating Power of English Summative Test at the Seventh Grade of SMP Al Wathoniyah 09 East Jakarta. Skripsi, Department of English Education, Faculty of Tarbiyah and Teachers’ Training, UIN Syarif Hidayatullah Jakarta.

Advisors: Nida Husna, M.Pd., M.A. TESOL; Dadan Nugraha, M.Pd.

Keywords: Item Analysis, Discriminating Power, SMP Al Wathoniyah 09 East Jakarta

The purpose of this research is to analyze the discriminating power of the final school examination questions for the seventh grade at SMP Al Wathoniyah 09 East Jakarta. The research is intended to give the teacher a clear explanation of the quality of the discriminating power of the English summative test, so that the teacher can help students who obtain low scores.

The population of this research is 146 students, from which the writer took 48 students as the sample. The instruments of this research are the answer sheets of the English summative test and the students’ scores. These were used to collect the data, which were then analyzed with a quantitative method. This study is considered quantitative research because the researcher used numerical data that were analyzed statistically.

The research finds that the English summative test administered at the seventh grade of SMP Al Wathoniyah 09 East Jakarta has good discriminating power, because 23 items, ranging from 0.23 to 0.46 (46%) of the test items, have fulfilled the criteria of positive discriminating power.


ACKNOWLEDGMENT

In the name of Allah, the Beneficent, the Merciful

All praise be to Allah, the Lord of the worlds, who has given the writer the ability to complete this 'skripsi' entitled “An Item Analysis on Discriminating Power of English Summative Test (An Observation at the Seventh Grade of SMP Al Wathoniyah 09 East Jakarta)”, submitted in partial fulfillment of the requirements for the Degree of Strata-1 (S-1) in the Faculty of Tarbiyah and Teachers’ Training. Peace and blessings of Allah be upon Allah’s Messenger, our Prophet Muhammad SAW, who brought people from the era of ignorance into the modern era, and upon his family, his companions, and all of his followers.

On this great occasion, the writer would like to thank her beloved parents (Bapak Suminto and Mamah Netty Fardhillah), her beloved brothers (Arif, Sugeng, Heru, and Aulia), her beloved husband (Rizal), and her beloved daughter (Halifah), who have prayed for, supported, and helped her in completing this paper.

The writer would also like to thank her lecturers and advisors, Nida Husna, M.Pd., M.A. TESOL and Dadan Nugraha, M.Pd. They patiently guided her and gave her knowledge, suggestions, motivation, and assistance throughout the writing of this paper.

In finishing this 'skripsi', the writer received much guidance and motivation from the people around her. The writer would like to say thank you to:

1. Dr. Farida Hamid, M.Pd., the academic advisor, who provided advice and assistance to the writer.

2. All lecturers, especially those of the English Education Department, for their knowledge, motivation, and patience during the writer’s study at UIN Syarif Hidayatullah Jakarta.


5. The Dean of the Faculty of Tarbiyah and Teachers’ Training, UIN Syarif Hidayatullah Jakarta, Prof. Dr. Ahmad Thib Raya, MA.

6. Dra. Hj. Endang Ekowati, the headmaster of SMP Al Wathoniyah 09 East Jakarta, who gave the writer permission to conduct the research.

7. Eri Windiyani, S.Pd., English teacher and staff member at SMP Al Wathoniyah 09 East Jakarta, who motivated the writer to finish this 'skripsi'.

8. All her friends in the English Education Department class of 2009, especially the members of class B, who always gave her motivation and shared laughter with her.

May Allah the Almighty bless them all. Amen.

Finally, the writer realizes that this 'skripsi' is still far from perfect. Constructive criticism and suggestions are welcome to make it better.

Jakarta, June 3rd, 2016

Writer


TABLE OF CONTENTS

COVER ... i

APPROVAL ... ii

ENDORSEMENT SHEET ... iii

STATEMENT OF ORIGINAL WORK (SURAT PERNYATAAN KARYA SENDIRI) ... iv

ABSTRACT ... v

ABSTRAK ... vi

ACKNOWLEDGEMENT ... vii

TABLE OF CONTENTS ... ix

LIST OF TABLES ... xi

LIST OF APPENDICES ... xii

CHAPTER I: INTRODUCTION

A. The Background of the Study ... 1

B. The Statement of the Problem ... 4

C. Limitation of the Problem ... 5

D. Formulation of the Problem ... 5

E. The Objective of the Study ... 5

F. Significance of the Study ... 5

CHAPTER II: THEORETICAL FRAMEWORK

A. Test

1. The Definition of Test ... 7

2. Types of Test ... 9

3. Summative Test ... 15


2. Difficulty Level ... 24

3. Discriminating Power ... 25

CHAPTER III: RESEARCH METHODOLOGY

A. Place and Time of the Research ... 28

B. Technique of Sample Taking ... 28

C. Technique of Data Collecting ... 28

D. Research Instrument ... 29

E. Technique of Data Analysis ... 29

CHAPTER IV: RESEARCH FINDINGS

A. Description of Data ... 30

B. Data Analysis ... 38

C. Data Interpretation ... 39

CHAPTER V: CONCLUSION AND SUGGESTION

A. Conclusion ... 41

B. Suggestion ... 42

BIBLIOGRAPHY ... 43

APPENDICES


LIST OF TABLES

Table 4.1 The Students’ Score and Group Position of English Summative Test ... 31

Table 4.4 The Discriminating Power Index of the Upper and Lower Group ... 35

Table 4.5 The Percentage of Discriminating Power ... 37


LIST OF APPENDICES

Appendix 1: Table 4.2, the Students' Answer Sheet of English Summative Test Items from the Upper Group ... 47

Appendix 2: Table 4.3, the Students' Answer Sheet of English Summative Test Items from the Lower Group ... 49

Appendix 3: Questions of the English Summative Test in the 2013/2014 Academic Year ... 51

Appendix 4: Students' Scores of the English Summative Test in the 2013/2014 Academic Year ... 55

Appendix 5: Thesis Supervision Letter (Surat Bimbingan Skripsi) ... 59

Appendix 6: Research Permit Request Letter (Surat Permohonan Izin Penelitian) ... 60


CHAPTER I

INTRODUCTION

A. The Background of the Study

As a component of teaching and learning activities, evaluation plays an important role for both teachers and students. According to Wilmar, evaluation is an essential part of the school program that gives information about students’ progress after teaching and learning take place.1 Through evaluation, the teacher can determine whether or not the teaching and learning process has achieved its goals. The teacher can also identify the students’ learning difficulties and choose the methods or instruction that suit the classroom. Evaluation is also necessary for determining learning readiness and for organizing classroom groups. For example, evaluation can place students in the lower or the higher achievement group. According to Gronlund, evaluation is needed to assess students’ progress in learning, to diagnose learning problems, and to select future learning activities.2 Evaluation thus shows the students’ understanding of what they have learned and gives a general picture of that understanding. Evaluation can be carried out by means of tests, which are used to collect data about student achievement. In line with this, Ted A. Baumgartner et al. state in Measurement for Evaluation that information can be collected through tests.3

According to the Law of the National Education System No. 20/2003, Section 58 (1), the evaluation of students’ learning outcomes is carried out to monitor the process, progress, and improvement of those outcomes on an ongoing basis. Thus, the evaluation of students’ learning should be scheduled.4 On the basis of this Law, the evaluation of student learning outcomes, including the scheduled final examination, is carried out to improve the process and progress of student learning. From the results of the exam, the teacher can know the students’ learning outcomes according to the schedule that has been set.

1 Wilmar Tinambunan, Evaluation of Student Achievement, (Jakarta: PPLPTK, 1988), p. 1.

2 Norman E. Gronlund, Measurement and Evaluation in Teaching, 5th edition, in Wilmar Tinambunan (ed.), Evaluation of Student Achievement, (Jakarta: PPLPTK, 1988), p. 3.

3 Ted A. Baumgartner et al., Measurement for Evaluation, (New York: McGraw-Hill, 2007), p. 3.

4

The teacher needs an instrument to obtain information about the students’ learning in the evaluation process. The teacher can use tests and non-test instruments to evaluate students’ learning. According to Sukardi, both test and non-test instruments can be used to collect information. A test is usually given to students in spoken or written form. In a test, students answer a set of questions objectively, and the items may be true/false, multiple-choice, or matching items. A non-test instrument, by contrast, measures the students’ behaviour and is classified as subjective.5 Teachers usually use both tests and non-test instruments to obtain feedback after teaching and learning activities. The terms subjective and objective refer to the scoring of tests. Objective tests usually have only one correct answer (or at least a limited number of correct answers), so they can be scored mechanically.

As an instrument for evaluating students’ learning, a good test must meet certain criteria, namely validity, reliability, and practicality.6 A test that meets these criteria can give the teacher accurate information about the students’ achievement. However, validity, reliability, and practicality alone do not provide enough information about the students’ test results to improve the teaching and learning process. The teacher also needs to use item analysis, because item analysis is one of the factors that contribute to improving the teaching and learning process. Therefore, the teacher should pay attention not only to validity, reliability, and practicality but also to item analysis.7

Item analysis usually concentrates on three vital features: the level of difficulty, the discriminating power, and the effectiveness of each alternative.8 A good test also has the ability to distinguish good students from poor students; that is, it has discriminating power.
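Two of the features named above, the level of difficulty and the effectiveness of each alternative, can be computed directly from the students’ answers to one item. The Python sketch below is only an illustration under stated assumptions. It uses the common definition of the difficulty (facility) index as the proportion of students who answer an item correctly. It also counts how many students chose each alternative, as a rough check on the distractors. The function names, the four options A to D, and the sample answers are hypothetical. They are not taken from the test analyzed in this study.

```python
from collections import Counter


def difficulty_index(responses, key):
    """Proportion of students who chose the keyed (correct) option for one item.

    responses: one chosen option per student, e.g. ["A", "C", "A", ...]
    key: the correct option for this item (hypothetical answer key)
    """
    if not responses:
        raise ValueError("at least one response is needed")
    correct = sum(1 for answer in responses if answer == key)
    return correct / len(responses)


def option_counts(responses, options="ABCD"):
    """Number of students who chose each alternative of one item.

    ASSUMPTION: four alternatives labelled A-D. A distractor that nobody
    chooses is often taken as a sign that it is not working.
    """
    counts = Counter(responses)
    return {option: counts.get(option, 0) for option in options}


# Hypothetical answers of ten students to one item whose key is "A".
answers = ["A", "B", "A", "C", "A", "A", "D", "A", "B", "A"]
print(difficulty_index(answers, "A"))  # 0.6
print(option_counts(answers))  # {'A': 6, 'B': 2, 'C': 1, 'D': 1}
```

Here six of the ten students chose the key, so the item's difficulty index is 0.6; a higher value means an easier item. This section does not give cut-off values for calling an item too easy or too difficult, so the sketch assumes none.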

5 Prof. H.M. Sukardi, Evaluasi Pendidikan: Prinsip dan Operasionalnya, (Jakarta: PT. Bumi Aksara, 2011), p. 11.

6 Zainal Arifin, Evaluasi Pembelajaran, (Bandung: PT. Remaja Rosdakarya, 2013), p. 117.

7 M. Ngalim Purwanto, Prinsip-prinsip dan Teknik Evaluasi Pengajaran, (Bandung: PT. Remaja Rosdakarya, 1984), p. 118.

8


Discriminating power can be used to classify the students who take the test into upper, middle, and lower groups.9 It rests on the expectation that the upper group will answer an item correctly, whereas the lower group will answer it incorrectly. Furthermore, the test items should have an appropriate facility value; in other words, they should be neither too difficult nor too easy for the students. Item analysis is therefore a procedure for re-examining each test item to find out whether it distinguishes high-ability students from low-ability students.
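The upper-lower comparison described above is often summarized with a discrimination index D. D is the proportion of the upper group that answers an item correctly minus the proportion of the lower group that does so, and a positive D means that the item favours the upper group. The Python sketch below illustrates this idea under stated assumptions:

- Students are ranked by total score.
- The upper and lower groups are equal in size, each taking a fixed fraction of the class (27% is a common convention; 0.5 gives two halves).
- Ties at the cut-off are broken by list order.

The function names and the data are hypothetical. This section does not state which grouping rule or formula the writer used.

```python
def split_groups(total_scores, fraction=0.27):
    """Rank students by total score and return (upper, lower) index lists.

    ASSUMPTION: both groups have the same size, `fraction` of the class,
    and never overlap. Ties are broken by the original list order.
    """
    if len(total_scores) < 2:
        raise ValueError("at least two students are needed")
    size = round(len(total_scores) * fraction)
    size = max(1, min(size, len(total_scores) // 2))
    ranked = sorted(range(len(total_scores)),
                    key=lambda i: total_scores[i], reverse=True)
    return ranked[:size], ranked[-size:]


def discrimination_index(item_correct, upper, lower):
    """D = P_upper - P_lower for one item.

    item_correct: 1 if the student answered this item correctly, else 0.
    """
    p_upper = sum(item_correct[i] for i in upper) / len(upper)
    p_lower = sum(item_correct[i] for i in lower) / len(lower)
    return p_upper - p_lower


# Hypothetical data for ten students.
scores = [18, 9, 15, 12, 20, 7, 14, 11, 16, 8]  # total test scores
item = [1, 0, 1, 1, 1, 0, 1, 0, 1, 1]           # right (1) / wrong (0) on one item
upper, lower = split_groups(scores, fraction=0.3)
print(upper, lower)  # [4, 0, 8] [1, 9, 5]
print(round(discrimination_index(item, upper, lower), 2))  # 0.67
```

In this invented example, all three upper-group students but only one of the three lower-group students answered the item correctly. The index is therefore D = 1 - 1/3 ≈ 0.67, a positive value. With a larger class such as the 48-student sample mentioned in the abstract, the groups would simply be larger.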

Suppose the teacher neglects validity, reliability, practicality, and item analysis (such as the level of difficulty and discriminating power) when administering a test. The test then cannot provide accurate data about the students’ learning, because it cannot measure the students’ ability and the teacher cannot differentiate the upper group from the lower group.

Teachers can use two types of tests to determine students’ ability: aptitude tests and achievement tests. An aptitude test is designed to predict students’ progress in learning, whereas an achievement test is associated with the process of instruction.10 There are four types of achievement tests.11 The first is the placement test, which is used to determine the students’ performance. Next, the formative test is used to monitor learning and give feedback after teaching and learning take place. Then, the diagnostic test is used to determine difficulties in learning. Lastly, the summative test serves as a standard that students at the same stage have to achieve.

The teacher uses a summative test to find out whether or not the objectives have been achieved. The summative test is very important because it allows the teacher to classify the students’ scores; by administering a summative test, the teacher learns the students’ capability in doing the test.12 Validity, reliability, practicality, and item analysis remain the essential factors in administering a summative test, so that the test can give the teacher information about the students’ capability. A summative test is administered at the end of the whole program of instruction, and its aim is to determine whether or not the students have achieved the instructional objectives. After obtaining the test results, teachers are expected to improve their classroom teaching further, so that students gain a better understanding of the material.

9 M. Ngalim Purwanto, Prinsip-prinsip dan Teknik Evaluasi Pengajaran, (Bandung: Ramadja Karya CV, 1986), p. 152.

10 Tim McNamara, Language Testing, (New York: Oxford University Press, 2000), p. 6.

11 Wilmar Tinambunan, op. cit., p. 7.

12

To find out the quality of the discriminating power of a school summative test, the writer chose SMP Al Wathoniyah 09 East Jakarta. It is one of the best schools in East Jakarta and has always been accredited A since the 1900s. The writer predicted that the teacher had already considered validity, reliability, practicality, and item analysis in making the summative test, so that the test could reflect the students’ level of capability (upper, middle, and lower group). In fact, however, the students’ answers did not show the upper group answering the questions correctly. The upper group obtained varied scores, and some of the students categorized in the upper group got low scores. Discriminating power in item analysis is an essential factor in determining whether students belong to the upper, middle, or lower group.

For this reason, the writer is interested in conducting research that analyzes the discriminating power of the 2014 summative test at the seventh grade of SMP Al Wathoniyah 09 East Jakarta.

B. The Statement of the Problem

Teachers sometimes forget to consider validity, reliability, practicality, and item analysis when administering a test. Item analysis covers the level of difficulty, discriminating power, and the effectiveness of each alternative. As a result, the test cannot give accurate information about the students’ learning. This problem occurs at SMP Al Wathoniyah 09 East Jakarta, where the summative test cannot give the teacher effective information about the students’ capability. The test cannot measure the students’ ability, and the teacher cannot differentiate the upper group from the lower group.


C. Limitation of the Problem

Based on the problems identified above, the writer limits the study to an item analysis of the discriminating power of the English summative test at the seventh grade of SMP Al Wathoniyah 09 East Jakarta in the second semester of the 2014/2015 academic year.

D. Formulation of the Problem

Referring to the background above, the writer formulates the research problem as follows: “Does the English summative test for the seventh grade of SMP Al Wathoniyah 09 East Jakarta, administered on May 20 in the second semester of the 2014/2015 academic year and made by the Government Service of East Jakarta (Dinas Jakarta Timur), have good discriminating power?”

E. The Objective of the Study

The objective of this study is to find out the discriminating power of the English summative test for the seventh grade of SMP Al Wathoniyah 09 East Jakarta, administered on May 20 in the second semester of the 2014/2015 academic year.

F. Significance of the Study

The results of this study are expected to contribute to four groups: (a) the students, (b) the English teacher, (c) the school principal, and (d) further researchers.

a) The results of this study are useful for the students, both those who do well on the test and those who do not, because the teacher will be able to help and find learning solutions for both good and poor students.

b) The results of this study are useful for the English teachers of the seventh-grade students of SMP, giving them better insight into how to make better English summative tests for evaluation activities. It is also hoped that the study will enrich the teachers’ knowledge of English summative test analysis.

c) The results of this study are useful for the school principal in developing the school’s curriculum and the methods practiced in the classes, so that the subject material becomes the focus of the teaching and learning process.


d) The results of this study can be used as a reference by future researchers who are interested in conducting similar studies, and can have a positive impact on the evaluation process for seventh-grade students elsewhere.


CHAPTER II

THEORETICAL FRAMEWORK

A. Test

1. The Definition of Test

A test is an instrument used for evaluation after the teaching and learning process. Tests are popular not only in formal education but also in informal settings, for example pre-tests, scholarship tests, medical tests, exercise tests, and others.1 Tests can therefore be administered in formal and informal education for different purposes. One of the aims of a test is to evaluate learning activities. Testing serves evaluation in two ways. It helps teachers increase their own effectiveness by adjusting their teaching, and it helps certain groups of students or individuals in the class benefit from learning.

Furthermore, a test is an instrument in which every item has a correct answer, and it can be administered orally or in writing.2 Each test item should have one answer, and this answer must be absolutely correct unless the instructions specify choosing the best option. Each option should be grammatically correct when placed in the stem, and the options should be as brief and as clear as possible.

Moreover, a test is an instrument that uses a systematic procedure to measure one or more characteristics of a student. The teacher can observe and describe the students’ characteristics, and the test results can be expressed in certain score categories.3 The test thus helps the teacher relate the students’ characteristics to their test scores.

1 Zainal Arifin, Evaluasi Pembelajaran, (Bandung: PT. Remaja Rosdakarya, 2013), p. 117.

2 Wilmar Tinambunan, Evaluation of Student Achievement, (Jakarta: PPLPTK, 1988), p. 3.

3 Anthony J. Nitko, Educational Tests and Measurement: An Introduction, (USA: Harcourt Brace Jovanovich, Inc., 1983), p. 6.


In addition, there are four elements in administering a test. First, a test has a systematic procedure, meaning that the teacher uses a fixed and organized plan in administering it. Second, the test has questions for the students. The teacher prepares not only the questions but also the distractors and the best answer, and before a test is constructed, it is important to question the standards being set. Third, a test can measure the students’ behaviour, so it can serve as an instrument to evaluate that behaviour. Lastly, after doing the test, the students should have scores.4

This means that a test should be administered systematically in order to evaluate and reflect the students’ understanding as they answer the questions and participate in the test. From the scores, the teacher can know whether the students have succeeded or failed in understanding the material.

Furthermore, according to Anthony J. Nitko, a test is an instrument for obtaining the students’ scores.5

The teacher must plan how to score the test and must avoid subjectivity in scoring, so that the test can serve as an instrument. Besides, a test is a technique for measuring students’ knowledge or performance.6 When scoring students’ performances, the examiner should concentrate on what individual students are doing with the target language and how they are using it to achieve a purpose. Tests can also measure the characteristics of the students.7 Tests can measure not only students’ knowledge and performance but also their intelligence and ability in learning, as shown by their scores.

Based on the concepts above, the writer concludes that a test is not necessarily a written set of questions to which an individual responds in order to determine whether he or she passes. A more inclusive definition is that a test is a means of measuring the knowledge, skills, feelings, intelligence, or aptitude of an individual or a group. Tests produce numerical scores that can be used to identify, classify, or otherwise evaluate test takers.

4 Norman E. Gronlund and Robert L. Linn, Measurement and Evaluation in Teaching, (New York: Macmillan Publishing Company, 1990), p. 5.

5 Jum C. Nunnally, Educational Measurement and Evaluation, (New York: McGraw-Hill Book Company, 1972), 2nd edition, p. 6.

6 H. Douglas Brown, Teaching by Principles: An Interactive Approach to Language Pedagogy, (New York: Addison Wesley Longman, Inc., 2001), 2nd edition, p. 384.

7 Charles D. Hopkins and Richard L. Antes, Classroom Measurement and Evaluation, (USA: F.E. Peacock Publishers, Inc., 1990), 3rd edition, p. 326.

2. Types of Test

The variety of test types helps teachers and test takers choose the test that the teaching and learning process needs.

Tests can be categorized according to the information they provide.8 Therefore, tests must be well organized in order to obtain good results. The teacher is the ideal person to know which tests suit her or his class, because she or he already knows the students’ conditions, the methods, the time available, the students’ intelligence, and the treatment they have received.

There are various types of tests that can be implemented in the class. J.B. Heaton identifies four types of tests: achievement/attainment tests, proficiency tests, aptitude tests, and diagnostic tests.9 Rebecca M. Valette likewise identifies four basic types of language tests in Modern Language Testing: aptitude tests, progress tests, achievement tests, and proficiency tests.10

8 Arthur Hughes, Testing for Language Teachers, (United Kingdom: Cambridge University Press, 2003), 2nd edition, p. 11.

9

1) Aptitude Test

An aptitude test is primarily designed to predict success in some future learning activity.11 It is generally given before language study begins. It may be used to select students for a language course or to place them in sections appropriate to their ability. In this way it can predict the students’ future success in learning in relation to their understanding of the material.

In addition, aptitude tests are designed to predict students’ performance after they receive some specific instruction or training.12 This kind of test can therefore be used to estimate students’ performance after a training program.

Furthermore, according to Lyman in Test Scores and What They Mean, aptitude tests are needed for job selection, admission to training programs, classification for scholarships, and other purposes.13 Aptitude tests are used for determining learning readiness, individualizing instruction, organizing classroom groups, identifying underachievers, diagnosing learning problems, and helping students with their educational and vocational plans.

Moreover, an aptitude test is administered before the students take a course, and it can identify the students’ capability in the course. It can therefore also be classified as an achievement test, because it is scored in the same way. These scores are used to locate areas of difficulty in learning. The test also helps to set a standard of skills and knowledge that students have to pass at a certain level.14

In summary, an aptitude test focuses on potential future performance; it also attempts to indicate what a person could learn if opportunity and motivation are present.

10 Rebecca M. Valette, Modern Language Testing, (New York: Harcourt Brace Jovanovich, Inc., 1977), 2nd edition, p. 5.

11 Ridwan Mohamed Osman, Educational Evaluation and Testing, 2010, p. 51, (http://en.wikipedia.org/wiki/Creative_Commons). Accessed on 05 April 2016.

12 Ibid.

13 Howard B. Lyman, Test Scores and What They Mean, (United States: Allyn and Bacon, 1998), 6th edition, p. 22.

14


2) Achievement Test

An achievement test is used to measure what the students have learnt during a course of instruction. It is given at the end of the course, and its content is generally based on the course syllabus or the course textbook. Furthermore, according to Alan Davies et al., an achievement test is an instrument designed to measure the students’ ability after the teaching and learning process is completed, or in accordance with the school syllabus.15 In addition, an achievement test is given at the end of a study program to find out the students’ progress after the teaching and learning process.16 In other words, an achievement test is given at the end of a course or term, and the material tested is based on the content that the students have learnt.

According to Heaton in Writing English Language Tests, achievement tests are divided into progress tests and (standardized) achievement tests.17 A progress test, for example a formative test, is designed to measure the extent to which the students have mastered the material taught in the classroom. It gives the teacher feedback about the students’ progress in learning, so that the teacher can monitor the instruction and activities and decide whether they were successful. The main concern of a formative test is to determine the areas of learning that need improvement.18 This test is administered at the end of a unit or the end of a semester. Its aim is to stimulate learning and to reinforce what has been taught.

Meanwhile, an achievement (or attainment) test is a formal test intended to measure achievement on a large scale. These tests are based on what the students are presumed to have learnt, not necessarily on what they have actually learnt or on what has actually been taught. They are administered at the end of an instructional segment rather than at the end of a unit of material.

15 Alan Davies et al., Dictionary of Language Testing, (UK: University of Melbourne, 1990), p. 2.

16 Tim McNamara, Language Testing, (New York: Oxford University Press, 2000), p. 6.

17 Heaton, op. cit., p. 172.

18 Hanna and Dettmer, Northern Illinois University, Faculty Development and Instructional Design Center, Formative and Summative Assessment, 2004, p. 1, (facdev@niu.edu

To sum up, the achievement test is commonly given at the end of a term or study program. Its aim is to measure the students’ understanding during the study program.

3) Proficiency Test

A proficiency test is used to test people’s ability in a language regardless of any training they may have had.19 For example, people who want to measure their ability in speaking English before taking an English course can take a proficiency test. The test shows them how proficient they are in speaking before they take the course.

Moreover, according to another definition, proficiency tests can measure what students have learned. In fact, proficiency tests usually report students’ language ability on a continuum that reflects a predetermined set of categories.20 They can therefore measure a person’s proficiency against specific language requirements.

Furthermore, another definition states that this test focuses not on general achievement but on the specific skills of a certain language program.21 When we need to test someone’s skills and experience in a certain language, we use a proficiency test. The test shows people their proficiency level in the language, which helps them decide on their future course of study in it.

To wrap up, a proficiency test is used to measure proficiency, skills, and experience in a language before students take a course or English program. The test also shows students their level of understanding in using the language. It reveals the students’ capability and background knowledge in relation to their language proficiency before they receive any treatment in a course.

19 Hughes, op. cit., p. 11.

20 Valette, op. cit., p. 6.

21

4) Diagnostic Test

A diagnostic test is administered to diagnose areas of difficulty so that appropriate remedial action can be taken later. Gronlund stated that diagnostic evaluation is concerned with persistent learning difficulties that are left unresolved by the standard corrective prescriptions of formative evaluation, and that diagnosing such difficulties involves more than diagnostic testing; tests are only one part of the total process.22 This test helps the tester determine which areas students have difficulty learning and which material they lack understanding of.23

Diagnostic tests are designed to recognize learners' strengths and weaknesses.24 This test helps to determine which areas of learning still need improvement. It gives information about the gaps that exist in a learner's command of the language, so that the learner can be directed to sources of information and practice.

To sum up, a diagnostic test is designed to help the tester and the test taker locate difficulties, errors, and misunderstandings in using the language, so that the test gives a general picture of the remedial and improvement steps that should be taken before the next test.

22 Norman E. Gronlund, Measurement and Evaluation in Teaching, (New York: Macmillan Publishing Co., Inc., 1981), 4th edition, p. 125.

23 Arthur Hughes, Testing for Language Teachers, (United Kingdom: The Press Syndicate of the University of Cambridge, 2003), 2nd edition, p. 15.

24 Ibid.



5) Progress test

Progress testing has been used increasingly in education and has grown in significance. It is used in many ways and in several formats to reflect the variety of curricula and assessment purposes.25 This means that the curriculum and the purpose of assessment should be taken into consideration when administering the test.

A progress test is given to students at the end of a unit in the semester. It can help the teacher improve students' progress and evaluate their achievement.26 This means that the test is administered at the end of a unit of a program or course, and it gives the tester information about students' progress and attainment in the course.

A progress test has content validity if it measures the content of the syllabus and the skills specified in the coursebook. Hence, we should take into consideration the learners' needs and their particular domain of use to ensure content validity. In other words, a progress test has content validity if it tests the content of the syllabus and the skills in the textbook.

The aims of progress tests are:

1. “Testing tells teachers what students can or cannot do— in other words, tests show teachers how successful their teaching has been. It provides wash back for them to adjust and change course content and teaching styles where necessary.

2. Testing tells students how well they are progressing. This may stimulate them to take learning more seriously.

3. By identifying students’ strengths and weaknesses, testing can help identify areas for remedial work.

4. Testing will help evaluate the effectiveness of the program, course books, materials, and methods.”27

25 Carmen Perez Basanta, Coming to Grips with Progress Testing: Some Guidelines for Its Design, This Article Was First Published in Volume 33, No. 3, 1995, p. 37. Accessed on 05 April 2016.

26 Valette, op. cit., p. 5.

27



Of these types of test, the writer focuses on the achievement test. This type of test is relevant because an achievement test, such as the summative test, is given at the end of a program in the instructional process.

3. Summative Test

A summative test is given at the end of a term; it can provide information about students' results and feedback after the teaching and learning process.28

In addition, a summative test is given at a certain time, and its aim is to measure students' knowledge, skills, or performance. It can be used to determine whether or not a school program has been successful. This means that this type of assessment can be used for making judgments about student achievement and instructional effectiveness.

Furthermore, the summative test is designed to reflect the standards that students have to reach in learning. In constructing a summative test, the teacher measures “the sum” total of the material covered by using a table of specifications, so the result of the test can show the students' achievement in learning.29

This means that the questions in a summative test must represent what has been taught, so that the test results can be used to determine student achievement.

In addition, the summative test is administered at the end of a program, when a course is over, and its result can sum up the students' experiences and achievement in learning.30

Besides, “the summative evaluation typically comes at the end of a course (or unit) of instruction. It is designed to determine the extent to which the instructional objectives have been achieved and is used primarily for assigning course grades or certifying pupil mastery of the intended learning outcomes”.31

28 Norman E. Gronlund, Measurement and Evaluation in Teaching, (New York: Macmillan Publishing Company, 1965), fifth edition, p. 13.

29 Tinambunan, op. cit., p. 9.

30 Desmond Allison, Language Testing and Evaluation: An Introductory Course, (Singapore: Singapore University Press, 1999), p. 65.

According to Hanna and Dettmer, writing on formative and summative assessment, “there are types of summative assessment:

1. Examinations (major, high-stakes exams)

2. Final examination (a truly summative assessment)

3. Term papers (drafts submitted throughout the semester would be a formative assessment)

4. Projects (project phases submitted at various completion points could be formatively assessed)

5. Portfolios (could also be assessed during its development as a formative assessment)

6. Performances

7. Student evaluation of the course (teaching effectiveness)

8. Instructor self-evaluation.”32

To sum up, the summative test is administered at the end of a program. The material being tested is taken from the material covered during the teaching and learning process, and the result of the summative test can be used to know the students' achievement and progress in learning.

4. Characteristics of a Good Test

All good tests possess three qualities: validity, reliability, and practicality. That is to say, any test that we use must be appropriate in terms of our objectives, dependable in the evidence it provides, and applicable to our particular situation. A test can be categorized as a good test if it has certain qualifications or characteristics. This is supported by Zainal Arifin in Evaluasi Pembelajaran: the important characteristics of a good test can be classified into three main aspects, namely validity, reliability, and practicality.33

31 Gronlund, op. cit., p. 12.

32

a. Validity

Validity is one of the criteria of a good test. Test validity concerns what is being tested. J.B. Heaton points out that the validity of a test is the extent to which it measures what it is supposed to measure.34 In other words, validity is the extent to which a test measures what it is intended to measure. It is vital for a test to be valid so that its results can be accurately applied and interpreted.

As stated by Suharsimi, a test is valid if it is able to accurately measure what is to be measured.35

The validity of a test must be considered in measurement; in this case, one must check whether the test really measures what it is supposed to measure. A test is categorized as valid if it measures what should be measured.

Validity comprises four types: content validity, concurrent validity, predictive validity, and construct validity.

- Content validity is concerned with how adequately the test samples the content it is meant to cover.

- Concurrent validity is concerned with the relationship between the test scores and another measure of the variable being tested.

33 Arifin, op. cit., p. 246.

34 Heaton, op. cit., p. 159.

35 Suharsimi Arikunto, Dasar-dasar Evaluasi Pendidikan, (Jakarta: PT. Bumi Aksara, 2009), revision edition, p. 59.

- Predictive validity is concerned with how well the test scores predict performance at a later time.

- Construct validity is an additional means of establishing validity when the three types above are not sufficient.

b. Reliability

Reliability refers to the consistency of test scores, that is, the extent to which a test measures the same thing every time. In other words, the reliability of a test refers to its consistency in yielding the same rank for an individual who takes the test several times. To have confidence in a measuring instrument, we would need to be assured, for example, that approximately the same result would be obtained (1) if we tested a group on Tuesday instead of Monday, (2) if we gave two parallel forms of the test to the same group on Monday and Tuesday, (3) if we scored a particular test on Tuesday instead of Monday, and (4) if two or more scorers scored the test independently. It is clear from the foregoing that two somewhat different types of consistency, or reliability, are involved: the reliability of the test itself and the reliability of the scoring of the test. The writer concludes that a test is reliable if it consistently yields the same, or nearly the same, rating over repeated administrations.

According to Wilmar Tinambunan, “there are several ways of estimating the reliability of a test. The three basic methods and the type of information each provides are as follows:



1) Test-retest method, which indicates the stability of test scores over some given period of time.

This means that the simplest technique is to retest the same individuals with the same test. If the results of the two administrations are highly correlated, we can assume that the test has temporal stability. If the time interval between the two testings is relatively short, the examinees' memories of their previous responses will make their two performances spuriously consistent and thus lead to an overestimate of test reliability. On the other hand, if the interval is so long as to minimize the memory factor, the examinees' proficiency may have undergone a genuine change, producing different responses to the same items, and thus the test reliability could be underestimated.

2) Equivalent-forms method, which indicates the consistency of test scores over different forms of the test.

This means that the second method of computing reliability uses alternate or parallel forms, that is, different versions of the same test that are equivalent in length, difficulty, time limit, format, and other such aspects. Where equivalent forms of a test exist, reliability can be increased by lengthening the test, provided always that the additional material is similar in quality and difficulty to the original. But if satisfactory reliability could be obtained only by lengthening the test beyond all reasonable limits, it would obviously be wiser to revise the material or choose another test type. This method is probably the best, but even here the practice effect, though reduced, will not be entirely eliminated.

3) Internal consistency method, which indicates the consistency of test scores over different parts of the test.”36

This means that a third method for estimating the reliability of a test consists in giving a single administration of one form of the test and then, by dividing the items into two halves (usually by separating the odd- and even-numbered items), obtaining two scores for each individual. By such 'split-half' procedures, two forms are obtained whose results may be compared to provide a measure of the adequacy of the sampling. If test scoring is done by two or more raters, the reliability of their evaluations can easily be checked by comparing the scores they give for the same student responses.

36
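The test-retest and split-half methods above both come down to correlating two sets of scores. The Python sketch below illustrates this under stated assumptions: all scores are hypothetical, the Pearson product-moment correlation is assumed as the coefficient, and the Spearman-Brown step-up (not mentioned in the text) is the correction commonly applied to a split-half coefficient to estimate the reliability of the full-length test. The function names are illustrative only.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary to compute a correlation")
    return cov / sqrt(var_x * var_y)


def split_half(responses: Sequence[Sequence[int]]) -> tuple[float, float]:
    """Odd/even split-half reliability (hypothetical helper).

    responses[s][i] is 1 if student s answered item i + 1 correctly, else 0.
    Returns (r between the two half scores, Spearman-Brown full-test estimate).
    """
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, 2 * r_half / (1 + r_half)  # Spearman-Brown step-up


# Test-retest: hypothetical scores of five students on two administrations.
first = [60, 72, 55, 80, 68]
second = [62, 70, 58, 83, 65]
print(f"test-retest r = {pearson_r(first, second):.2f}")

# Split-half: hypothetical 0/1 answers of five students to six items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
]
r_half, r_full = split_half(responses)
print(f"split-half r = {r_half:.2f}, Spearman-Brown estimate = {r_full:.2f}")
```

The same `pearson_r` function would apply to the equivalent-forms method, with the two parallel forms in place of the two administrations.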

Both validity and reliability serve these aims: “To ensure that a point has been understood so that further teaching and learning can proceed, to review material studied over several previous weeks in order to prepare students for a formal examination and to familiarise students with particular types of test format.”37 Finally, it can be concluded that reliability refers purely and simply to the precision with which the test measures. No matter how high the reliability quotient, it is by no means a guarantee that the test measures what it is intended to measure. Data concerning what the test measures must be sought from some source outside the statistics of the test itself.

c. Practicality

Besides validity and reliability, a good instrument must also be practical. A test may cover these two things, validity and reliability, but the teacher or whoever constructs the test should also consider its practical matters. In implementing a classroom test, some basic questions arise: How much paper is needed? Is it time-consuming? How much does it cost? What about the place?38 The implementation of the test must be effective and efficient; thus, it must be practicable. The third characteristic of a good test, then, is its practicality. A test may be a highly reliable and valid instrument but still be beyond our means or facilities. Thus, in preparing a new test or adapting an existing one, we must keep in mind a number of very practical considerations, such as economy and ease of administration and scoring.

1. Economy

37 Allison, op. cit., p. 85.

38

As most educational administrators are well aware, testing can be expensive. If a standard test is used, the cost per copy must be taken into account, as well as whether or not the test booklets are reusable. It should also be determined whether several administrators and scorers will be needed, for the more personnel who must be involved in administering and scoring a test, the more costly the process becomes.

2. Ease of administration and scoring

Another consideration of practicality is the ease with which the test can be administered. If full, clear directions are provided, the test administrator can perform his or her task quickly and effectively. This is supported by Suharsimi: “a test is categorized as having high practicality if it is practical and easy to administer.”39 A test is practicable if it has no problems in its implementation.

Steps in Test Construction and Administration

According to Karmel in Measurement and Evaluation in the Schools, “there are some steps in constructing and administering a test. They are as follows:

1. “Get ready” stage

a. Keep the instructional material related to the test in a safe place.

b. Make guidelines related to the aims of the test that will be measured.

2. “Get set” stage

a. Think over the type of test that will be used and the format of the test.

b. Arrange the preliminary study related to the items that will be used.

c. Consider the time needed to do the test.

d. Examine and check the test items more than three times after writing the preliminary draft.

e. Organize the test items. Count the number of difficult items in the test. Give the easiest questions first to build the poorer students' confidence, interest, and motivation in doing the test.

f. Give clear instructions in the test.

g. Determine how to score the test.

3. “Go” stage

a. Be consistent about the time allowed for the test.

b. Maintain good classroom management while the students take the test.

c. Administer the test in an orderly way.”40

39

To wrap up the ideas above, practicality is one of the factors that have to be considered in administering a test, and, based on the explanation above, validity, reliability, and practicality are the characteristics of a good test. There is, however, another particular thing that must be considered by the teacher and the test constructor: the quality of the test items. The method for finding out the quality of test items is called item analysis.

B. Item Analysis

1. The Definition of Item Analysis

Selecting an appropriate language test is not enough by itself to ensure a good test. Each question needs to function properly; otherwise, it can weaken the exam. Fortunately, there are some rather simple statistical ways of checking individual items. This procedure is called “item analysis”, and it is most often used with multiple-choice questions. An item analysis tells us basically three things: how difficult each item is, whether or not the question “discriminates”, that is, tells the difference between high and low students, and which distractors are working as they should. Such an analysis is used with any important exam, for example, review tests and tests given at the end of a school term or course.

40 Louis J. Karmel, Measurement and Evaluation in the Schools, (New York: The Macmillan Company, 1970), p. 380.

Item analysis is usually done to select which items will remain in a future revised and improved version of a test. There are several descriptions of item analysis. An item analysis can be used to classify an item as good or bad and to explain why it is so classified. Through item analysis, the following can be determined:

- Difficulty level of the item.

- Discriminating power.

- Alternative options for answering the question.

Before the discriminating power is calculated, the scores are divided into three groups:

- Upper group

- Middle group

- Lower group

According to Ngalim Purwanto, item analysis has two purposes:

1. Item analysis is used for diagnostic information; that is, it can be used to identify progress, failures, and solutions in learning.

2. Item analysis is used for reviewing the results of the test and making decisions for the next test.41

This is supported by Arikunto: the aim of item analysis is to identify poor, satisfactory, good, and excellent items. Item analysis gives information about test items that need review and further instruction.42 This means that items can be listed according to their degree of difficulty (easy, medium, or hard) and discrimination (good, fair, or poor); these distributions provide a quick overview of the test.

In addition, according to Brown and Hudson in Criterion-referenced Language Testing, item analysis is a procedure for evaluating the effectiveness of a test. The aim of item analysis is to determine which items should be revised, and it is used to evaluate the test for the next administration.43 This means that it can be used to identify items that are not performing well and that can perhaps be improved or discarded.

41 M. Ngalim Purwanto, Prinsip-prinsip dan Teknik Evaluasi Pengajaran, (Bandung: PT. Remaja Rosdakarya, 2004), p. 118.

42 Suharsimi Arikunto, Dasar-dasar Evaluasi Pendidikan, (Jakarta: PT. Bumi Aksara, 2006), revision edition, p. 206.

From those definitions, it can be concluded that item analysis is the process of collecting information about pupils' responses to the items in order to determine the quality of the test items. More specifically, item analysis information can tell us whether an item was too easy or too hard, how well it discriminated between high and low scorers on the test, and whether all of the alternatives functioned as intended. Item analysis data also aid in detecting specific technical flaws and thus further provide information for improving the test items.

2. Difficulty Level

The item difficulty level shows which items are difficult for students to answer. If the test is used again with another class, items that are too difficult or too easy should not be reused as they are; rewrite them or discard them. Two or three very easy items can be placed at the beginning of the test to encourage students. Questions should be arranged in order of difficulty. Not only is this good psychology, but it also helps those who do not have a chance to finish the test; at least they have a chance to try the items they are most likely to get right. It is obvious that our sample item would come near the end of the test, since only a third of the students got it right.

Item difficulty may be defined as the proportion of examinees who answered the item correctly. Item difficulty is the percentage of students who correctly answered the item, also referred to as the p-value.44

The following formula is used to find the difficulty level:

$$DL = \frac{R_u + R_l}{N_u + N_l}$$

Where,

$R_u$ = the number of students in the upper group who responded correctly

$R_l$ = the number of students in the lower group who responded correctly

$N_u$ = the number of students in the upper group

$N_l$ = the number of students in the lower group

43 James Dean Brown and Thom Hudson, Criterion-referenced Language Testing, (UK: Cambridge University Press, 2002), p. 113.

44 C. Boopathiraj and Dr. K. Chellamani, Analysis of Test Items on Difficulty Level and Discrimination Index in the Test for Research in Education, International Journal of Social Science & Interdisciplinary Research, ISSN 2277 3630, Vol. 2 (2), February 2013, p. 90. Online available at indianresearchjournals.com, accessed on 05 April 2016.
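As an illustration, the difficulty-level formula can be computed directly. This Python sketch is a minimal example: the function name `difficulty_level` is hypothetical, and the counts for items 1, 27 and 41 are the upper- and lower-group correct answers reported in Table 4.4 of this study (13 students per group). It also orders the items from easiest to hardest, in line with the advice to arrange questions by difficulty.

```python
def difficulty_level(ru: int, rl: int, nu: int, nl: int) -> float:
    """DL = (Ru + Rl) / (Nu + Nl): share of upper- and lower-group students answering correctly."""
    if nu + nl <= 0:
        raise ValueError("the two groups must contain at least one student")
    return (ru + rl) / (nu + nl)


# (Ru, Rl) for three items, taken from Table 4.4; each group has 13 students.
counts = {1: (9, 9), 27: (13, 7), 41: (0, 2)}
levels = {item: difficulty_level(ru, rl, 13, 13) for item, (ru, rl) in counts.items()}

# Easiest item (highest DL) first, as suggested for arranging questions.
for item in sorted(levels, key=levels.get, reverse=True):
    print(f"item {item}: DL = {levels[item]:.2f}")
```

With these counts, item 27 (DL of about 0.77) would come first, then item 1 (about 0.69), and item 41 (about 0.08) last.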

In conclusion, difficult items may simply cover points that you have not spent enough class time on or have not presented clearly enough. Adjusting your instruction could result in an appropriate level of difficulty for the item.

3. Discriminating Power

Item discriminating power is the ability of a test item to separate good students from poor students. Students are grouped by their scores on the test as a whole, and the difference between the percentages of the top-scoring 27% and the bottom-scoring 27% of students who get the item right is its discrimination index.45 This means that the discriminating power of a test item is its ability to differentiate between students with high achievement and students with low achievement, that is, to separate the 'good' students from the 'bad' ones. The discrimination index of an item indicates the extent to which the item discriminates between the testees, separating the more able testees from the less able.

Sudijono stated: “The discriminating power of an item is the ability of an item in an achievement test to distinguish between testees of high ability (bright) and testees of low ability (dull).”46 It refers to the degree to which a score varies with trait level, as well as the effectiveness of this score in distinguishing respondents with a high trait level from respondents with a low trait level.

45 William Wiersma and Stephen G. Jurs, Evaluation of Instruction in Individually Guided Education, (Toledo: The Sears-Roebuck Foundation, 1976), p. 46.

46 Anas Sudijono, Pengantar Evaluasi Pendidikan, (Jakarta: PT. RajaGrafindo Persada, 2011), p. 385.

The discrimination index can range from -1 to +1. When a larger proportion of students in the lower group than in the upper group gets the item right, the item discriminates negatively.47

Item discriminating power can be obtained by subtracting the number of students in the lower group who got the item right (L) from the number of students in the upper group who got the item right (U) and dividing the difference by the number of students in one group included in the item analysis (N). The discrimination index is symbolized as D or DP (Discriminating Power).

To count the discriminating power, the formula is as below:

$$DP = \frac{U - L}{N}$$

Explanation:

DP : Discriminating Power

U : Sum of the students from the upper group who answered correctly

L : Sum of the students from the lower group who answered correctly

N : Number of the candidates in one group

The result is interpreted by using the following criteria:

- DP from 0.20 to 1.00 = Good (satisfactory, good, and excellent)

- DP ≤ 0.20 = Revise (poor)

- Negative DP = Discard or rewrite (very poor).48
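A minimal Python sketch of the formula and the three-way decision rule above. The function names are hypothetical, and because the criteria as stated overlap at exactly 0.20, the sketch assumes that 0.20 itself counts as good. The (U, L) pairs for items 9, 7 and 20 come from Table 4.4 of this study, with N = 13.

```python
def discriminating_power(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """DP = (U - L) / N, where N is the number of students in ONE group."""
    if group_size <= 0:
        raise ValueError("group size N must be positive")
    return (upper_correct - lower_correct) / group_size


def decision(dp: float) -> str:
    """Three-way rule from the criteria; 0.20 itself is treated as good (assumption)."""
    if dp < 0:
        return "discard or rewrite (very poor)"
    if dp < 0.20:
        return "revise (poor)"
    return "good (keep)"


N = 13  # students in each of the upper and lower groups
for item, upper, lower in [(9, 12, 6), (7, 10, 8), (20, 3, 7)]:
    dp = discriminating_power(upper, lower, N)
    print(f"item {item}: DP = {dp:+.2f} -> {decision(dp)}")
```

With these counts, item 9 (DP of about 0.46) is kept, item 7 (about 0.15) is marked for revision, and item 20 (about -0.31) is marked for discarding or rewriting.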

Sumarna summarizes it in formula form as below:

$$DI = \frac{U - L}{N}$$

Where:

DI = the index of discriminating power

U = the number of pupils in the upper group who answered the item correctly

L = the number of pupils in the lower group who answered the item correctly

N = the number of pupils in each of the groups.49

47 Sumarna Surapranata, Analisis, Validitas, Reliabilitas dan Interpretasi Hasil Tes “Implementasi Kurikulum 2004”, (Bandung: PT. Remaja Rosdakarya, 2006), p. 23.

48

The classifications of the index of discriminating power (D) are:

DI = 0.70 – 1.00 = Excellent

0.40 – 0.70 = Good

0.20 – 0.40 = Satisfactory

≤ 0.20 = Poor

Negative value on D = Very Poor.50
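The five-band classification can be expressed the same way. This Python sketch assumes that the Excellent band starts at 0.70 (as in the classification restated under Technique of Data Analysis) and, because adjacent bands share their endpoints, treats each band's lower bound as inclusive. The name `classify_di` is hypothetical; the (U, L) counts are copied from Table 4.4 of this study.

```python
from collections import Counter


def classify_di(di: float) -> str:
    """Return the discriminating-power category; each band's lower bound is inclusive."""
    if di < 0:
        return "Very poor"
    if di < 0.20:
        return "Poor"
    if di < 0.40:
        return "Satisfactory"
    if di < 0.70:
        return "Good"
    return "Excellent"


N = 13  # students in each of the upper and lower groups
# (U, L) = correct answers in the upper and lower groups, copied from Table 4.4.
items = {9: (12, 6), 10: (9, 4), 20: (3, 7), 30: (11, 11), 35: (6, 0)}

remarks = {item: classify_di((u - low) / N) for item, (u, low) in items.items()}
for item, remark in remarks.items():
    print(f"item {item}: {remark}")
print(Counter(remarks.values()))  # how many items fall in each category
```

Tallying the categories in this way gives the overview of the whole test that the classification is meant to provide.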

By using the DP formula and the criteria above, teachers can identify which items meet these standards and which do not, so they can decide whether to use, revise, or discard each item in the future.

In this skripsi, the writer focuses on the testing procedure. According to the explanation above, there are two kinds of achievement tests that are held periodically: the progress test and the final test.

49 Surapranata, op. cit., p. 31.

50

A. Place and Time of the Research

The research was conducted at “SMP Al Wathoniyah 09” East Jakarta, located at Jl. Raya Penggilingan No. 99, Cakung, East Jakarta. The summative test was held on June 10, 2014, and the writer did the research on June 9, 2015. The writer took the English summative test papers and the students' scores of the second grade for the 2014/2015 period to be analyzed.

B. Technique of Sample Taking

The writer took the sample from students of “SMP Al Wathoniyah 09 East Jakarta”. The total number of second-year students was 146, divided into 3 classes. The writer used systematic random sampling, in which the sample is chosen according to a fixed sequence in the ordinal (numbered) list of students.1 In determining the sample, the writer took the students whose numbers were multiples of 3 in all classes, so the sample of this research was 49 students. After that, the writer divided the students into three groups, namely the upper group, the middle group, and the lower group. Next, the writer took only the upper and lower groups to be analyzed.
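Systematic sampling of this kind can be sketched in Python as follows. The sketch assumes that the students at positions 3, 6, 9, ... of each class list (multiples of 3) were taken; the helper name `systematic_sample` and the class sizes 49, 49 and 48 are hypothetical, chosen only so that they add up to the reported total of 146.

```python
def systematic_sample(class_lists, step=3):
    """Take the students at positions step, 2*step, 3*step, ... of each class list."""
    sample = []
    for roster in class_lists:
        # roster[step - 1] is the student at position `step` (positions start at 1)
        sample.extend(roster[step - 1::step])
    return sample


# Hypothetical class sizes: only the total of 146 students in 3 classes is reported.
sizes = [49, 49, 48]
classes = [
    [f"class{c}-student{i}" for i in range(1, n + 1)]
    for c, n in enumerate(sizes, start=1)
]
sample = systematic_sample(classes, step=3)
print(len(sample), sample[:3])
```

With these hypothetical sizes the sketch selects 48 students; the count obtained in practice depends on the actual class sizes and on the position at which counting starts.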

C. Technique of Data Collecting

To collect the data related to the topic of discussion, the writer took the summative test from SMP Al Wathoniyah 09 East Jakarta in order to analyze the discriminating power of every item and to categorize it as excellent, good, satisfactory, poor, or very poor.

1



D. Research Instrument

a. Students' scores on the English summative test made by the English teacher.

b. The English summative test of the second-year students of “SMP Al Wathoniyah 09 East Jakarta”, academic year 2013/2014.

E. Technique of Data Analysis

In this research, the writer used a quantitative method to analyze the discriminating power of the English summative test items of the second year of “SMP Al Wathoniyah 09 East Jakarta”. A quantitative method means that the investigation relies on statistical (mathematical) analysis of data, which is typically in numeric form.2 To analyze the discriminating power, the writer used a statistical formula, namely the discriminating power index:

$$DP = \frac{U - L}{N}$$

Explanation:

DP : Discriminating Power

U : Sum of the students from the upper group who answered correctly

L : Sum of the students from the lower group who answered correctly

N : Number of the candidates in one group.

The classifications of the index of discriminating power (D) are:

DI = 0.70 – 1.00 = Excellent

0.40 – 0.70 = Good

0.20 – 0.40 = Satisfactory

≤ 0.20 = Poor

Negative value on D = Very Poor.3

2 John W. Creswell, Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research, (Boston: Pearson Education, Inc., 2012), 4th edition, p. 15.

3 Anas Sudijono, Pengantar Evaluasi Pendidikan, (Jakarta: PT. RajaGrafindo Persada, 2011), p. 389.


A. Description of Data

The data used by the writer is the English summative test of the second grade of “SMP Al Wathoniyah 09” East Jakarta. This English summative test was held on Tuesday, June 10th 2014, and had to be finished in 120 minutes. The test consists of 50 items, all of which are multiple choice.

A total of 48 students took part in this analysis.

In the book “Pengantar Evaluasi Pendidikan”, Anas Sudijono demonstrated that the selection of criterion groups “based upon the upper 27 percent and lower 27 percent of the [papers] provide[s] the greatest confidence that the upper group is superior in the trait measured by the test as compared to the lower group.”1 The middle 46 percent of the papers is not used when the upper 27 percent and the lower 27 percent are employed in item analysis. Based on this statement, the writer classified the students into three groups: upper, middle, and lower. The writer took only the upper 27% and the lower 27% of the students for this analysis, and the remaining students, who belong to the middle group, do not take part in it. The following table shows the students' scores and group positions in the English summative test.

1 Anas Sudijono, Pengantar Evaluasi Pendidikan, (Jakarta: PT. RajaGrafindo Persada, 2011), p. 387.

Table 4.1

The Students’ Scores and Group Position of English Summative Test In the Second Semester

No Name Score Explanation

1 Number One 6.40 Upper group

2 Number Two 6.40 Upper group

3 Number Three 6.20 Upper group

4 Number Four 6.20 Upper group

5 Number Five 6.00 Upper group

6 Number Six 5.80 Upper group

7 Number Seven 5.60 Upper group

8 Number Eight 5.20 Upper group

9 Number Nine 5.00 Upper group

10 Number Ten 5.00 Upper group

11 Number Eleven 5.00 Upper group

12 Number Twelve 5.00 Upper group

13 Number Thirteen 4.80 Upper group

14 Number Fourteen 4.80 Middle group

15 Number Fifteen 4.80 Middle group

16 Number Sixteen 4.60 Middle group

17 Number Seventeen 4.60 Middle group

18 Number Eighteen 4.60 Middle group

19 Number Nineteen 4.60 Middle group

20 Number Twenty 4.60 Middle group

21 Number Twenty One 4.60 Middle group

22 Number Twenty Two 4.40 Middle group

23 Number Twenty Three 4.40 Middle group

24 Number Twenty Four 4.40 Middle group

25 Number Twenty Five 4.40 Middle group

26 Number Twenty Six 4.40 Middle group

27 Number Twenty Seven 4.20 Middle group

28 Number Twenty Eight 4.20 Middle group

29 Number Twenty Nine 4.20 Middle group

30 Number Thirty 4.20 Middle group

31 Number Thirty One 4.00 Middle group

32 Number Thirty Two 4.00 Middle group

33 Number Thirty Three 4.00 Middle group

34 Number Thirty Four 4.00 Middle group

35 Number Thirty Five 4.00 Middle group

36 Number Thirty Six 4.00 Lower group

37 Number Thirty Seven 3.80 Lower group

38 Number Thirty Eight 3.80 Lower group

39 Number Thirty Nine 3.80 Lower group

40 Number Forty 3.80 Lower group

41 Number Forty One 3.80 Lower group

42 Number Forty Two 3.80 Lower group

43 Number Forty Three 3.60 Lower group

44 Number Forty Four 3.40 Lower group

45 Number Forty Five 3.40 Lower group

46 Number Forty Six 3.40 Lower group

47 Number Forty Seven 3.00 Lower group

48 Number Forty Eight 3.00 Lower group

Table 4.1 shows that the students who took the test are classified into 3 groups: the upper group, the middle group, and the lower group. The writer took 27% of the students, or 13 students, from each of the upper and lower groups to be analyzed. The highest score in the upper group is 6.40, obtained by two students, and the lowest score in the upper group is 4.80. Meanwhile, the highest score in the lower group is 4.00, and the lowest score in the lower group is 3.00, obtained by two students.
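The grouping in Table 4.1 can be reproduced with the 27% rule. In this Python sketch, `split_groups` is a hypothetical helper and the scores are those listed in Table 4.1; row 34 is taken as 4.00 because the scores are sorted and rows 33 and 35 both show 4.00. Rounding 27% of 48 (12.96) gives 13 students in each extreme group.

```python
def split_groups(scores, fraction=0.27):
    """Rank scores from high to low and split them into upper, middle and lower groups."""
    ranked = sorted(scores, reverse=True)
    k = round(fraction * len(ranked))  # students in each extreme group
    return ranked[:k], ranked[k:len(ranked) - k], ranked[len(ranked) - k:]


# Scores from Table 4.1 (48 students); row 34 taken as 4.00 (see lead-in).
scores = (
    [6.40] * 2 + [6.20] * 2 + [6.00, 5.80, 5.60, 5.20] + [5.00] * 4
    + [4.80] * 3 + [4.60] * 6 + [4.40] * 5 + [4.20] * 4 + [4.00] * 6
    + [3.80] * 6 + [3.60] + [3.40] * 3 + [3.00] * 2
)
upper, middle, lower = split_groups(scores)
print(len(upper), len(middle), len(lower))
print("upper:", max(upper), "to", min(upper))
print("lower:", max(lower), "to", min(lower))
```

The upper group then ends at 4.80 and the lower group begins at 4.00, matching the group boundaries described above, with the remaining 22 students in the middle group.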



From Table 4.2 in the appendix (page 47), it can be concluded that the responses of the upper-group students to each item of the English test are as follows:

1. There are 13 students who answered items no. 27 and 33 correctly.

2. There are 12 students who answered items no. 9, 12, 17, 21, 34, and 36 correctly.

3. There are 11 students who answered items no. 3, 24, 30, 43, and 45 correctly.

4. There are 10 students who answered items no. 7, 8, 13, 26, 29, and 47 correctly.

5. There are 9 students who answered items no. 1, 10, and 39 correctly.

6. There are 8 students who answered item no. 46 correctly.

7. There are 7 students who answered items no. 2, 22, and 49 correctly.

8. There are 6 students who answered items no. 5, 6, 16, 35, 37, and 40 correctly.

9. There are 5 students who answered items no. 15, 18, and 19 correctly.

10. There are 4 students who answered items no. 11 and 28 correctly.

11. There are 3 students who answered items no. 4, 14, 20, 23, 31, 32, and 38 correctly.

12. There is 1 student who answered items no. 25, 42, 44, 48, and 50 correctly.



From the table 4.3 in the appendix page 49, it can be concluded that the responses of each item of lower group students in their English test are:

1. There are 11 students who answered item 30 correctly.

2. There are 10 students who answered item 12 correctly.

3. There are 9 students who answered items 1, 24, 33 and 36 correctly.

4. There are 8 students who answered items 2, 7, 17, 21, 29 and 39 correctly.

5. There are 7 students who answered items 3, 8, 15, 20, 27, 34 and 43 correctly.

6. There are 6 students who answered items 9, 22 and 47 correctly.

7. There are 5 students who answered items 26, 40, 45, 46 and 49 correctly.

8. There are 4 students who answered items 4, 10, 13, 37 and 44 correctly.

9. There are 3 students who answered items 5, 6, 32 and 50 correctly.

10. There are 2 students who answered items 11, 14, 16, 18, 25, 38 and 41 correctly.

11. There is 1 student who answered items 23, 28, 31, 42 and 48 correctly.
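Lists like the two above can be derived mechanically from a 0/1 response matrix. The sketch below assumes each student's answers are already scored as 1 (correct) or 0 (wrong); the function names and the tiny example matrix are hypothetical. Items answered correctly by nobody end up under a count of 0, which the lists above leave out.

```python
def correct_counts(responses):
    """Count correct answers per item.

    responses: one row per student, each row a list of 0/1 marks (1 = correct).
    Returns {item_number: number_of_correct_answers}, items numbered from 1.
    """
    n_items = len(responses[0])
    return {item + 1: sum(row[item] for row in responses) for item in range(n_items)}


def group_by_count(counts):
    """Invert {item: count} into {count: [items]}, highest count first."""
    grouped = {}
    for item, count in sorted(counts.items()):
        grouped.setdefault(count, []).append(item)
    return dict(sorted(grouped.items(), reverse=True))


if __name__ == "__main__":
    # Hypothetical 3-student, 4-item group (not the study's data).
    group = [
        [1, 0, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
    ]
    counts = correct_counts(group)  # {1: 3, 2: 1, 3: 2, 4: 0}
    for count, items in group_by_count(counts).items():
        print(f"{count} students answered items {items} correctly")
```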



Before analyzing the data, the writer processed it statistically, using the discrimination index formula to determine the discriminating power category of each item of the English summative test. The results are shown in the following table:

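Table 4.4

The Discriminating Power Index of the Upper and Lower Group Items

| Item Number | Correct Answers, Upper Group (U) | Correct Answers, Lower Group (L) | U - L | DI = (U - L)/N | Remark |
|---|---|---|---|---|---|
| 1 | 9 | 9 | 0 | 0.00 | Poor |
| 2 | 7 | 8 | -1 | 0.08 | Poor |
| 3 | 11 | 7 | 4 | 0.31 | Satisfactory |
| 4 | 3 | 4 | -1 | 0.08 | Poor |
| 5 | 6 | 3 | 3 | 0.23 | Satisfactory |
| 6 | 6 | 3 | 3 | 0.23 | Satisfactory |
| 7 | 10 | 8 | 2 | 0.15 | Poor |
| 8 | 10 | 7 | 3 | 0.23 | Satisfactory |
| 9 | 12 | 6 | 6 | 0.46 | Good |
| 10 | 9 | 4 | 5 | 0.38 | Satisfactory |
| 11 | 4 | 2 | 2 | 0.15 | Poor |
| 12 | 12 | 10 | 2 | 0.15 | Poor |
| 13 | 10 | 4 | 6 | 0.46 | Good |
| 14 | 3 | 2 | 1 | 0.08 | Poor |
| 15 | 5 | 7 | -2 | -0.15 | Very poor |
| 16 | 6 | 2 | 4 | 0.31 | Satisfactory |
| 17 | 12 | 8 | 4 | 0.31 | Satisfactory |
| 18 | 5 | 2 | 3 | 0.23 | Satisfactory |
| 19 | 5 | 0 | 5 | 0.38 | Satisfactory |
| 20 | 3 | 7 | -4 | -0.31 | Very poor |
| 21 | 12 | 8 | 4 | 0.31 | Satisfactory |
| 22 | 7 | 6 | 1 | 0.08 | Poor |
| 23 | 3 | 1 | 2 | 0.15 | Poor |
| 24 | 11 | 9 | 2 | 0.15 | Poor |
| 25 | 1 | 2 | -1 | -0.08 | Very poor |
| 26 | 10 | 5 | 5 | 0.38 | Satisfactory |
| 27 | 13 | 7 | 6 | 0.46 | Good |
| 28 | 4 | 1 | 3 | 0.23 | Satisfactory |
| 29 | 10 | 8 | 2 | 0.15 | Poor |
| 30 | 11 | 11 | 0 | 0.00 | Poor |
| 31 | 3 | 1 | 2 | 0.15 | Poor |
| 32 | 3 | 3 | 0 | 0.00 | Poor |
| 33 | 13 | 9 | 4 | 0.31 | Satisfactory |
| 34 | 12 | 7 | 5 | 0.38 | Satisfactory |
| 35 | 6 | 0 | 6 | 0.46 | Good |
| 36 | 12 | 9 | 3 | 0.23 | Satisfactory |
| 37 | 6 | 4 | 2 | 0.15 | Poor |
| 38 | 3 | 3 | 0 | 0.00 | Poor |
| 39 | 9 | 9 | 0 | 0.00 | Poor |
| 40 | 6 | 9 | -3 | -0.23 | Very poor |
| 41 | 0 | 2 | -2 | -0.15 | Very poor |
| 42 | 1 | 1 | 0 | 0.00 | Poor |
| 43 | 11 | 7 | 4 | 0.31 | Satisfactory |
| 44 | 1 | 4 | -3 | -0.23 | Very poor |
| 45 | 11 | 5 | 6 | 0.46 | Good |
| 46 | 8 | 5 | 3 | 0.23 | Satisfactory |
| 47 | 10 | 6 | 4 | 0.31 | Satisfactory |
| 48 | 1 | 1 | 0 | 0.00 | Poor |
| 49 | 7 | 5 | 2 | 0.15 | Poor |
| 50 | 1 | 3 | -2 | -0.15 | Very poor |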
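As a check on Table 4.4, the index in each row can be recomputed from the two counts. The sketch below uses N = 13 (the size of each group) and a few (item, U, L) rows taken from the table; the function names `discrimination_index` and `table_row` are illustrative, not part of the original study.

```python
GROUP_SIZE = 13  # students in each of the upper and lower groups


def discrimination_index(upper_correct, lower_correct, group_size=GROUP_SIZE):
    """DI = (U - L) / N, as in the discriminating power formula."""
    return (upper_correct - lower_correct) / group_size


def table_row(item, upper_correct, lower_correct):
    """Format one row of Table 4.4: item, U, L, U - L, DI (two decimals)."""
    di = discrimination_index(upper_correct, lower_correct)
    return f"{item} {upper_correct} {lower_correct} {upper_correct - lower_correct} {di:.2f}"


if __name__ == "__main__":
    # (item, U, L) taken from Table 4.4
    for item, u, l in [(9, 12, 6), (3, 11, 7), (1, 9, 9), (15, 5, 7)]:
        print(table_row(item, u, l))
```

Rounding to two decimals reproduces the table's values, for example 6/13 ≈ 0.46 for item 9 and -2/13 ≈ -0.15 for item 15.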



Based on the data above, the percentages of the discriminating power categories of the English summative test are as follows:

Table 4.5

The Percentage of Discriminating Power

| No. | Discriminating Power | Total Items | Percentage | Item Numbers |
|---|---|---|---|---|
| 1 | Excellent | 0 | 0% | - |
| 2 | Good | 5 | 10% | 9, 13, 27, 35, 45 |
| 3 | Satisfactory | 18 | 36% | 3, 5, 6, 8, 10, 16, 17, 18, 19, 21, 26, 28, 33, 34, 36, 43, 46, 47 |
| 4 | Poor | 20 | 40% | 1, 2, 4, 7, 11, 12, 14, 22, 23, 24, 29, 30, 31, 32, 37, 38, 39, 42, 48, 49 |
| 5 | Very poor | 7 | 14% | 15, 20, 25, 40, 41, 44, 50 |
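The percentages in Table 4.5 are each category's item count divided by the 50 items. A minimal sketch of that arithmetic, with the counts copied from the table (the helper name `percentages` is illustrative):

```python
# Number of items per category, from Table 4.5
category_counts = {
    "Excellent": 0,
    "Good": 5,
    "Satisfactory": 18,
    "Poor": 20,
    "Very poor": 7,
}


def percentages(counts):
    """Return {category: percentage of all items}."""
    total = sum(counts.values())  # 50 items in the test
    return {category: 100 * n / total for category, n in counts.items()}


if __name__ == "__main__":
    for category, pct in percentages(category_counts).items():
        print(f"{category}: {pct:.0f}%")
```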

The table above shows that no test item (0%) is categorized as excellent; an excellent item would need a discriminating index between 0.70 and 1.00. There are 5 test items (10%) categorized as good, with indices in the range 0.40 – 0.70: items 9, 13, 27, 35 and 45. There are 18 test items (36%) categorized as satisfactory, with indices in the range 0.20 – 0.40: items 3, 5, 6, 8, 10, 16, 17, 18, 19, 21, 26, 28, 33, 34, 36, 43, 46 and 47.

Meanwhile, 20 test items (40%) are categorized as poor because their discriminating indices are in the range 0.00 – 0.20: items 1, 2, 4, 7, 11, 12, 14, 22, 23, 24, 29, 30, 31, 32, 37, 38, 39, 42, 48 and 49. Finally, 7 test items (14%) are categorized as very poor because their discriminating indices are negative: items 15, 20, 25, 40, 41, 44 and 50.



B. Data Analysis

In analyzing the discriminating power, the writer first listed the students’ responses to each item of the test. The lists can be seen in Table 4.2 and Table 4.3 of this skripsi.

The next step is to build the item analysis format, which tabulates the data of the upper and lower groups in Table 4.4. The last step is to calculate the discriminating power of each item with the following formula:

$$DP = \frac{U - L}{N}$$

Explanation:

DP = Discriminating Power

U = number of students in the upper group who answered the item correctly

L = number of students in the lower group who answered the item correctly

N = number of students in one group
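As a worked example of the formula, using the counts for item 9 (U = 12, L = 6) and item 15 (U = 5, L = 7) from Table 4.4, with N = 13 students per group:

```latex
\begin{align*}
DP_{9}  &= \frac{U - L}{N} = \frac{12 - 6}{13} = \frac{6}{13} \approx 0.46 \\
DP_{15} &= \frac{U - L}{N} = \frac{5 - 7}{13} = -\frac{2}{13} \approx -0.15
\end{align*}
```

A negative value, as for item 15, means that more students in the lower group than in the upper group answered the item correctly.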

The result of this last step can also be seen in Table 4.4. In this table, the result for each item is a decimal value, and the writer then categorized each item according to the following classification:

The classifications of the index of discriminating power (DI) are:

0.70 – 1.00 = Excellent

0.40 – 0.70 = Good

0.20 – 0.40 = Satisfactory

0.00 – 0.20 = Poor

Negative value = Very poor

The result is then interpreted with the following criteria:

0.20 – 1.00 = Good

≤ 0.20 = Revise
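The two sets of criteria can be expressed as small functions. The ranges share their endpoints (0.20, 0.40, 0.70), so the sketch assumes, as a labelled choice, that a value exactly on an endpoint goes to the higher category. For this test the choice does not matter: every index is a whole number divided by 13, and none of 0.20 × 13, 0.40 × 13 or 0.70 × 13 is a whole number. The function names `classify` and `decision` are illustrative.

```python
def classify(di):
    """Discriminating power category for an index DI.

    Assumption: a value on a shared endpoint (0.20, 0.40, 0.70)
    is placed in the higher category.
    """
    if di < 0:
        return "Very poor"
    if di < 0.20:
        return "Poor"
    if di < 0.40:
        return "Satisfactory"
    if di < 0.70:
        return "Good"
    return "Excellent"


def decision(di):
    """Interpretation rule: 0.20 to 1.00 = Good, below 0.20 = Revise (assumed cut-off)."""
    return "Good" if di >= 0.20 else "Revise"


if __name__ == "__main__":
    # Indices of items 9, 3, 7, 1 and 15 from Table 4.4, as exact fractions of 13
    for di in (6 / 13, 4 / 13, 2 / 13, 0.0, -2 / 13):
        print(f"{di:.2f}", classify(di), decision(di))
```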



Based on the item analysis results for discriminating power above, the writer concludes that, of the 50 items:

1. There are 5 items (10%) categorized as good test items, all with an index of 0.46. According to Anas Sudijono in Pengantar Evaluasi Pendidikan, such items should be entered or recorded by the teacher in the item bank for achievement tests.

2. There are 18 items (36%) categorized as satisfactory test items because their discriminating indices are in the range 0.23 – 0.38; these items should also be entered or saved.

3. There are 20 items (40%) categorized as poor test items because their discriminating indices are in the range 0.00 – 0.15; these items can be revised.

4. There are 7 items (14%) categorized as very poor test items because their discriminating indices are negative, from -0.31 to -0.08; these items should be discarded or rewritten.

C. Data Interpretation

For the test as a whole, the writer interprets the discriminating power of the English summative test prepared by the Government Service of East Jakarta (Dinas Jakarta Timur) and administered at the seventh grade of “SMP Al Wathoniyah 09” East Jakarta. The analysis shows that 20 test items belong to the poor discriminating power category, because they cannot adequately differentiate the upper group from the lower group.

According to Anas Sudijono in “Pengantar Evaluasi Pendidikan”, an item with a DP of 0.20 to 1.00 (satisfactory, good or excellent) is a good item. By this criterion, the summative test of SMP Al Wathoniyah 09 East Jakarta is considered good, because the good and satisfactory items together make up 46% of the test (good = 10% and satisfactory = 36%), which is more than the poor items (40%).


One item with good discriminating power is item 9: 12 students from the upper group answered it correctly, while only 6 students from the lower group did. The item therefore differentiates the upper and lower groups well, with a discriminating power of 0.46. An item with satisfactory discriminating power is item 3: 11 students from the upper group answered it correctly, while only 7 students from the lower group did, so the item still differentiates the two groups, with a discriminating power of 0.31. An item with poor discriminating power is item 1: the same number of students in both groups (9 each) answered it correctly, so the item cannot differentiate the upper and lower groups and its discriminating power is 0.00. Finally, an item with very poor discriminating power is item 15: 7 students from the lower group answered it correctly, compared with only 5 students from the upper group, so the item cannot differentiate the groups in the expected direction and its discriminating power is negative (-0.15).

From the data analysis above, it can be concluded that the English Summative Test at the seventh grade of SMP Al Wathoniyah 09 East Jakarta has good discriminating power, so the questions from the summative test should be saved and can be used in the next test.



CHAPTER V

CONCLUSION AND SUGGESTION

A. Conclusion

Based on the analysis and interpretation in the previous chapter, the writer concludes that the items of the English summative test administered at the seventh grade of “SMP Al Wathoniyah 09” East Jakarta fall into 4 different categories of discriminating power. First, there are 5 test items (10%) categorized as good. Then, there are 18 test items (36%) categorized as satisfactory. Next, there are 20 test items (40%) categorized as poor. Lastly, there are 7 test items (14%) categorized as very poor.

So, 23 test items (46%), namely the good and satisfactory items, are regarded as having good discriminating power, with indices ranging from 0.23 to 0.46, and they can be used in the next test. Meanwhile, 20 test items (40%) need to be revised because they fall into the poor category, with indices ranging from 0.00 to 0.15. Lastly, the 7 very poor test items (14%), with indices ranging from -0.31 to -0.08, have to be eliminated or discarded.

From the explanation above, the writer concludes that the English summative test administered at the seventh grade of “SMP Al Wathoniyah 09” East Jakarta has good discriminating power, because 23 test items (46%) have a discriminating power of at least 0.20, ranging from 0.23 to 0.46.



B. Suggestion

Based on the results of the research, several suggestions can be given in relation to the writer’s conclusion. The suggestions are as follows:

1. The Government Service of East Jakarta (Dinas Jakarta Timur), which made the test, can use these results as feedback for the next test, so that future tests contain fewer items in the poor and very poor categories.

2. Teachers should check the questions that will be given to students before the final test is conducted.

3. Teachers should save the test items in the excellent, good and satisfactory categories so that they can be reused in future evaluations, and should learn from them how to construct good test items for the next test.



BIBLIOGRAPHY

Allison, Desmond. Language Testing and Evaluation: An Introductory Course. Singapore: Singapore University Press, 1999.

Arifin, Zainal. Evaluasi Pembelajaran. Bandung: PT. Remaja Rosdakarya, 2013.

Arikunto, Suharsimi. Dasar-dasar Evaluasi Pendidikan. Edisi Revisi. Jakarta: PT. Bumi Aksara, 2006.

Arikunto, Suharsimi. Dasar-dasar Evaluasi Pendidikan, Edisi Revisi. Jakarta: PT. Bumi Aksara, 2009.

Baumgartner, Ted A., et al. Measurement for Evaluation. New York: McGraw-Hill, 2007.

Boopathiraj, C. and K. Chellamani. Analysis of Test Items on Difficulty Level and Discrimination Index in the Test for Research in Education. International Journal of Social Science & Interdisciplinary Research, ISSN 2277 3630, Vol. 2 (2), February 2013, p. 90. Online available at indianresearchjournals.com. Accessed on 05 April 2016.

Brown, H. Douglas. Teaching by Principles: An Interactive Approach to Language Pedagogy. 2nd Edition. New York: Addison Wesley Longman, Inc., 2001.

Brown, James Dean and Thom Hudson. Criterion-referenced Language Testing. UK: Cambridge University Press, 2002.

Creswell, John W. Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. 4th Edition. Boston: Pearson Education, Inc., 2012.

Davies, Alan, et al. Dictionary of Language Testing. UK: University of Melbourne, 1990.

Gronlund, Norman E. Measurement and Evaluation in Teaching. 5th Edition. New York: Macmillan Publishing Company, 1965.

Gronlund, Norman E. Measurement and Evaluation in Teaching. 4th Edition. New York: Macmillan Publishing Co., Inc., 1981.

Gronlund, Norman E. and Robert L. Linn. Measurement and Evaluation in Teaching. New York: Macmillan Publishing Company, 1990.

Hanna and Dettmer. Formative and Summative Assessment. Northern Illinois University, Faculty Development and Instructional Design Center, 2004, p. 1. Online available at http://facdev.niu.edu. Accessed on 01 December 2015.

Heaton, J. B. Writing English Language Tests. London: Longman, 1988.

Hopkins, Charles D. and Richard L. Antes. Classroom Measurement and Evaluation. 3rd Edition. USA: F.E. Peacock Publishers, Inc., 1990.

Hughes, Arthur. Testing for Language Teachers. 2nd Edition. United Kingdom: The Press of the University of Cambridge, 2003.

Karmel, Louis J. Measurement and Evaluation in the Schools. New York: The Macmillan Company, 1970.

Lyman, Howard B. Test Scores and What They Mean. 6th Edition. United States: Allyn and Bacon, 1998.

McNamara, Tim. Language Testing. New York: Oxford University Press, 2000.

Ngalim, M. Purwanto. Prinsip-prinsip dan Teknik Evaluasi Pengajaran. Bandung: Remadja Karya CV, 1986.

Ngalim, M. Purwanto. Prinsip-prinsip dan Teknik Evaluasi Pengajaran. Bandung: PT. Remaja Rosdakarya, 2004.

Nitko, Anthony J. Educational Tests and Measurement: An Introduction. USA: Harcourt Brace Jovanovich, Inc., 1983.

Nunnally, Jum C. Educational Measurement and Evaluation. 2nd Edition. New York: McGraw-Hill Book Company, 1972.

Perez, Carmen Basanta. Coming to Grips with Progress Testing: Some Guidelines for Its Design. This article was first published in Volume 33, No. 3, 1995, p. 37. Accessed on 05 April 2016.

Ridwan, Mohamed Osman. Educational Evaluation and Testing, 2010, p. 51, (http://en.wikipedia.org/wiki/Creative_Commons. P. 52). Accessed on 05 April 2016.

Sudijono, Anas. Pengantar Evaluasi Pendidikan. Jakarta: PT. Raja Grafindo Persada, 2006.

Sudijono, Anas. Pengantar Evaluasi Pendidikan. Jakarta: PT. RajaGrafindo Persada, 2011.

Sukardi, H.M. “Evaluasi Pendidikan” Prinsip dan Operasionalnya. Jakarta: PT. Bumi Aksara, 2011.

Surapranata, Sumarna. Analisis, Validitas, Reliabilitas dan Interpretasi Hasil Tes “Implementasi Kurikulum 2004”. Bandung: PT. Remaja Rosdakarya, 2006.

Syah, Muhibbin. Psikologi Belajar. Jakarta: PT. Rajawali Pers, 2012.

Tinambunan, Wilmar. Evaluation of Student Achievement. Jakarta: PPLPTK, 1988.


Usman, Husaini. Pengantar Statistika. Jakarta: PT. Bumi Aksara, 2006.

Valette, Rebecca M. Modern Language Testing. 2nd Edition. New York: Harcourt Brace Jovanovich, Inc., 1977.

Wiersma, William and Stephen G. Jurs. Evaluation of Instruction in Individually Guided Education. Toledo: The Sears-Roebuck Foundation, 1976.

