An Item Analysis on Discriminating Power of English Summative Test (A Case Study of the Second Year of SMPN 87 Pondok Pinang)

AN ITEM ANALYSIS ON DISCRIMINATING POWER

OF ENGLISH SUMMATIVE TEST

(A Case Study of Second Year of “SMPN 87” Pondok Pinang)

A “Skripsi”

Presented to the Faculty of Tarbiyah and Teachers' Training in Partial Fulfillment of the Requirements

for the Degree of S.Pd. (Bachelor of Arts) in English Language Education

By:

HIKMAH LESTARI

105014000379

DEPARTMENT OF ENGLISH EDUCATION

FACULTY OF TARBIYAH AND TEACHERS' TRAINING

SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY

JAKARTA

2011


A "Skripsi"

Presented to the Faculty of Tarbiyah and Teachers' Training in Partial Fulfillment of the Requirements

for the Degree of S.Pd. (Bachelor of Arts) in English Language Education

Approved by the advisor:

Dr. Atiq Susilo, MA
NIP. 19491122 197803 1 001

DEPARTMENT OF ENGLISH EDUCATION

FACULTY OF TARBIYAH AND TEACHERS' TRAINING

SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY

JAKARTA


ENDORSEMENT BY THE EXAMINATION COMMITTEE

The examination committee of the Faculty of Tarbiyah and Teachers' Training certifies that the "Skripsi" entitled "AN ITEM ANALYSIS ON DISCRIMINATING POWER OF ENGLISH SUMMATIVE TEST", written by Hikmah Lestari, student's registration number 105014000379, was examined at the examination session of the Faculty of Tarbiyah and Teachers' Training, Syarif Hidayatullah State Islamic University Jakarta, on March 31st, 2011. The "Skripsi" has been accepted and declared to have fulfilled one of the requirements for the degree of S.Pd. (Bachelor of Arts) in English Education in the Department of English Education.

Jakarta, March 31st, 2011

Examination Committee:

CHAIRMAN : Drs. Syauki, M.Pd. (……….)

NIP. 19641212 199103 1 002

SECRETARY : Neneng Sunengsih, S.Pd. (……….)

NIP. 19730625 199903 2 001

EXAMINER 1 : Dr. Fahriany, M.Pd. (……….)

NIP. 19700611 199101 2 001

EXAMINER 2 : Drs. Bahrul Hasibuan, M.Ed. (……….)

Acknowledged by:

Dean of the Faculty of Tarbiyah and Teachers' Training

Prof. Dr. Dede Rosyada, M.A.
NIP. 19571005 198703 1 003


ABSTRACT

Lestari, Hikmah. 2011. An Item Analysis on Discriminating Power of English Summative Test at the Second Grade Students of SMPN 87 Pondok Pinang. "Skripsi", Department of English Education, Faculty of Tarbiyah and Teachers' Training, Syarif Hidayatullah State Islamic University Jakarta.

Advisor: Dr. Atiq Susilo, MA

Key words: Item Analysis, Discriminating Power, SMPN 87

The purpose of this study is to analyze the discriminating power of the English summative test at the second grade of "SMPN 87" Pondok Pinang. Through this study, it is hoped that the teacher can get a clear description of the quality of the discriminating power of the English summative test, so that the teacher is able to help the poorer students.

This study is categorized as a descriptive analysis because it is intended to describe the objective condition of the discriminating power of the students' English summative test in the odd semester of the second grade of "SMPN 87" Pondok Pinang by analyzing how well the English summative test items discriminate among the students' achievement. This study is also considered quantitative research, because the researcher used numerical data which were analyzed statistically.

The finding of this study is that the English summative test administered at the second grade of "SMPN 87" Pondok Pinang has good discriminating power, because 33 items (66%) of the test items, ranging from 0.25 to 0.81, have fulfilled the criteria of positive discriminating power.


ABSTRAK

Lestari, Hikmah. 2011. An Item Analysis on Discriminating Power of English Summative Test at the Second Grade of SMPN 87 Pondok Pinang. Skripsi, Jurusan Pendidikan Bahasa Inggris, Fakultas Ilmu Tarbiyah dan Keguruan, UIN Syarif Hidayatullah Jakarta.

Pembimbing: Dr. Atiq Susilo, MA

Kata kunci: Analisis Butir Soal, Daya Pembeda, SMPN 87

Tujuan dari penelitian ini adalah untuk menganalisis daya pembeda dari tes sumatif bahasa Inggris kelas dua SMPN 87 Pondok Pinang. Melalui penelitian ini, diharapkan guru mendapatkan penjelasan secara jelas tentang kualitas daya pembeda tes sumatif bahasa Inggris, sehingga guru dapat membantu siswa-siswa yang mendapatkan nilai rendah.

Penelitian ini dikategorikan sebagai analisis deskriptif karena penelitian ini menggambarkan kondisi objektif daya pembeda tes sumatif bahasa Inggris semester ganjil siswa kelas dua SMPN 87 Pondok Pinang dengan menganalisis kemampuan butir-butir soal pada tes sumatif bahasa Inggris dalam membedakan kemampuan para siswa. Penelitian ini termasuk penelitian kuantitatif, karena peneliti menggunakan data numerik yang dianalisis secara statistik.

Hasil temuan penelitian ini menyatakan bahwa tes sumatif bahasa Inggris yang diujikan pada kelas dua SMPN 87 Pondok Pinang memiliki daya pembeda yang baik, karena 33 butir soal, dengan indeks antara 0.25 sampai 0.81 atau 66% dari butir soal tes, telah memenuhi kriteria daya pembeda positif.


ACKNOWLEDGEMENT

Mrs. Dra. Farida Hamid, M.Pd., Mr. Prof. Dr. Mulyanto Sumardi, MA., The late Mr. Drs. Munir Sonhaji, M.Ed., Mr. Drs. Nasifuddin Djalil, M.Ag.,

Mr. Drs Arifin Toy, M.Sc., Mr. Drs. M. Zaenuri, M.Pd., and Mr. Drs Nasrun Mahmud, M.Ed.

The writer would also like to express her gratitude to Mr. Dr. Atiq Susilo, MA, as the writer's advisor, who has kindly spent his time giving valuable advice, guidance, corrections, and suggestions in finishing this "skripsi".

Her gratitude also goes to Mr. Drs. Syauki, M.Pd., as the head of the English Education Department, Mrs. Neneng Sunengsih, S.Pd., as the secretary of the English Education Department, and Prof. Dr. Dede Rosyada, MA, as the Dean of the Faculty of Tarbiyah and Teachers' Training.

The writer also dedicates many thanks to Mr. Drs. Ishak Idrus, the headmaster of "SMPN 87" Pondok Pinang, who gave the writer permission to do the research at "SMPN 87" Pondok Pinang.

The writer also would like to express her gratitude and love to all beloved classmates (2005) of English Education Department, either class A, B, or C, for

sharing their knowledge, support, and time in accomplishing this ”skripsi” and for

the wonderful friendship while studying together. May Allah the Almighty bless them all, so be it.

Finally, the writer realizes that this "skripsi" is still far from being perfect. Constructive criticism and suggestions are welcome to make it better.

Jakarta, March 31st, 2011

The writer



TABLE OF CONTENTS

ABSTRACT ... i

ABSTRAK ... ii

ACKNOWLEDGEMENT ... iii

TABLE OF CONTENTS ... v

LIST OF TABLES ... vii

LIST OF APPENDICES ... viii

CHAPTER I: INTRODUCTION

A. The Background of the Study ... 1

B. The Limitation of the Problem ... 3

C. The Formulation of the Problem ... 4

D. The Objective of Study ... 4

E. The Method of Study ... 4

F. The Organization of the Writing ... 5

CHAPTER II: THEORETICAL FRAMEWORK

A. Test

1. The Definition of Test ... 6

2. Types of Test ... 7

3. Characteristic of a Good Test ... 11

B. Item Analysis

1. The Definition of Item Analysis ... 13

2. Discriminating Power ... 14

3. Types of Test Item ... 16

4. The Importance of Item Analysis ... 18

CHAPTER III: RESEARCH METHODOLOGY

A. Place and Time of Research ... 20

B. Technique of Sample Taking ... 20

C. Technique of Data Collecting ... 21

D. Research Instrument ... 21

E. Technique of Data Analysis ... 21

CHAPTER IV: RESEARCH FINDINGS

A. Description of Data ... 23

B. Analysis of Data ... 36

C. Interpretation of Data... 37

CHAPTER V: CONCLUSION AND SUGGESTION

A. Conclusion ... 38

B. Suggestion ... 39

BIBLIOGRAPHY ... 40


LIST OF TABLES

Table 4.1 The Students' Scores and Group Positions of English Summative Test at the Odd Semester ... 25

Table 4.2 The Students' Answer Sheet of English Summative Test Items from the Upper Group ... 27

Table 4.3 The Students' Answer Sheet of English Summative Test Items from the Lower Group ... 30

Table 4.4 The Discriminating Power Index of the Upper and Lower Group ... 33

Table 4.5 The Percentage of Discriminating Power ... 35



LIST OF APPENDICES

Appendix 1 Form of Test ... 42

Appendix 2 Answer Key ... 48

Appendix 3 Students’ Answer Sheet ... 54

Appendix 4 Surat Pengajuan Judul Skripsi ... 59

Appendix 5 Surat Izin Penelitian ... 61


CHAPTER I

INTRODUCTION

A. The Background of the Study

Making an evaluation is an integral part of life; we evaluate all aspects of our life and work constantly. In the field of education, evaluation plays an important role because it reflects the result of educational development. Evaluation may be defined as the systematic process of collecting, analyzing, and interpreting information to determine the extent to which pupils are achieving instructional objectives.

Evaluation gives information about how successful the efforts of education have been. It helps teachers obtain information about the progress of students' achievement of the material they have learned in order to make decisions. Evaluation cannot be separated from the teaching-learning process. According to Bachman in his book Fundamental Considerations in Language Testing, evaluation is defined as the systematic gathering of information for the purpose of making decisions.1 In addition, the purpose of evaluation is to provide relevant information.

There are many techniques for collecting information for evaluation purposes. One of them is by using a test.2 A test, in plain words, is "a method of measuring a person's ability or knowledge in a given domain".3

1 Lyle F. Bachman, Fundamental Considerations in Language Testing, (Toronto: Oxford University Press, 1990), p. 20.

2 Fred Genesee & John A. Upshur, Classroom-Based Evaluation in Second Language Education, (New York: Cambridge University Press, 1996), p. 140.

Tests are used for pedagogical purposes, either as a means of motivating students to study or as a means of reviewing the material taught.

Students usually tend to study harder when they are going to have an examination than when they are not, and they will emphasize studying the material they expect to be tested on. Thus, if the teacher announces an examination, most students are motivated to study or to review the material assigned.

A test is supposed to be well constructed so that it can be used effectively. To be called a good test, it has to fulfill the characteristics of a good test: validity, reliability, and practicality. A test is valid if it measures what it is supposed to measure. It is reliable if the result of the test stays the same even though the test is administered to the same group several times. A test is practical if it is easy to take and to administer.

In applying the three characteristics above, the teacher should prepare the test as well as possible. After administering and scoring the test, it is desirable to evaluate the effectiveness of the test, especially the test items, because teachers need to use their own judgment to know how well the test items work. This is done by studying the students' responses to each item. When formalized, the procedure is called item analysis.

Item analysis provides a quick, simple technique for appraising the effectiveness of individual test items.4 Item analysis procedures provide information for evaluating the functional effectiveness of each item and for detecting weaknesses that should be corrected. This information is useful when reviewing the test with students and it is indispensable when building a file of high quality items.

There are three characteristics which are usually determined for a test item: first, item difficulty; it indicates how difficult each item was for the group.

Second, discriminating power; it tells how well the item performs in separating the better students from the poorer students. Third, item distracters; for multiple-choice items, it indicates how effective each alternative was for the item. So it can be concluded that item analysis provides us with data on whether a test item is too difficult or too easy, whether it can discriminate among the students, and whether all the alternatives functioned as intended.

3 H. Douglas Brown, Teaching by Principles: An Interactive Approach to Language Pedagogy, (New York: Addison Wesley Longman, 2001), p. 384.

4 Norman E. Gronlund, Measurement and Evaluation in Teaching, (New York: Macmillan Publishing Co., 1981), p. 262.

Incidentally, the writer is a practice teacher at "SMPN 87" Pondok Pinang. She often hears some of the students say that the English test items in their school are hard, while other students say the contrary. Therefore, from this personal experience, she is interested in figuring out the quality of the discriminating power of the English summative test items at "SMPN 87" Pondok Pinang. The reason the writer chooses discriminating power is that she thinks it deals more with the students than the other two characteristics: the level of difficulty and the effectiveness of the distracters.

Based on the explanation above, the writer tries to limit the problem of item analysis that she will discuss, so she focuses only on the discriminating power of the test items. The test that will be analyzed by the writer is the final test of the odd semester administered to the second grade of "SMPN 87" Pondok Pinang. The main aim of this study is to find out how well the test items can discriminate between the students who have achieved well and those who have achieved poorly. Therefore, the writer tries to analyze and interpret it under the title "AN ITEM ANALYSIS ON THE DISCRIMINATING POWER OF ENGLISH SUMMATIVE TEST". This study was conducted at the second grade of "SMPN 87" Pondok Pinang.

B. Limitation and Scope of the Problem

In order to make this study easier to comprehend and not too broad, the writer limits the area of study. The writer intends to see the quality of the test items only by doing an item analysis focused on discriminating power. By analyzing the discriminating power of the test items, the writer can conclude how well each item discriminates between the students who performed well and those who did poorly on the test as a whole.

C. Formulation of the Problem

Based on the limitation of the problem, the writer formulates the problem in this research as follows: "Do the test items of the English summative test which is administered at the second grade of "SMPN 87" Pondok Pinang have good

discriminating power?”

D. Objective of the Study

The objective of this study is to measure the quality of the English summative test and to find out whether the English summative test items have good discriminating power or not. High-quality test items are essential for diagnosing the strengths and weaknesses of students. Thus, the findings of this study are expected to provide useful information about the quality of the test items.

It is hoped that the analysis of the discriminating power of the English summative test carried out by the writer can make a significant contribution to improving the quality of future English summative test items. It is also expected to reveal which students have achieved well on the test and which have not, so that the teacher is able to help the poorer students.

E. Method of the Study

This study is categorized as a descriptive analysis because it is intended to

describe the objective condition of the discriminating power of the students' summative test at the odd semester of the second grade of "SMPN 87" Pondok Pinang.

Besides, this study is called an analysis because it analyzes how well the items of the English summative test can discriminate between the students who have achieved well and those who have achieved poorly. This study is considered quantitative research, because the researcher used numerical data which were analyzed statistically.


F. Organization of the Writing

The writing is systematically divided into five chapters, as follows:

Chapter one deals with the introduction. It consists of the background of the study, the limitation and formulation of the problem, the objective of the study, the method of the study, the significance of the study, and the organization of the writing.

Chapter two discusses the theoretical framework. It is divided into two sections. The first section discusses the test: the definition of a test, the function of a test, types of tests, types of test items, and the characteristics of a good test. The second section discusses item analysis: the definition of item analysis, discriminating power, and the importance of item analysis.

Chapter three deals with the research methodology. It discusses the objective of the study, the place and time of the study, the technique of data collecting, and the technique of data analysis.

In chapter four, the writer focuses on the research findings. The data description, data analysis, and data interpretation are included in this chapter.

The last chapter, chapter five, presents the conclusion and suggestions.


CHAPTER II

THEORETICAL FRAMEWORK

A. Test

1. The Definition of Test

A test is one of the instruments for collecting data. Tests can be used in an instructional program to assess entry behavior, monitor learning progress, diagnose learning difficulties, and measure performance at the end of instruction. Tests are given for many different reasons. In order to achieve such diverse purposes, they need to be carefully planned. In classroom settings, this planning usually entails instructional objectives.

A test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual.1 A test is an instrument, device, or procedure that proposes a sequence of tasks to which a student is to respond, the results of which are used as measures of a specific trait. A test may be defined as a task or series of tasks used to obtain systematic observations presumed to be representative of educational or psychological traits or attributes.2

Cronbach defines a test as a "systematic procedure for observing a person's behavior and describing it with the aid of a numerical scale or a category system". The phrase "systematic procedure" indicates that a test is constructed, administered, scored, and described according to prescribed rules. The term behavior implies that a test measures the responses a person makes to the test items. Tests do not measure a person directly; rather, they infer his characteristics from his responses to test items. We do not observe all behavior but only a sample of behavior. A test contains only a sample of all possible items. The test results are described with the aid of measurement scales.3

1 Lyle F. Bachman, Statistical Analysis for Language Assessment, (Cambridge: Cambridge University Press, 2004), p. 9.

2 Gilbert Sax, Principles of Educational and Psychological Measurement and Evaluation, (Belmont: Wadsworth Publishing Company, 1980), p. 13.

Based on the definitions above, the writer can conclude that a test is an instrument which is administered to measure students’ responses to the test items.

2. Types of Test

Tests can be categorized according to the types of information they provide. This categorization will prove useful both in deciding whether an existing test is suitable for a particular purpose and in writing appropriate new tests where these are necessary.4 Tests can be classified based on their purpose and based on their maker.

a. Based on its purpose

1) Aptitude Test

An aptitude test is primarily designed to predict success in some future learning activity. It is generally given before the student begins language study, and may be used to select students for a language course or to place students in section appropriate to their ability.

The information provided by an aptitude test is useful in determining learning readiness, individualizing instruction, organizing classroom groups, identifying underachievers, diagnosing learning problems, and helping students with their educational and vocational plans.

3 H.J.X. Fernandes, Testing and Measurement, (Jakarta: National Educational Planning, Evaluation and Curriculum Development, 1984), p. 1.

4 Arthur Hughes, Testing for Language Teachers, (New York: Cambridge University Press, 2003), p. 5.

Aptitude tests are often used in selecting individuals for jobs, for admission to training program, for scholarship, and for many other purposes. Sometimes aptitude tests are used for classifying individuals, as when students are assigned to different ability-grouped sections of the same course.5

2) Achievement Test

Achievement tests measure what a person has learned during a course of instruction. They are given at the end of the course. The content of achievement tests is generally based on the course syllabus or the course textbook.

An achievement test is designed to indicate the degree of success in some past learning activity. This purpose is obviously different from the purpose of an aptitude test, which is designed to predict success in some future learning activity. A distinction between these two tests is made in terms of the use of the results rather than of the qualities of the tests themselves.

Assessment and evaluation are terms often used in connection with achievement testing. The purpose of the testing is to assess present

attainment. With achievement tests, we are trying to measure students’

present attainment. 7 A common distinction is that achievement tests measure what a student has learned, and aptitude tests measure the ability to learn new tasks.8

5 Howard B. Lyman, Test Scores and What They Mean, (Singapore: Allyn & Bacon, 1998), p. 22.

6 Drs. Wilmar Tinambunan, Evaluation of Student Achievement, (Jakarta: Depdikbud, 1998), p. 7.

7 Howard B. Lyman, Test Scores and What They Mean…, p. 22.

8 Robert L. Linn, Norman E. Gronlund, Measurement and Assessment in Teaching, (New Jersey: Prentice-Hall, Inc., 1995), pp. 391-392.

There are two kinds of achievement tests: final achievement tests and progress achievement tests.9 Final achievement tests are those administered at the end of a course of study. They may be written and administered by ministries of education, official examining boards, or by members of teaching institutions. On the other hand, progress achievement tests are intended to measure the progress that students are making. They contribute to formative assessment. Since progress is toward the achievement of course objectives, this test should relate to those objectives.

3) Proficiency test

Proficiency tests are designed to measure people's ability in a

language regardless of any training they may have had in that language. The content of a proficiency test, therefore, is not based on the content or objectives of language courses that people taking the test may have followed. Rather, it is based on a specification of what candidates have to be able to do in the language in order to be considered proficient. 10 It means that the function of the test is to show whether the candidates have reached certain specific abilities or not.

Some proficiency tests are intended to show whether students have reached a given level of general language ability. Others are designed to show whether students have sufficient ability to be able to use a language in some specific area such as medicine, tourism or academic study or not.

4) Diagnostic Test

Diagnostic tests seek to identify those areas in which a student needs further help. These tests can be fairly general and show, for example, whether a student needs particular help with one of the four main language skills; or they can be more specific, seeking perhaps to identify weaknesses in a student's use of grammar. These more specific diagnostic tests are not easy to design, since it is difficult to diagnose precisely strengths and weaknesses in the complexities of language ability. For this reason, there are very few purely diagnostic tests.

9 Desmond Allison, Language Testing and Evaluation, (Singapore: Singapore University Press, 1999), p. 80.

b. Based on the Test Maker

1) Standardized Test

Standardized tests are constructed by test specialists working with curriculum experts and teachers. They are standardized in that they have been administered and scored under standard and uniform testing conditions, so that results from different classes and different schools may be compared. The quality of the test items is high because the items are made by specialists. The test items are also pretested and selected on the basis of their effectiveness.

2) Teacher-Made Test

Teacher-made tests are constructed by teachers for use within their own classrooms. Their effectiveness depends on the skill of the teacher and his or her knowledge of test construction. The quality of the test items is unknown unless an item file is used, but the quality is typically lower than that of standardized tests because of the teacher's limited time and skill. Because such a test is conducted within the teacher's own classroom, it only compares the scores among the students in that classroom.

In general, teacher-made examinations have flexibility for use within a given classroom, but provide little data for comparing students in different classes. Standardized tests, in contrast, are used to compare students' performance in different classes or schools.11

11 Gilbert Sax, Principles of Educational and Psychological Measurement and Evaluation, (Belmont: Wadsworth, Inc., 1980), pp. 16-18.

3. Characteristic of a Good Test

A test can be said to be a good test if it has certain qualifications or characteristics. The most essential characteristics of a good test can be classified into three main aspects: validity, reliability, and practicality.12

a. Validity

The most simplistic definition of validity is that it is the degree to which a test measures what it is supposed to measure. J.B. Heaton said, "The validity of the test is the extent to which it measures what it is supposed to measure and nothing else".13

The validity of a test must be considered in measurement; in this case, it must be seen whether the test used really measures what it is supposed to measure.

b. Reliability

Reliability means dependability or trustworthiness. Basically, reliability is the degree to which a test consistently measures whatever it measures. The more reliable a test is, the more confidence we can have that the scores obtained from the administration of the test are essentially the same scores that would be obtained if the test were re-administered to the same group. An unreliable test is essentially useless. If a test were unreliable, then scores from a given group would be expected to be quite different every time the test was administered. If an intelligence test were unreliable, a student scoring an IQ of 120 today might score 140 tomorrow, and 95 the day after tomorrow. If the test were highly reliable and if the student's IQ were 110, then we would not expect that score to fluctuate too greatly from testing to testing.

12 Norman E. Gronlund, Measurement and…, p. 51.

A valid test is always reliable, but a reliable test is not necessarily valid. If a test is measuring what it is supposed to be measuring, it will be reliable and do so every time. But a reliable test can consistently measure the wrong thing and be invalid.14

c. Practicality

The third characteristic of a good test is practicality or usability. In the preparation of a new test, the teacher must keep in mind a number of very practical considerations, which involve economy, ease of administration, and ease of interpretation of the results.

Economy means that the test is not costly. The teacher must take into account the cost per copy, how many scores will be needed, and how long the administering and scoring of it will take.

Ease of administration means that the test administrator can perform his task quickly and efficiently. We must also consider the ease with which the test can be administered.

Regarding ease of interpretation and application, J.B. Heaton states that "the final point concerns the presentation of the test paper itself"; where possible, it should be printed or typewritten and appear neat, tidy, and aesthetically pleasing. Nothing is worse and more disconcerting to the testee than an untidy test paper, full of misspellings, omissions, and corrections. If the paper is tidy, it will be easy for the student or testee to interpret the test items.15

14

L.R Gay, Educational Evaluation and Measurement, (New York: Macmillan, Inc., 1985) p. 167.

15

J. Charles Anderson, Claphane & Dianne Wall, Language Test Construction & Evaluation, (Melbourne: Cambridge University Press, 1995), p.187.



B. Item Analysis

Selection of appropriate language items is not enough by itself to ensure a good test. Each question needs to function properly; otherwise, it can weaken the exam. Fortunately, there are some rather simple statistical ways of checking individual items. This is done by studying the students' responses to each item. When formalized, this procedure is called "item analysis".16

An item analysis tells us basically three things: how difficult each item is, whether or not the question discriminates, or tells the difference, between high and low students, and which distracters are working as they should. An analysis like this is used with any important exam, for example, review tests and tests given at the end of a school term or course.

1. The Definition of Item Analysis

Item analysis is usually done for the purpose of selecting which items will remain on future revised and improved versions of a test. There are several descriptions of item analysis. Nitko, in his book, states that "item analysis refers to the process of collecting, summarizing, and using information about individual test items, especially information about pupils' responses to items".17 Furthermore, Lado defines item analysis as the study of the validity, reliability, and difficulty of test items taken individually as if they were separate tests.18

Item analysis usually provides two kinds of information on items: item facility, which helps us decide if test items are at the right level for the target group, and item discrimination, which allows us to see if individual items are

providing information on candidates’ abilities consistent with that provided by

the other items on the test.19

16 Harold S. Madsen, Techniques in Testing, (New York: Oxford University Press, 1983), p. 180.

17 Anthony J. Nitko, Educational Test and Measurement, an Introduction, (New York: Harcourt Brace Jovanovich, Inc., 1983), p. 284.

18 Robert Lado, Language Testing, (London: Longman Group Limited, 1983), p. 342.

19 Tim McNamara, Language Testing, (Oxford: Oxford University Press, 2000), p. 60.

From those opinions, it can be concluded that item analysis is the process of collecting information about pupils' responses to the items, to see the quality of the test items. More specifically, item analysis information can tell us if an item was too easy or too hard, how well it discriminated between high and low scorers on the test, and whether all of the alternatives functioned as intended. Item analysis data also aid in detecting specific technical flaws and thus provide further information for improving the test items.

2. Discriminating Power

The item discriminating power of a test is its ability to separate good students from poor students. These student groups are defined by their scores on the test as a whole. The difference between the percentage of the top-scoring 27% and the bottom-scoring 27% of students who get the item right is its discrimination index.20

As well as knowing how difficult an item is, it is important to know how it discriminates, that is how well it distinguishes between students at different levels of ability. If the item is working well, we should expect more of the top-scoring students to know the answer than the low-scoring ones. If the strongest students get the item wrong, while the weaker students get it right, there is clearly a problem with the item, and it needs investigating.

Each item on the test should contribute to the total score and to the meaning of that total score. Many times the purpose of a test is to discriminate between groups of students, such as those who have mastered the domain of content that the test represents and those students who have not achieved mastery.

The discrimination index can range from -1 to +1. Items with positive values of the discrimination index are desired because those are the items that are contributing to the usefulness of the total score. When the discrimination index is near zero, it indicates that the item is contributing nothing to the discriminating power of the overall test.21 When a larger proportion of students in the lower group than in the upper group got the item right, the item discriminates negatively; when more students in the upper group than in the lower group got the item right, it discriminates positively.22

Item discriminating power can be obtained by subtracting the number of students in the lower group who got the item right (L) from the number of students in the upper group who got the item right (U) and dividing by the total number of students in one group included in the item analysis (N).

20 H.J.X. Fernandes, Testing and Measurement, (Jakarta: National Educational Planning, Evaluation and Curriculum Development, 1984), p. 27.

It is summarized in formula form as below:

DI = (U - L) / N

Where:

DI = the index of discriminating power

U = the number of pupils in the upper group who answered the item correctly

L = the number of pupils in the lower group who answered the item correctly

N = the number of pupils in each of the groups23

The classifications of the index of discriminating power (D) are:

DI = 0.70 – 1.00 = Excellent
0.40 – 0.70 = Good
0.20 – 0.40 = Satisfactory
Below 0.20 = Poor
Negative value of D = Very poor24

21 William Wiersma, Educational Measurement and…, p. 245.

22 Gilbert Sax, Principles of Educational and…, p. 191.

23 Charles D. Hopkins & Richard L. Antes, Classroom Measurement and Evaluation, (Illinois: F.E. Peacock Publishers, Inc., 1990), p. 279.

24 Anas Sudijono, Pengantar Evaluasi Pendidikan, (Jakarta: PT. Raja Grafindo Persada, 2006), p. 389.
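To make the computation above concrete, here is a minimal sketch in Python of how the discrimination index and its classification could be calculated for a single item; the function names are illustrative and not part of the thesis, and the example counts echo item number 17 analyzed later in Chapter IV.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """Compute DI = (U - L) / N for a single test item."""
    return (upper_correct - lower_correct) / group_size


def classify(di):
    """Classify a discrimination index using the ranges cited from Sudijono."""
    if di < 0:
        return "Very poor"
    if di < 0.20:
        return "Poor"
    if di < 0.40:
        return "Satisfactory"
    if di < 0.70:
        return "Good"
    return "Excellent"


# Illustrative check: 14 of 16 upper-group students and 1 of 16 lower-group
# students answer an item correctly (the counts of item 17 in Chapter IV).
di = discrimination_index(14, 1, 16)
print(round(di, 2), classify(di))  # 0.81 Excellent
```

The sketch assumes each class boundary is inclusive at its lower end; any consistent boundary convention could be used for indices that fall exactly on a limit.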



3. Types of Test Item

In constructing the test items, the test maker may choose from a variety of item types. There are two types of test items: subjective test and objective test.

a. Subjective test

A subjective test is a test in which the examinee answers in his own words, and at appropriate length, all or some of a relatively small number of questions. Typical key words in the questions set in examinations of this kind are 'discuss', 'compare', 'contrast', and 'describe', and the answers they elicit may range from a single sentence to a dozen or more paragraphs.

These answers are commonly called 'essays'. Here are some subjectively marked item types:

1) Short-Answer Items

The short answer (or completion) item is the only objective item type that requires the examinee to supply, rather than select, the answer. Its make-up is similar to a well-stated multiple choice item without the alternatives. Thus, it consists of a question or incomplete statement, to which the examinee responds by providing the appropriate words, numbers, or symbols.

2) Essay

Essay tests are inefficient for measuring knowledge outcomes but they provide a freedom of response that is needed in measuring certain complex outcomes. These outcomes include the ability to create, to organize, to integrate, to express and similar behaviors that call for the production and synthesis of ideas.

The most noticeable characteristic of the essay test is the freedom of response it provides. The student is asked a question that requires him to produce his own answer. He is relatively free to decide how to



approach the problem, what factual information to use, how to organize his reply, and what degree of emphasis to give each aspect of his answer. Thus the essay question places a premium on the ability to produce, integrate, and express idea.

b. Objective test

Objective test is said to be one that may be scored by comparing examinee responses with an establishing set of acceptable responses of scoring key. Objective test can be scored objectively. That is, equally competent scorers can score them independently and obtain the same results. Objective test includes a variety of item types:

1) Multiple choice items

Multiple-choice refers to test items that require the students to select one or more responses from a set of two or more options. These items consist of a stem, which presents a problem situation, and several alternatives, which provide possible solutions to the problem. The stem may be a question or an incomplete statement. The alternatives include the correct answer and several plausible wrong answers, called distracters. The function of the latter is to distract those students who are uncertain of the answer.25

Multiple choice items can measure a variety of learning outcomes, ranging from simple to complex, and it is easy to see why this item type is regarded so highly and used so widely.

2) Matching Items

Another selected-response item, sometimes called an objective item, is the matching item. The format is not used as extensively as true-false or multiple-choice items. But the matching item can be used

25



effectively to measure learning and, when used, it provides variety in the test format for both student and teacher.

The matching item is exactly what the name implies; it requires the student to use some association criterion in order to match the words or phrases that represent ideas, concepts, principles, or things. Matching items are usually presented in two-column format: one column consists of premises and the other consists of responses.26

3) True-False Items

The true-false item is simply a declarative statement that the student must judge as true or false. There are modifications of this

basic form in which the student must respond “yes” or “no,” “agree” or “disagree,” “right” or “wrong,” “fact” or “opinion,” and the like.

Such variations are usually given the more general name of alternative-response items. In any event this item type is characterized by the fact that only two responses are possible.27

True-false items can be effective when a few guidelines are followed in their construction: statements must be clearly true or false, statements should not be lifted directly from the text, specific determiners should be avoided, trick questions should not be used, some statements should be written at higher cognitive levels, and true-false items should be of the same frequency and length.28

4. The Importance of Item Analysis

Item analysis is an important and necessary step in the preparation of good multiple-choice tests. Because of this fact, it is suggested that every classroom teacher who uses multiple-choice test data should know something of item analysis, how it is done, and what it means.

26

William Wiersma, Educational Measurement and Testing, (Boston: Allyn & Bacon, 1990), p. 48.

27

Norman E. Gronlund, Constructing Achievement…, p.54. 28


The benefits of item analysis are not limited to the improvement of individual test items; there are also a number of fringe benefits of special value to classroom teachers. The most important of these are the following:

a. Item analysis data provide a basis for efficient class discussion of the test result.

b. Item analysis data provide a basis for remedial work.

c. Item analysis data provide a basis for the general improvement of classroom instruction.

d. Item analysis procedures provide a basis for increased skill in test construction.29

Meanwhile, Nitko states in his book that the importance of item analysis lies in:

a. determining whether an item functions as the teacher intends,

b. providing feedback to students on their performance and a basis for class discussion,

c. providing feedback to the teacher about pupils' difficulties,

d. identifying areas for curriculum improvement,

e. revising the items, and

f. improving item-writing skills.30

29

Robert L. Linn and Norman E. Grondlund, Measurement and…, p.316. 30



CHAPTER III

RESEARCH METHODOLOGY

A. Place and Time of Research

The research was conducted at "SMPN 87" Pondok Pinang, which is located at Jl. Ciputat Raya Pondok Pinang, Kebayoran Lama, South Jakarta. The writer did the research in December 2010. The writer took the English summative test papers and the students' answer sheets of the second grade for the 2010-2011 period to be analyzed.

B. Technique of Sample Taking

The writer took the sample from the second year students of "SMPN 87" Pondok Pinang. The total number of second year students is 238; they are divided into 6 classes. The writer took 25% of the total number of the second year students as a sample, that is, 25% x 238 ≈ 60 students. The writer used ordinal sampling to get the students' answer sheets. The writer divided the students into three groups: the upper, middle, and lower groups. Then the writer took only the upper and lower groups to be analyzed.



C. Technique of Data Collecting

To collect data connected with the topic of discussion, the writer came to the school to get permission from the headmaster to take the students' answer sheets and the question paper of the English summative test of the second year students of

“SMPN 87” Pondok Pinang to be analyzed.

D. Research Instrument

a. Students' answer sheets

The students' answer sheets are the papers on which the students gave their answers to the English summative test. The English summative test that the writer used is the final test of the odd semester for the second year students of "SMPN 87" Pondok Pinang, academic year 2010-2011, prepared by the MGMP.

b. The English summative test of the second year students of "SMPN 87" Pondok Pinang.

E. Technique of Data Analysis

In this research, the writer used a quantitative method to analyze the discriminating power of the English summative test items of the second year of "SMPN 87" by using a statistical formula, namely the Discriminating Power Index:

DI = (U - L) / N

Where:

DI = the index of discriminating power

U = the number of pupils in the upper group who answered the item correctly

L = the number of pupils in the lower group who answered the item correctly

N = the number of pupils in each of the groups1

1 Charles D. Hopkins & Richard L. Antes, Classroom Measurement and Evaluation, (Itasca: F.E. Peacock Publishers, Inc., 1990), p. 279.


The classifications of the index of discriminating power (D) are:

DI = 0.70 – 1.00 = Excellent
0.40 – 0.70 = Good
0.20 – 0.40 = Satisfactory
Below 0.20 = Poor
Negative value of D = Very poor2

2 Anas Sudijono, Pengantar Evaluasi Pendidikan, (Jakarta: PT. Raja Grafindo Persada, 2006), p. 389.



CHAPTER IV

RESEARCH FINDINGS

A. Description of Data

The data used by the writer are from the English summative test in the odd semester of the second grade of "SMPN 87" Pondok Pinang. This English summative test was held on Wednesday, December 8th, 2010, and had to be finished in 120 minutes. The total number of test items is 50 questions, all of which are multiple-choice items.

The total number of students that took part in this analysis is 60 students. Kelley, as cited in the book Classroom Measurement and Evaluation, demonstrated that the selection of criterion groups based upon the upper 27 percent and the lower 27 percent of the papers provides the greatest confidence that the upper group is superior in the trait measured by the test as compared to the lower group. The middle 46 percent of the papers is not used when 27 percent in the upper and 27 percent in the lower groups are employed in item analysis.1 Based on that statement, the writer classified the students into three groups: the upper, middle, and lower groups. The writer took only the 27% from the upper group and the 27% from the lower group for this analysis; the rest of the students, who belong to the middle group, did not take part in this analysis.

1 Charles D. Hopkins, Richard L. Antes, Classroom Measurement and Evaluation, (Itasca: F.E. Peacock Publishers, Inc., 1990), p. 275.


The next table shows the students' scores and group positions in the English summative test.

Table 4.1

The Students' Scores and Group Positions of the English Summative Test in the Odd Semester

No    Score    Group
1     82       Upper group
2     80       Upper group
3     78       Upper group
4     76       Upper group
5     76       Upper group
6     74       Upper group
7     74       Upper group
8     74       Upper group
9     74       Upper group
10    72       Upper group
11    72       Upper group
12    72       Upper group
13    72       Upper group
14    64       Upper group
15    64       Upper group
16    64       Upper group
17    64       Middle group
18    -        Middle group
19    62       Middle group
20    60       Middle group
21    60       Middle group
22    60       Middle group
23    60       Middle group
24    60       Middle group
25    58       Middle group
26    58       Middle group
27    58       Middle group
28    58       Middle group
29    58       Middle group
30    58       Middle group
31    58       Middle group
32    56       Middle group
33    56       Middle group
34    56       Middle group
35    56       Middle group
36    56       Middle group
37    54       Middle group
38    54       Middle group
39    54       Middle group
40    54       Middle group
41    54       Middle group
42    54       Middle group
43    -        Middle group
44    52       Middle group
45    52       Lower group
46    50       Lower group
47    50       Lower group
48    50       Lower group
49    48       Lower group
50    46       Lower group
51    46       Lower group
52    44       Lower group
53    44       Lower group
54    42       Lower group
55    40       Lower group
56    38       Lower group
57    34       Lower group
58    34       Lower group
59    34       Lower group
60    32       Lower group

Table 4.1 shows that the students who took the test are classified into 3 groups: the upper group, the middle group, and the lower group. The writer took 27%, or 16 students, from each of the upper and lower groups to be analyzed. The highest score in the upper group, 82, is gained by one student, while the lowest score in the upper group, 64, is gained by three students. Meanwhile, the highest score in the lower group, 52, is gained by one student, and the lowest score in the lower group, 32, is gained by one student.
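As an illustration of the grouping step just described, the following minimal Python sketch splits a list of scores, sorted from highest to lowest, into upper, middle, and lower groups using the 27% rule; the function and variable names are illustrative, and the example scores are only a small made-up subset, not the full data of Table 4.1.

```python
def split_groups(scores, fraction=0.27):
    """Split scores, sorted from highest to lowest, into upper, middle, and lower groups."""
    ordered = sorted(scores, reverse=True)
    n = round(len(ordered) * fraction)  # 27% of 60 students is about 16
    upper = ordered[:n]     # top 27% of the papers
    lower = ordered[-n:]    # bottom 27% of the papers
    middle = ordered[n:-n]  # middle 46%, not used in the item analysis
    return upper, middle, lower


# Illustrative usage with a handful of scores (not the full data of Table 4.1):
upper, middle, lower = split_groups([82, 80, 78, 76, 74, 72, 64, 58, 52, 46, 34, 32])
print(len(upper), len(middle), len(lower))  # 3 6 3
```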



Table 4.2

The Students' Answer Sheet of English Summative Test Items from the Upper Group

No  Students' score  Items: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Answer key: * B D C D B B D B B B A C C B C D C C B C D A B D

1 82 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 0 1

2 80 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1

3 78 0 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 0 0 1 0 1

4 76 0 1 1 1 1 1 1 0 0 1 1 0 1 1 1 0 1 1 1 0 1 0 1 0 1

5 76 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1 1

6 74 0 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1

7 74 0 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1

8 74 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 0 0 1 1 1

9 74 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1

10 72 0 1 1 1 0 0 1 1 1 1 1 0 1 0 1 1 1 0 0 1 0 0 1 1 1

11 72 0 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0

12 72 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 0 0 0 1 0 0

13 72 0 1 1 0 1 0 0 1 1 1 1 0 1 0 1 0 1 1 0 1 0 1 1 1 1

14 64 0 1 1 1 1 1 1 1 1 1 0 0 1 0 1 1 0 0 0 0 0 0 1 1 1

15 64 0 1 1 0 1 1 0 0 1 1 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1

16 64 0 1 1 0 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 0 0 1 1 1


(40)

Answer key A D A B B D B D A B C D C C D B C B B D B B D A D

1 82 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1

2 80 0 1 1 0 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0

3 78 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 0 1 1 0 1 1 1

4 76 0 1 1 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1

5 76 0 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1

6 74 0 0 0 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0

7 74 0 0 1 0 0 1 0 1 1 1 1 0 1 0 1 1 1 1 0 1 1 0 1 1 0

8 74 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1

9 74 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 1

10 72 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 0

11 72 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1

12 72 1 1 1 1 1 0 0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 0

13 72 1 0 1 1 1 0 0 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1

14 64 0 1 0 1 0 0 0 0 1 1 1 0 1 1 1 1 1 1 0 1 0 1 1 1 1

15 64 0 0 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 1

16 64 0 0 1 1 1 1 0 1 0 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 1


Table 4.3

The Students' Answer Sheet of English Summative Test Items from the Lower Group

No  Students' score  Items: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Answer key: * B D C D B B D B B B A C C B C D C C B C D A B D

1 52 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 0 0 0 1 1 1

2 50 0 0 1 1 0 0 1 1 1 0 0 0 1 1 0 1 0 1 0 1 0 0 0 1 0

3 50 0 1 1 1 0 1 1 0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 1 1 1

4 50 0 1 1 1 1 1 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 1 0

5 48 0 1 1 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 1

6 46 0 1 1 0 0 0 0 1 0 1 0 1 1 1 0 0 0 1 0 1 0 0 1 1 0

7 46 0 0 1 1 0 1 0 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 1 1 0

8 44 0 0 1 1 0 1 0 0 1 1 1 0 0 1 0 0 0 0 0 1 0 0 1 1 1

9 44 0 0 1 1 0 1 0 0 1 1 1 0 0 1 0 0 0 1 0 1 0 0 1 0 0

10 42 0 0 1 1 1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 1 0 0 1 0 0

11 40 0 1 1 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 0 0 0 1 1 1

12 38 0 0 1 0 0 1 0 0 1 1 0 0 0 1 0 0 0 1 0 1 0 0 1 1 1

13 34 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0

14 34 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 0

15 34 0 1 1 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1

16 32 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 1 1 1 0 0 1 0 0


No  Students' score  Items: 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50

Answer key: A D A B B D B D A B C D C C D B C B B D B B D A D

1 52 1 0 1 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 1 0 0 1 1 1

2 50 1 1 1 1 1 0 0 0 1 0 1 1 0 1 1 0 1 0 0 0 0 1 1 0 0

3 50 1 0 1 0 0 0 0 1 0 1 0 1 1 0 1 1 1 0 0 1 0 0 0 1 0

4 50 0 0 1 0 0 1 0 0 0 0 1 1 1 0 0 0 1 1 0 1 0 1 1 1 1

5 48 1 1 1 1 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 1 0 1 1 1 1

6 46 1 1 1 1 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0 1 1 0 0 1 0

7 46 1 1 0 0 0 0 0 1 1 0 1 0 1 0 1 0 1 1 0 1 0 0 0 0 1

8 44 1 0 1 1 0 1 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 0 1 0 0

9 44 0 1 0 1 1 0 1 1 1 0 1 0 1 0 0 0 1 0 0 1 0 0 0 1 0

10 42 1 0 0 0 0 0 0 1 1 0 1 0 0 0 1 0 1 0 0 1 1 0 1 0 0

11 40 1 1 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 1 0 1 0 0 0 0 0

12 38 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 0

13 34 0 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1 1 1 1 1 1

14 34 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 1 0 1 1 0 0 1

15 34 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 1 0 1 1

16 32 0 0 1 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0


From Table 4.3 above, it can be concluded that the lower-group students' responses to each item on their English test are as follows:

1. There are 16 students who answered the item no 3 and 42 correctly.

2. There are 13 students who answered the item no 23 correctly.

3. There are 12 students who answered the item no 24 and 45 correctly.

4. There are 11 students who answered the item no 4 and 9 correctly.

5. There are 10 students who answered the item no 13, 14, and 26 correctly.

6. There are 9 students who answered the item no 10, 18, 20, 33, and 49 correctly.

7. There are 8 students who answered the item no 28, 36, and 40 correctly.

8. There are 7 students who answered the item no 2, 11, 25, 27, 29, 38, 43, 47, 48, and 50 correctly.

9. There are 6 students who answered the item no 6, 8, 19, 34, and 37 correctly.

10. There are 5 students who answered the item no 5 correctly.

11. There are 4 students who answered the item no 15, 30, 31, and 46 correctly.

12. There are 2 students who answered the item no 7, 16, 32, 35, 39, 41, and 44 correctly.

13. There is 1 student who answered the item no 12 and 17 correctly.

14. There is no student who answered the item no 1, 21, and 22 correctly.
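The per-item counts listed above (and the corresponding counts for the upper group) amount to summing each item's column of the 0/1 answer matrix. The sketch below shows that step in Python with a small made-up matrix; it is not the actual data of Tables 4.2 and 4.3.

```python
def correct_counts(answer_matrix):
    """Count, for each item, how many students (rows) answered it correctly (1)."""
    n_items = len(answer_matrix[0])
    return [sum(row[i] for row in answer_matrix) for i in range(n_items)]


# Illustrative 4-student, 5-item matrix (not the real data of Tables 4.2 and 4.3):
lower_group = [
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
]
print(correct_counts(lower_group))  # [0, 3, 4, 3, 1]
```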


Before analyzing the data, the writer performed the statistical calculations. The writer used the Discrimination Index formula to find the discriminating power criteria of the English summative test. The table is as follows:

Table 4.4

The Discriminating Power Index of the Upper and Lower Group

Item number   Correct answers: Upper Group   Correct answers: Lower Group   U - L   DI = (U - L) / N   Remark*

1 0 0 0 0 Poor

2 16 7 9 0.56 Good

3 16 16 0 0 Poor

4 12 11 1 0.06 Poor

5 15 5 10 0.62 Good

6 13 6 7 0.43 Good

7 11 2 9 0.56 Good

8 12 6 6 0.37 Satisfactory

9 12 11 1 0.06 Poor

10 15 9 6 0.37 Satisfactory

11 13 7 6 0.37 Satisfactory

12 1 1 0 0 Poor

13 14 10 4 0.25 Satisfactory

14 11 10 1 0.06 Poor

15 15 4 11 0.68 Good

16 11 2 9 0.56 Good

17 14 1 13 0.81 Excellent

18 10 9 1 0.06 Poor

19 10 6 4 0.25 Satisfactory

20 10 9 1 0.06 Poor

21 5 0 5 0.31 Satisfactory

22 3 0 3 0.18 Poor


23 16 13 3 0.18 Poor

24 7 12 -5 -0.31 Very poor

25 14 7 7 0.43 Good

26 7 10 -3 -0.18 Very poor

27 9 7 2 0.13 Poor

28 14 8 6 0.37 Satisfactory

29 12 7 5 0.31 Satisfactory

30 13 4 9 0.56 Good

31 8 4 4 0.25 Satisfactory

32 1 2 -1 -0.06 Very Poor

33 12 9 3 0.18 Poor

34 13 6 7 0.43 Good

35 14 2 12 0.75 Excellent

36 16 8 8 0.50 Good

37 11 6 5 0.31 Satisfactory

38 11 7 4 0.25 Satisfactory

39 10 2 8 0.50 Good

40 16 8 8 0.50 Good

41 12 2 10 0.62 Good

42 16 16 0 0 Poor

43 15 7 8 0.50 Good

44 7 2 5 0.31 Satisfactory

45 15 12 3 0.18 Poor

46 13 4 9 0.56 Good

47 12 7 5 0.31 Satisfactory

48 15 7 8 0.50 Good

49 14 9 5 0.31 Satisfactory

50 11 7 4 0.25 Satisfactory

*note: the classification of remark is adopted by Anas Sudijono, Pengantar Evaluasi Pendidikan, (Jakarta: PT. Raja Grafindo Persada, 2006), p. 389.


(47)

36

Based on the data above, the percentage of discriminating power of English summative test is:

Table 4.5

The Percentage of Discriminating Power

No  Discriminating power  Total items  Percentage  Item numbers

1   Excellent       2    4%    17, 35

2   Good           16   32%    2, 5, 6, 7, 15, 16, 25, 30, 34, 36, 39, 40, 41, 43, 46, 48

3   Satisfactory   15   30%    8, 10, 11, 13, 19, 21, 28, 29, 31, 37, 38, 44, 47, 49, 50

4   Poor           14   28%    1, 3, 4, 9, 12, 14, 18, 20, 22, 23, 27, 33, 42, 45

5   Very Poor       3    6%    24, 26, 32

The table above shows that 2 test items (4%) are categorized as excellent test items, namely test items number 17 and 35. They are categorized as excellent because their discrimination indices are in the range 0.70 – 1.00. There are 16 test items (32%) categorized as good items, ranging from 0.40 – 0.69; they are test items number 2, 5, 6, 7, 15, 16, 25, 30, 34, 36, 39, 40, 41, 43, 46, and 48. There are 15 test items (30%) categorized as satisfactory test items, for their discrimination indices are in the range 0.20 – 0.39; they are test items number 8, 10, 11, 13, 19, 21, 28, 29, 31, 37, 38, 44, 47, 49, and 50.

Meanwhile, 14 test items (28%) are categorized as poor test items because their discrimination indices are in the range 0.00 – 0.19; they are test items number 1, 3, 4, 9, 12, 14, 18, 20, 22, 23, 27, 33, 42, and 45. At last, there are 3 test items (6%) categorized as very poor items, as their discrimination indices are negative.

B. Data Analysis

In analyzing the discriminating power of the data, the writer first listed the students' responses to each item of the test. The list can be seen in Tables 4.2 and 4.3 of this "skripsi".

The next step is to make a format for the item analysis. This format and its results are labeled Table 4.4. The last step is to compute the discriminating power of all items using this formula:

DI = (U - L) / N

Where:

DI = the index of discriminating power

U = the number of pupils in the upper group who answered the item correctly

L = the number of pupils in the lower group who answered the item correctly

N = the number of pupils in each of the groups

The result of this last step can also be seen in Table 4.4. In this table, the result for each item is a decimal value; the writer then categorized each item according to the following classification:

The classifications of the index of discriminating power (D) are:

DI = 0.70 – 1.00 = Excellent
0.40 – 0.70 = Good
0.20 – 0.40 = Satisfactory
Below 0.20 = Poor
Negative value of D = Very poor

Based on the item analysis results on discriminating power above, the writer can conclude that, of the 50 items:

1. There are 33 test items (66%) with good (positive) discriminating power, ranging from 0.25 – 0.81.


2. There are 14 test items (28%) categorized as poor test items, because their discrimination indices range from 0.00 – 0.18.

3. There are 3 test items (6%) categorized as very poor items, as their discrimination indices are negative, ranging from -0.06 to -0.31.
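As a sketch of how the summary above (and Table 4.5) could be produced from the per-item indices, the following Python snippet tallies items by category and reports percentages; the classification thresholds follow the ranges cited from Sudijono, and the dictionary shown is only a small illustrative subset of the indices in Table 4.4, not the full 50 items.

```python
from collections import defaultdict


def classify(di):
    """Same categories as cited from Sudijono in Chapter II."""
    if di < 0:
        return "Very poor"
    if di < 0.20:
        return "Poor"
    if di < 0.40:
        return "Satisfactory"
    if di < 0.70:
        return "Good"
    return "Excellent"


def summarize(di_by_item):
    """Group item numbers by discriminating-power category and report percentages."""
    groups = defaultdict(list)
    for item, di in sorted(di_by_item.items()):
        groups[classify(di)].append(item)
    total = len(di_by_item)
    return {cat: (len(items), 100 * len(items) / total, items)
            for cat, items in groups.items()}


# Illustrative subset of the indices in Table 4.4 (not the full 50 items):
sample = {2: 0.56, 17: 0.81, 22: 0.18, 24: -0.31, 35: 0.75}
for cat, (count, pct, items) in summarize(sample).items():
    print(f"{cat}: {count} item(s) ({pct:.0f}%) -> {items}")
```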

C. Data Interpretation

For the items as a whole, the writer can interpret that the English summative test prepared by the "MGMP" and administered at the second grade of "SMPN 87" Pondok Pinang has good discriminating power, because 33 test items, or 66% of the 50 test items, have indices ranging from 0.25 – 0.81.

Based on Table 4.2 presented earlier, the writer concluded the achievement of the upper-group students on their English test. Of the 50 multiple-choice items, none of the students got a perfect score. The following description tells about the responses to each item:

1. There are 16 students who answered items number 2, 3, 23, 36, 40, and 42 correctly.

2. There are 15 students who answered items number 5, 10, 15, 43, 45, and 48 correctly.

3. There are 14 students who answered items number 13, 17, 28, 35, and 49 correctly.

4. There are 13 students who answered items number 6, 11, 30, 34, and 46 correctly.

5. There are 12 students who answered items number 4, 8, 9, 29, 33, 41, and 47 correctly.

6. There are 11 students who answered items number 7, 14, 16, 37, 38, and 50 correctly.

7. There are 10 students who answered items number 18, 19, 20, and 39 correctly.

8. There are 9 students who answered item number 27 correctly.

9. There are 8 students who answered item number 31 correctly.

10. There are 7 students who answered items number 24, 26, and 44 correctly.

11. There are 5 students who answered item number 21 correctly.

12. There are 4 students who answered item number 25 correctly.

13. There are 3 students who answered item number 22 correctly.

14. There is 1 student who answered items number 12 and 32 correctly.

15. There is no student who answered item number 1 correctly.
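As an illustration of how such per-item tallies can be grouped by the number of correct answers, the sketch below uses a small set of hypothetical values; table 4.2 itself covers all 50 items.

from collections import defaultdict

# Hypothetical tally: item number -> number of upper-group students who
# answered it correctly (only a few items shown for illustration).
upper_correct = {2: 16, 3: 16, 5: 15, 13: 14, 12: 1, 1: 0}

# Group the items by their number of correct answers, as in the list above.
by_count = defaultdict(list)
for item, n_correct in upper_correct.items():
    by_count[n_correct].append(item)

for n_correct in sorted(by_count, reverse=True):
    items = ", ".join(str(i) for i in sorted(by_count[n_correct]))
    print(f"{n_correct:2d} correct answer(s): item(s) {items}")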



CHAPTER V

CONCLUSION AND SUGGESTION

A. Conclusion

Based on the analysis and the interpretation in the previous chapter, the writer would like to conclude that the items of the English summative test administered to the second grade of "SMPN 87" Pondok Pinang can be categorized into five different ranges of the discrimination power index. First, there are 2 test items (4%) categorized as excellent test items. Then, there are 16 test items (32%) categorized as good test items. Besides that, 15 test items (30%) are categorized as satisfactory test items. Fourth, there are 14 test items (28%) categorized as poor test items. Lastly, 3 test items (6%) are categorized as very poor test items.

Thus, 33 test items (66%) of the English summative test are regarded as having good discriminating power, with indices ranging from 0.25 to 0.81, and they can be used for the next test. Meanwhile, 14 test items (28%) need to be revised because of their poor value in differentiating the ability of the upper group from that of the lower group, with indices ranging from 0.00 to 0.18. Finally, 3 test items (6%) have to be eliminated, because those items have negative discrimination indices ranging from -0.06 to -0.31.

From the explanation above, the writer concludes that the English summative test administered to the second grade of "SMPN 87" Pondok Pinang has good discriminating power, because 33 items (66%) of the test items have fulfilled the criteria of a positive discriminating power, with indices ranging from 0.25 to 0.81.



B. Suggestion

After conducting the research, there are some suggestions that can be given in relation to the writer's conclusion. The suggestions are as follows:

1. Teachers should equip students with good test-taking techniques. For instance, encourage the students to answer the easier items first and not to get stuck on the difficult ones. This technique should become a common habit among the students so that they do not waste their time.

2. Teachers should keep the test items that meet the satisfactory, good, and excellent criteria so that they can be reused in future evaluations.

3. Teachers should revise the test items that fall into the poor criterion so that they can be used in the next evaluation, and discard those that fall into the very poor criterion.



BIBLIOGRAPHY

Alderson, J. Charles, et al. Language Test Construction and Evaluation, Melbourne: Cambridge University Press, 1995.

Bachman, Lyle F. Statistical Analyses for Language Assessment, Cambridge: Cambridge University Press, 2004.

Brown, H. Douglas. Teaching by Principles: An Interactive Approach to Language Pedagogy, White Plains: Addison Wesley Longman, 2001.

Fernandes, H.J.X. Testing and Measurement, Jakarta: National Educational Planning, Evaluation and Curriculum Development, 1984.

Gay, L.R. Educational Evaluation and Measurement, New York: Macmillan, Inc., 1985.

Gronlund, Norman E. Measurement and Evaluation in Teaching, New York: Macmillan Publishing Co., Inc, 1981.

Genesee, Fred and John A. Upshur. Classroom-Based Evaluation in Second Language Education, New York: Cambridge University Press, 1996.

Heaton, J.B. Writing English Language Tests, London: Longman, 1998.

Hopkins, Charles D. and Antes, Richard L. Classroom Measurement and Evaluation, Itasca: F.E. Peacock Publishers, Inc., 1990.

Hughes, Arthur. Testing for Language Teachers, Cambridge: Cambridge University Press, 2003.

Lado, Robert. Language Testing, London: Longman Group Limited, 1983.

Madsen, Harold S. Techniques in Testing, New York: Oxford University Press, 1983.

McNamara, Tim. Language Testing, Oxford: Oxford University Press, 2000.

Nitko, Anthony J. Educational Tests and Measurement: An Introduction, New York: Harcourt Brace Jovanovich, Inc., 1983.

Sax, Gilbert. Principles of Educational and Psychological Measurement and Evaluation, Belmont: Wadsworth Publishing Company, 1980.

Sudijono, Anas. Pengantar Evaluasi Pendidikan, Jakarta: PT. Raja Grafindo Persada, 2006.



Wiersma, William. Educational Measurement and Testing, Boston: Allyn & Bacon, 1990.

