The Difficulty Index of English Summative Test in SMPN 1 Ciputat

THE DIFFICULTY INDEX OF ENGLISH SUMMATIVE
TEST IN SMP NEGERI 1 CIPUTAT
A “Skripsi”
Presented to the Faculty of Tarbiyah and Teachers’ Training
in Partial Fulfillment of the Requirements for
the Degree of Strata 1 (S1)

By:
ENGKAR KARWATI
NIM: 105014000374

English Education Department
The Faculty of Tarbiyah and Teachers’ Training
The State Islamic University (UIN)
Syarif Hidayatullah Jakarta
2010

ABSTRACT
Karwati, Engkar, 2009, The Difficulty Index of English Summative Test in SMP
Negeri 1 Ciputat, A skripsi, English Education Department, The Faculty
of Tarbiyah and Teachers’ Training, The State Islamic University Syarif
Hidayatullah Jakarta.
Advisor: Prof. Dr. Arief Furchan, MA.
Key Words: Difficulty Index, English Summative test, SMP Negeri 1 Ciputat
One way to evaluate the success of the teaching and learning process is to
give a test. For a test to measure the testees’ ability accurately, it must
itself be a good evaluation tool. One criterion of a good test is a good
difficulty index: the test should have a balanced proportion of easy,
moderate, and difficult items. This study aims to analyze the balance among
these three difficulty categories in the English summative test for the second
grade of SMP Negeri 1 Ciputat. The results are intended to help the English
teacher, as the test maker, revise the items so that the next test is better
than the previous one and functions well as an accurate evaluation tool.
In this study the writer uses a quantitative method in which the researcher
analyzes statistical data. The data are the students’ answer sheets from the
English summative test of the second grade of SMP Negeri 1 Ciputat, taken from
three classes comprising 116 students. These three classes were chosen
randomly from the ten classes. In addition, the test script, the answer key,
and the school profile are also used as data in this study.
The result of this study shows that the proportion of easy, moderate, and
difficult items is not balanced: 60%, or 15 of the 25 items, belong to the
easy category; 32%, or 8 items, belong to the moderate category; and 8%, or 2
items, belong to the difficult category.
Based on this study, it is suggested that the items of the English
summative test in SMPN 1 Ciputat be revised. In writing the test items, the
balance of the number of items in each difficulty category (easy, moderate,
and difficult) has to be considered.

ABSTRAK
Karwati, Engkar, 2009, The Difficulty Index of English Summative Test in SMP
Negeri 1 Ciputat, Skripsi, English Education Department, Faculty of Tarbiyah
and Teachers’ Training, Universitas Islam Negeri Syarif Hidayatullah Jakarta.
Advisor: Prof. Dr. Arief Furchan, MA.
Key Words: Difficulty Index, English Summative test, SMP Negeri 1 Ciputat
A test is one of the evaluation tools used to measure the success of the
teaching and learning process. For a test to truly function as a good
measuring instrument, the test itself must be a good instrument that can
measure the testee accurately. One criterion of a good test is that it has a
good degree of difficulty. A test is said to have a good degree of difficulty
when the proportions of items in the easy, moderate, and difficult categories
are balanced. This study aims to determine whether the proportions of easy,
moderate, and difficult items are balanced in the English summative test for
grade 2 of SMP Negeri 1 Ciputat. The results are expected to help the English
teacher, who in this case is the test maker, improve the items so that future
tests are better and truly function as accurate measuring instruments.
This study uses a quantitative method in which the researcher uses
numerical data that are calculated statistically. The data are the answer
sheets of the grade 2 students’ summative test at SMP Negeri 1 Ciputat, taken
from 3 classes totaling 116 students. These were drawn at random from 10
classes. In addition, the question sheet, the answer key, and the profile of
SMP Negeri 1 Ciputat are also used as data in this study.
The results show that the proportions of items in the easy, moderate, and
difficult categories are not balanced. This is shown by the fact that 60%, or
15 of the 25 summative test items, fall in the easy category, 32%, or 8 items,
fall in the moderate category, and 8%, or 2 items, fall in the difficult
category.
Based on these results, it is suggested that the items of the grade 2
summative test at SMP Negeri 1 Ciputat be improved. In writing items,
attention should be paid to the balance of the number of items in each of the
easy, moderate, and difficult categories.

ACKNOWLEDGMENT
In the name of Allah, the Beneficent, the Merciful. All praise be to Allah,
because by the “qudrat” and “iradat” of Allah the writer has been able to
finish this skripsi. Peace and blessings be upon our final prophet, Muhammad
saw., his family, his companions, and his followers.
The writer is extraordinarily fortunate in having a lovely family, and on
this occasion she would like to express her greatest appreciation, honor, and
gratitude to her beloved family: her mother and father (Iwis Alwisah and Eno
Sukarna) and her brothers and sisters (Suhendi, S.Ag., E. Sukatma, Ikah
Atikah, and Siti Rohmah Rohimah, S.Pd.). She thanks them for their love,
encouragement, guidance, prayers, and the facilities that motivated her to
finish her study. Special thanks go to ‘teteh’ Nung Siti Nurjanah, ‘teteh’ Eno
Atik Riyanah, ‘Kk’ Humaira Puspita Puteriutami, and ‘Dd’ Abdul Basith
Basthotaen for their patience, and to her husband and father-in-law for their
kindness in supporting the writer.

Her sincere thanks and deep appreciation also go to all the lecturers and
staff of the English Education Department, who have been very kind and patient
in dedicating their time and in giving precious knowledge, invaluable
guidance, and advice during the writer’s studies at this university.
The writer would like to express her highest appreciation and gratitude to
her advisor, Prof. Dr. Arief Furchan, MA, for his advice, encouragement, and
patience during the writing of this paper.
Thanks also go to the Principal of SMP Negeri 1 Ciputat for the opportunity
and help given to conduct the research.
Moreover, the writer would like to thank all of her friends, who have been
very supportive.
Finally, the writer would like to thank everybody who has been important to
the completion of this paper, and to express her apology that she cannot
mention them personally one by one.

Ciputat, 17 September 2009
The Writer

TABLE OF CONTENTS


ABSTRACT
ABSTRAK
ACKNOWLEDGMENT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF APPENDIXES

CHAPTER I: INTRODUCTION
A. Background of the Study
B. Research Question
C. The Scope of the Study
D. The Purpose of the Study
E. The Significance of the Study
F. The Organization of the Study
G. The Review of Related Literature

CHAPTER II: THEORETICAL FRAMEWORK
A. Introduction
B. Evaluation
C. Test
D. Types of Test
1. Achievement Test
2. Proficiency Test
3. Diagnostic Test
4. Aptitude Test
E. Types of Test Item
1. Multiple-Choice Test
2. Essay Test
F. The Characteristics of a Good Test
1. Validity
a. Face Validity
b. Content Validity
c. Construct Validity
d. Empirical Validity
2. Reliability
3. Practicality
G. Technique to Measure the Good Test
1. Item Analysis
H. Kinds of Item Analysis
1. Index of Difficulty
2. Discriminating Power
3. The Effectiveness of Distracters

CHAPTER III: THE PROFILE OF SMP NEGERI I CIPUTAT
A. School Profile
B. General Information about SMP N I Ciputat
C. Students General Information
D. The General Information of Teachers and Administration Employee
E. The Headmaster of SMP Negeri I Ciputat from Era to Era
F. The Building Facility
G. The Vision and Mission of SMP Negeri I Ciputat
H. The Purpose of SMP Negeri I Ciputat

CHAPTER IV: RESEARCH METHODOLOGY AND FINDINGS
A. Research Methodology
1. The Data
2. The Source
B. Research Findings
1. Presentation of the Data
2. Analysis of the Data
3. The Interpretation of the Data

CHAPTER V: CONCLUSION AND SUGGESTIONS
A. Conclusion
B. Suggestions

BIBLIOGRAPHY

APPENDIX

LIST OF TABLES
TABLE 1 : The difference between the previous research on the difficulty index
of test items and the writer’s research on the difficulty index
TABLE 2 : The differences of the test types
TABLE 3 : Samples of the multiple choice items result
TABLE 4 : The differences of expert opinion in interpreting the difficulty
index
TABLE 5 : The students’ general information
TABLE 6 : The general information of teachers and administration employee
TABLE 7 : The headmaster of SMPN 1 Ciputat from era to era
TABLE 8 : The English summative test items classification
TABLE 9 : The upper group position of the English summative test result
TABLE 10 : The lower group position of the English summative test result
TABLE 11 : The statistical calculation of the balance degree of index difficulty

LIST OF APPENDIXES

APPENDIX 1 Students’ answer sheet of English summative test in SMP N 1
Ciputat

APPENDIX 2 The upper group position of the English summative test result in
SMP N 1 Ciputat

APPENDIX 3 The lower group position of the English summative test result in
SMP N 1 Ciputat

APPENDIX 4 The index difficulty of the upper group of summative test in
SMP N 1 Ciputat

APPENDIX 5 The index difficulty of the upper group of summative test in
SMP N 1 Ciputat

APPENDIX 6 The result of index difficulty calculation of English summative test
in SMP N 1 Ciputat

APPENDIX 7 The statistical calculation of the balance degree of index difficulty

APPENDIX 8 Letters

CHAPTER I
INTRODUCTION
A. Background of the Study
English is an international language that plays a big role in modern life, in
education, politics, social affairs, and the economy. Because the government
is aware of the importance of English, the subject is included in the national
curriculum of Indonesia. Under this policy, English has become an important
subject that must be learned by all students, from elementary school (SD) and
junior and senior high school (SMP/SMA) up to university level, in both state
and private schools.
To find out whether the learning of English at each of these levels is
successful, evaluation is needed. One method of evaluation is giving a test to
the students. Through a test the teacher can find out whether the targets of
the learning process have been achieved.
For a test to provide accurate information, the test, as the tool of
evaluation, must really be able to measure the testees with accurate results.
To check whether a test is accurate, an item analysis must be done on its
items. There are three methods of item analysis. First, the analysis of the
difficulty index measures the degree of difficulty of an item, that is,
whether it is too easy, moderate, or too difficult. Second, the analysis of
discriminating power measures whether an item can differentiate between the
lower group and the upper group. Third, the analysis of the effectiveness of
the distracters shows whether the answer options function well. These three
analyses can be used as a barometer of whether the items of a test are good.
Based on the explanation above, making a good test is not easy. Nevertheless,
most teachers try to make a good test by planning carefully how to evaluate
their students’ ability, so that the test can be an accurate evaluation. Such
efforts are also made by the teachers in SMPN 1 Ciputat, especially the
English teachers. One concrete effort is that the English teachers draw up a
plan (a test blueprint, or kisi-kisi) before writing an English summative
test.
Since the English summative test in SMPN 1 Ciputat is written by the teachers
themselves, it is an interesting case to research, because teacher-made tests
often do not have good items. The writer is therefore interested to know
whether test items written from a good plan (with kisi-kisi prepared
beforehand) turn out to be good items as well. In this case the English
teacher also planned the difficulty index of the English summative test. The
writer is curious to find out whether the planned difficulty index matches the
result after the students take the test.
Finally, in this research paper the writer carries out an item analysis of
the difficulty index of the English summative test in SMPN 1 Ciputat. An item
analysis is needed in order to obtain items of good quality, that is, items
that give clear information about the students’ real ability. An item that
cannot describe the students’ real ability is useless; it does not function
well. The writer therefore thinks that an item analysis is important to do.
The difficulty index is important to analyze because, if the test does not
have a good difficulty index, the teacher can revise the test items so that
the next test is better than the previous one. Then we can say that the
evaluation tool really measures what needs to be measured.
If the test items are too easy, all of the students may be able to answer
all of the questions correctly, and we call such items bad items. Items that
are too easy do not motivate the students. Students in the lower group will
think that they are good enough, because they have succeeded in answering the
questions correctly, while students in the upper group will not be challenged
by the test and will underestimate it.
If the test items are too difficult, hardly any students will be able to
answer them correctly, which means the items cannot measure the students’
ability as a whole. For the test items to measure what the students already
know and what they do not yet know, the teacher writing the items must
consider the difficulty index by predicting how many students will answer each
item correctly and how many will not.
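The idea of counting how many students answer an item correctly can be made
concrete with the common proportion-correct form of the difficulty index. The
sketch below is only an illustration and not necessarily the procedure used in
this study: the function names, the sample counts, and the cut-off values 0.30
and 0.70 are assumptions chosen for the example.

```python
def difficulty_index(correct, total):
    """Proportion of test takers who answered an item correctly (0.0 to 1.0)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return correct / total


def classify_item(p, hard_below=0.30, easy_above=0.70):
    """Label an item by its difficulty index.

    The cut-offs are assumed for illustration; experts differ on them.
    """
    if p < hard_below:
        return "difficult"
    if p > easy_above:
        return "easy"
    return "moderate"


# Hypothetical counts of correct answers per item for 116 students.
correct_counts = {1: 104, 2: 58, 3: 29}
labels = {}
for item, correct in correct_counts.items():
    p = difficulty_index(correct, 116)
    labels[item] = classify_item(p)
    print(f"item {item}: P = {p:.2f} ({labels[item]})")
```

A high index means an easy item, because most students answered it correctly;
a low index means a difficult item.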
A good test consists of easy, moderate, and difficult items. The easy items
give all students a chance to answer. The difficult items challenge the
brightest students. In this way the test can really function as a good
evaluation tool.
Based on the explanation above, the item analysis of the difficulty index in
this study is limited to the English summative test for the eighth grade of
SMPN 1 Ciputat. The research is limited to the English test because English is
very important in this era of globalization, so the results of the students’
English learning need to be evaluated. Knowing what the students have and have
not learned requires an accurate evaluation tool.
The writer limited the research to the summative test because she found that
most of the multiple-choice items of the English summative test could be
answered by the students. It is therefore interesting to analyze what the test
items look like, given that most students can answer them correctly.
This research is limited to the eighth grade because the main material of
junior high school is taught in the eighth grade, whereas in the seventh grade
the material is given only as an introduction. In addition, the eighth grade
is the middle grade: the seventh grade is too far from the national
examination (UN) for its results to be predictive, and the ninth grade is too
close to it. The writer therefore chose the eighth grade so that the research
can help teachers write summative test items on the second-grade material
while predicting the difficulty index of item types that are close to, or
almost the same as, those of the UN test. This research can also help the
teachers identify which students need more attention.
The writer limited the research to SMPN 1 Ciputat because the school is well
established, having been founded in 1974, and has some notable academic
achievements, such as ninth place in the general knowledge competition (Lomba
Mata Pelajaran (LMP) Umum) in JABODETABEK, second place in the English
competition (Lomba Mata Pelajaran Bahasa Inggris) in Tangerang regency, and
second place in the storytelling competition in Tangerang regency. In
addition, most of its graduates have been accepted at favorite senior high
schools in Jakarta. Based on these facts, the writer is interested to know how
good the test items written by the English teachers in SMPN 1 Ciputat are,
given the school’s academic achievements and its graduates’ acceptance at
favorite senior high schools.

B. Research Question
Does the English summative test in SMPN 1 Ciputat have a balanced proportion
of easy, moderate, and difficult items?

C. The Scope of the Study
Item analysis covers three things: the difficulty index, discriminating
power, and the effectiveness of the distracters. In this research paper the
writer focuses only on the difficulty index of the English summative test of
the second grade in SMPN 1 Ciputat.

D. The Purpose of the Study
To analyze whether the English summative test for the second-grade students
of SMPN 1 Ciputat has a good difficulty index.

E. The Significance of the Study
This study is important because its results can benefit the school,
society, and further researchers. It provides useful information especially
for SMP Negeri 1 Ciputat and for other schools generally. To support the
discussion, the writer conducts field research by examining the English
summative test paper and the students’ answer sheets to measure the quality of
the difficulty index of the test items. The writer also conducts library
research by reading reference books and literature related to the topic.

F. The Review of Related Literature
The difficulty index is very important for checking the quality of a test
item in testing someone’s capability. Item analysis of the difficulty index is
important so that the result of the test can really evaluate the students’
achievement as a whole. Similar research on the difficulty index has been done
by Yusuf Supriyanto under the title “An Item Analysis on Difficulty Level of
English Try out Test for “UN” at Junior High School Level (A Case Study at
SMPN 2 Ciputat Tangerang)”. A second study was done by Rogibatul Lutphiyah in
the research paper “An Analysis on Difficulty Level of English National Try
Out Test 2008 (A Case Study at the Third Year of State Senior High School 87
Jakarta)”. That paper reports that the English National Try Out Test items
have a moderate level of difficulty.
The strength of their research is its focus on an issue of wide interest,
namely item analysis of the Try Out test at the national level. Unfortunately,
the sample taken was only about 25%, so the result of the calculation may not
represent all of the participants and may not be really accurate. In addition,
item analysis of such a test may be less useful because the test was written
by expert test makers. It is different when the research is done on a test
written by the teachers of a particular school: it is more useful for the
teacher as the test maker, and it can motivate teachers to improve the quality
of the tests they write.
Given these advantages of researching a teacher-made test, and working in
the same field of item analysis, the writer carries out an item analysis of
the English summative test in the second grade of SMP Negeri 1 Ciputat. This
research uses more data than the previous research, and it focuses on the
difficulty index. The researcher not only identifies which items belong to the
easy, moderate, and difficult categories but also tries to identify whether
the proportion of easy, moderate, and difficult items is balanced. Table 1.1
below shows the difference between the previous research and the writer’s
research on the difficulty index.
Table 1.1
The difference between the previous research and the writer’s research on the
difficulty index of test items

The previous research                  The writer’s research
The researcher only identifies the     The researcher not only identifies
items that belong to the easy,         which items belong to the easy,
moderate, and difficult categories.    moderate, and difficult categories but
                                       also tries to find whether the
                                       proportion of easy, moderate, and
                                       difficult items is balanced.
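The second step, checking the proportion of each category, can be sketched
from the item counts reported in the abstract (15 easy, 8 moderate, and 2
difficult items out of 25). The sketch is illustrative only: the function
names, the target proportions, and the tolerance are hypothetical and are not
the balance criterion used in this study.

```python
def category_proportions(counts):
    """Convert item counts per category into proportions of the whole test."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the test has no items")
    return {category: n / total for category, n in counts.items()}


def is_balanced(proportions, target, tolerance=0.10):
    """True when every category is within `tolerance` of its target share.

    Both `target` and `tolerance` are assumptions made for this example.
    """
    return all(abs(proportions[c] - target[c]) <= tolerance for c in target)


# Item counts reported in the abstract of this study.
counts = {"easy": 15, "moderate": 8, "difficult": 2}
proportions = category_proportions(counts)

# Hypothetical target shares, chosen only to demonstrate the comparison.
target = {"easy": 0.25, "moderate": 0.50, "difficult": 0.25}
balanced = is_balanced(proportions, target)

for category, share in proportions.items():
    print(f"{category}: {share:.0%}")
print("balanced" if balanced else "not balanced")
```

With these counts the easy category takes 60% of the test, far from any even
split, which agrees with the finding reported in the abstract.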

Based on the explanation above, the writer, as an independent researcher who
is not involved in the committee of this test, wants to analyze whether the
items of the English summative test in SMPN 1 Ciputat really have a good
difficulty index so that the test can function well. The concrete contribution
of this research is to show the teachers and the students in SMPN 1 Ciputat
the difficulty index of the summative test items from the point of view of the
general public.

CHAPTER II
THEORETICAL FRAMEWORK
1. Introduction
In education there is a learning process that students have to go through.
This learning process consists of three main elements:
1) Objectives, which tell what is worth teaching or learning.
2) Methods, which tell how to teach or learn and how to choose the best
method of teaching.
3) Evaluation, which tells how well the material has been taught.1
A teaching-learning process has instructional objectives that have to be
reached through materials developed in the classroom. Such a process employs
effective and appropriate methods, and to know whether the instructional
objectives have been reached and whether the methods are effective, evaluation
must be conducted.
Based on the definitions above, the writer concludes that the process of
teaching and learning also has three steps: presentation, practice or drills,
and comprehension.2 A good teacher has to present or explain the material as
clearly as possible and check the students’ understanding by giving several
exercises or drills. To gauge the students’ comprehension more accurately, the
teacher may continue to the last step, that is, evaluation. So evaluation has
the function of checking the result of presentation and practice.
2. Evaluation
Weiss, as quoted by Lyle F. Bachman, states that “evaluation can be defined
as the systematic gathering of information for the purpose of making
decisions”.2 According to M. Ngalim Purwanto, “Assessment is a process that is
deliberately planned to obtain information or data; on the basis of those data
an attempt is then made to reach a decision”.3 Evaluation is a system of
quality control by which it may be determined at each step in the
teaching-learning process whether the process is effective or not, and if not,
what changes must be made to ensure its effectiveness before it is too late.4

1 Nasrun Mahmud, Diktat Language Testing II, p. 1
2 Lyle F. Bachman, Fundamental Considerations in Language Testing, (Oxford:
Oxford University Press, 1990), p. 22

Evaluation can also be used to find out failures, the effectiveness of the
method, and ways to improve the teaching-learning process. According to
Benjamin S. Bloom et al., evaluation is an important activity for the teachers
and the students. Although the focus in the evaluation phase is on student
self-evaluation, teachers are also engaged in evaluation activities. In order
to know how good the result of the teaching and learning process is, a teacher
must evaluate it. By evaluating, the teacher can collect information or form a
picture of how well a teaching and learning activity has succeeded. Every
evaluation activity or judgment is a process that is deliberately planned to
get, make, and provide data for making decisions. Through evaluation the
teachers can find information about the growth and progress of the students’
achievement in the material they have learned, in order to make decisions.
Evaluation cannot be separated from the teaching and learning process.
Based on the definitions above, the writer concludes that teaching and
testing are like two sides of a coin, which cannot be separated. Evaluation
should be part of the teaching-learning process, and it can be used to improve
the teaching and learning activities carried out by the teacher and the
students. In the writer’s opinion, evaluation is the measurement of students’
achievement after the process of teaching and learning. Through evaluation it
can be seen whether the learning process has been successful or not.

3 M. Ngalim Purwanto, Prinsip-Prinsip dan Teknik Evaluasi Pengajaran,
(Bandung: Remaja Rosdakarya, 1991), p. 3
4 Benjamin S. Bloom, et al., Handbook on Formative and Summative Evaluation of
Student Learning, (London: Longman, 1971), p. 8

3. Test

There are many methods that the teacher can use in evaluating. One of them
is the test.5 Before discussing tests further, the writer would like to
elaborate on the definition. The word “test” comes through French from the
Latin “testum”, and its early meaning is a vessel in which metals were
assayed, or a potsherd.6 Jum C. Nunnally (1964) said, “A test is a
standardized situation that provides an individual with a score”.7 Lee J.
Cronbach also says that a test is a systematic procedure for comparing the
behavior of two or more people.8 According to Anne Anastasi, a test is
essentially an objective and standardized measure of a sample of behavior.9
Wayan Nurkancana and P. P. N. Sumartana also comment that “A test is a way of
carrying out an assessment in the form of a task that must be done by a child
or a group of children, producing a score for the behavior or achievement of
that child which can be compared with the scores achieved by other children or
with a standard score”.10 The evaluation experts Fred Genesee and John A.
Upshur state, “a test is a task or set of tasks that elicits observable
behavior from the test taker and tests yield scores that represent attributes
or characteristics of individuals”.11

5 Fred Genesee & John A. Upshur, Classroom-Based Evaluation in Second Language
Education, (New York: Cambridge University Press, 1996), p. 140
6 Merriam Webster’s Dictionary & Thesaurus (Electronic Dictionary)
7 Jum C. Nunnally, Educational Measurement and Evaluation, (New York:
McGraw-Hill Book, 1964), p. 6
8 Lee J. Cronbach, Essentials of Psychological Testing, (New York: Harper & Row
Publisher, 1960),
9 Anne Anastasi, Psychological Testing, (New York: Prentice Hall Inter. Inc.,
1997), p. 4
10 Wayan Nurkancana and P. P. N. Sumartana, Evaluasi Pendidikan, (Surabaya:
Usaha Nasional, 1986), p. 25
11 Fred Genesee and John A. Upshur, Classroom-Based Evaluation in Second
Language Education, (New York: Cambridge University Press, 1996), p. 141

A test maker obviously uses a test to obtain information. The information
hoped for will of course vary from situation to situation. It is possible,
nevertheless, to categorize tests according to a small number of kinds of
information being sought. This categorization will prove useful in deciding
whether an existing test is suitable for an intended purpose. Fred Genesee and
John A. Upshur said:12 A test is, first of all, about something, that is, about
intelligence, European history, or second language proficiency. In educational
terms, tests have subject matter or content. Second, a test is a task or set
of tasks that elicits observable behavior from the test taker. The test may
consist of only one task, such as writing a composition, or a set of tasks,
such as in a lengthy multiple-choice examination in which each question can be
thought of as a separate task. Third, tests yield scores that represent
attributes or characteristics of individuals. In order to be meaningful, test
scores must have a frame of reference; test scores together with the frame of
reference used to interpret them are referred to as measurement. Thus, tests
are a form of measurement.
Based on the definitions above, the writer concludes that a test is a systematic procedure or a standardized situation for measuring or comparing the behavior of two or more people by giving a task or a set of tasks, and for providing information about the students’ achievement that is relevant to the teaching-learning process. In the writer’s opinion, a test is one of the good ways to measure the students’ achievement after the teaching and learning process. It is also a good instrument for collecting data which are later used for making various decisions about the students’ behavior or achievement in the teaching-learning process.

4. Types of test
Based on that explanation, the writer will now discuss the kinds of tests. Arthur Hughes classifies tests into four types: achievement tests, proficiency tests, diagnostic tests, and placement tests.13 Mary Finocchiaro and Sydney Sako also classify tests into four types: achievement tests, proficiency tests, diagnostic tests, and aptitude tests.14 Meanwhile, Wilmar Tinambunan says that there are two types of test used in determining a person’s abilities: aptitude tests and achievement tests.15 Based on the classifications above, the writer concludes that there are four types of test: achievement tests, proficiency tests, diagnostic tests, and aptitude tests.

12 Fred Genesee and John A. Upshur, Classroom-Based Evaluation in Second Language Education, New York: Cambridge University Press, 1996, p. 141
13 Arthur Hughes, Testing for … p. 9
14 Mary Finocchiaro and Sydney Sako, Foreign Language … p. 22
a. Achievement test
An achievement test is related directly to classroom lessons, units, or even a total curriculum. Achievement tests are limited to the particular material covered in a curriculum within a particular time frame and are offered after a course has covered that material.16 Achievement tests are related to the past in what they measure, that is, what the students have learned as a result of teaching.17
The purpose of an achievement test, as its name reflects, is to establish how successful individual students, groups of students, or the courses themselves have been in achieving the objectives of language courses.18 Wilmar Tinambunan19 says that an achievement test is designed to indicate the degree of a student’s success in some past learning activity. This purpose is obviously different from the purpose of an aptitude test, which is designed to predict success in some future learning activity.
Based on the definitions above, the writer concludes that achievement tests are intended to measure how well students have mastered lessons or how far they have reached the instructional objectives. Consequently, the content of both final and progress achievement tests should be related to the instructional objectives.20 In the writer’s opinion, achievement tests are also related to the past in what they measure, namely what the students have learned as a result of teaching.
b. Proficiency test

15 Wilmar Tinambunan, Evaluation of Student Achievement, Jakarta: Dept. P&K Dirjen. Pendidikan Tinggi Proyek Pengembangan Lembaga Pendidikan Tenaga Kependidikan, 1998, p. 9
16 H. Douglas Brown, Teaching by Principles: An Interactive Approach to Language Pedagogy, San Francisco: Addison Wesley Longman, 2001, p. 391
17 Tim McNamara, Language Testing, Hong Kong: Oxford University Press, 2000, p. 7
18 Arthur Hughes, Testing for … p. 10
19 Wilmar Tinambunan, Evaluation of Student …, p. 9
20 Arthur Hughes, Testing for … p. 13

A proficiency test is a test that measures people’s ability in a language regardless of any training they may have had in that language. A proficiency test usually consists of standardized multiple-choice items on grammar, vocabulary, reading comprehension, aural comprehension, and sometimes writing.21
The purpose of a proficiency test is to measure people’s ability in a language regardless of any training they may have had in that language. The content of a proficiency test, therefore, is not based on the content or objectives of the language courses that the people taking the test may have followed. Rather, it is based on a specification of what candidates have to be able to do in the language in order to be considered proficient.22 These measures are often used for placement or selection, and their relative merit lies in their ability to spread students out according to ability on a proficiency range within the desired area of learning.
Based on the definitions above, the writer concludes that proficiency tests are most often global measures of ability in a language or other content area. They are not necessarily developed or administered with reference to some previously experienced course of instruction. In the writer’s opinion, a proficiency test checks how capable someone is in the skill that needs to be measured.
c. Diagnostic test
A diagnostic test is a test used to find an appropriate way to improve instruction when pupils fail in a particular subject. Thus, a diagnostic test is more comprehensive and detailed, because it searches for the underlying causes of learning difficulties and helps to formulate a plan for remedial action.
The purpose of a diagnostic test is to diagnose a particular aspect of a language. A diagnostic test in pronunciation, for example, might have the purpose of determining which phonological features of English are difficult for a learner.23 Based on the definitions above, the writer concludes that a diagnostic test is needed to know someone’s ability in a particular aspect of a language. In the writer’s opinion, a diagnostic test checks in which areas of a certain skill someone is strong or weak.

21 Grant Henning, A Guide to Language Testing, Massachusetts: Newbury House Publishers, 1987, p. 6
22 Arthur Hughes, Testing for Language Teachers, Melbourne: Cambridge University Press, 1991, p. 9
d. Aptitude test
Aptitude tests are most often used to measure the suitability of a candidate for a specific program of instruction or a particular kind of employment. For this reason, these tests are often treated as synonymous with intelligence tests or screening tests.
A language aptitude test is designed to measure a person’s capacity or general ability to learn a foreign language and to be successful in that undertaking.24 Based on the definitions above, the writer concludes that an aptitude test is used to measure someone’s ability to succeed in learning a foreign language. In the writer’s opinion, an aptitude test is needed to evaluate someone’s potential to learn a particular language. Table 2.1 below shows the differences between the test types.

Table 2.1
The differences of the test types

| Achievement test | Proficiency test | Diagnostic test | Aptitude test |
| --- | --- | --- | --- |
| Achievement tests are related to the past in what they measure, that is, what the students have learned as a result of teaching. | A proficiency test is a test that can measure people’s ability in a language regardless of any training they may have had in that language. | A diagnostic test is a test used to find an appropriate way to improve instruction when pupils fail in a particular subject. | An aptitude test is a test that is most often used to measure the suitability of a candidate for a specific program of instruction or a particular kind of employment. |

23 H. Douglas Brown, op. cit., p. 390
24 Ibid

5. Types of test item
Test items are divided into two types: multiple-choice items and essay items.

1. Multiple-choice test
Julian C. Stanley stated, “A multiple-choice test is made up of items each of which presents two or more responses, only one of which is correct or definitely better than the others.”25 Then Anthony J. Nitko states that a multiple-choice item consists of one or more introductory sentences followed by a list of two or more suggested responses from which the examinee chooses one as the correct answer.26
The purpose of a multiple-choice test is to find out how well a person can eliminate the wrong answers. The multiple-choice test is a kind of objective test, which can also be used to measure the students’ knowledge of English. It is usually regarded as the most valuable, most useful, and most widely applicable of all test forms. It is very effective for measuring the students’ knowledge of information, reading, vocabulary, etc.

25 Julian C. Stanley, Measurement in Today’s Schools, Fourth edition, (New Jersey: Prentice Hall, Inc., 1964), p. 121
26 Anthony J. Nitko, op. cit., p. 190

Based on the definitions above, the writer concludes that a multiple-choice test is a test whose items consist of two major parts: the stem and the options. The stem is presented in the form of a direct question or an incomplete statement. The options are the alternative answers, of which only one is correct. In the writer’s opinion, a multiple-choice test shows how well a person can eliminate the wrong answers.
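As an illustration only, the structure of a multiple-choice item described above (a stem, a list of options, and exactly one correct option) can be sketched in Python. The class name MultipleChoiceItem, its field names, and the sample item are hypothetical; they are not taken from the summative test analyzed in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultipleChoiceItem:
    """A hypothetical multiple-choice item: a stem plus its options."""
    stem: str        # direct question or incomplete statement
    options: tuple   # the alternative answers
    key: int         # index of the single correct option

    def __post_init__(self):
        # An item needs at least two options, only one of which is correct.
        if len(self.options) < 2:
            raise ValueError("an item needs at least two options")
        if not 0 <= self.key < len(self.options):
            raise ValueError("key must point to one of the options")

    def score(self, chosen: int) -> int:
        """Return 1 for the correct option and 0 for any other choice."""
        return 1 if chosen == self.key else 0


# Invented sample item with an incomplete-statement stem.
item = MultipleChoiceItem(
    stem="She ... to school every day.",
    options=("go", "goes", "going", "gone"),
    key=1,
)
print(item.score(1), item.score(0))
```

Scoring each item as 1 or 0 in this way reflects the objective character of the multiple-choice test: the score does not depend on who marks the answer.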
2. Essay test
Drs. Wilmar Tinambunan wrote in his book, Evaluation of Student Achievement, that “The essay item is the most complex of the supply-type items. It demands that the student compose a response, often extensive, to a question for which no single response or pattern of response can be cited as correct to the exclusion of all other answers.” The essay test is usually called a subjective test. Why is it called a subjective test? In the essay test, the teacher’s subjectivity enters into the scoring. This subjectivity is caused by the variation in the students’ answers, so the degree of correctness and error in the students’ answers also varies.
Based on the definition above, the writer concludes that the essay item is a type of question that requires the examinees to compose the answer and write it in their own words. In the writer’s opinion, an essay test is used to measure someone’s ability to express their knowledge about the information being tested.

6. The characteristics of a good test
A test can be regarded as a good one if it has four characteristics: validity, reliability, objectivity, and practicality.27

a. Validity

27 Anas Sudijono, Pengantar Evaluasi … p. 93

J. B. Heaton said, “The validity of a test is the extent to which it measures what it is supposed to measure and nothing else”.28 The validity of a test must be considered in measurement; that is, it must be checked whether the test used really measures what it is supposed to measure. There are four types of validity:

a. Face validity
Face validity means the way the test looks to the testees, teachers, moderators, and administrators. Therefore, it is useful to show a test to colleagues or friends in order to discover its absurdities and ambiguities.
b. Content validity
Content validity is concerned with the material that the students have learned. The test should cover a sample of the teaching materials given. To fulfill this, the teacher should refer to the teaching syllabus. J. B. Heaton says, “Content validity depends on careful analysis of the language being tested and of the particular course objectives. The test should be so constructed as to contain a representative sample of the course.”
c. Construct validity
Construct validity deals with the construct and the underlying theory of language learning and testing. J. B. Heaton states, “If the test has construct validity, it is capable of measuring certain specific characteristics in accordance with a theory of language behavior and learning.”
d. Empirical validity
There are two kinds of empirical validity, concurrent validity and predictive validity, depending on whether the test scores are correlated with concurrent or subsequent criterion measures.

28 J. B. Heaton, Writing English Language Tests, (Longman, 1998), p. 153

If we use a test of English as a second language to screen university applicants and then correlate the test scores with the grades made at the end of the first semester, we are attempting to determine the predictive validity of the test. If, on the other hand, we follow up the test immediately by having an English teacher rate each student’s English proficiency on the basis of his class performance during the first week and correlate the two measures, we are seeking to establish the concurrent validity of the test.29
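As an illustration only, the two correlations described above can be sketched in Python. The sketch assumes that the Pearson product-moment correlation is used as the coefficient, which the quoted passage does not specify. The function name pearson_r and all of the scores below are hypothetical and are not data from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists of at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Invented scores for six applicants (assumption, for illustration only).
screening_scores = [45, 52, 60, 68, 75, 83]       # the test being validated
semester_grades = [2.1, 2.4, 2.9, 3.0, 3.4, 3.8]  # subsequent criterion
teacher_ratings = [2, 3, 3, 4, 4, 5]              # concurrent criterion

predictive_validity = pearson_r(screening_scores, semester_grades)
concurrent_validity = pearson_r(screening_scores, teacher_ratings)
print(round(predictive_validity, 2), round(concurrent_validity, 2))
```

Under these assumptions, a coefficient close to 1 would suggest that the test ranks students in much the same way as the criterion measure, while a coefficient near 0 would suggest that it does not.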
Based on the definitions above, the writer concludes, and is also of the opinion, that the validity of a test is important for knowing whether or not the test has good quality in measuring someone’s capability.
b. Reliability
The second characteristic of a good test is reliability. A test should be reliable as a measuring instrument. A test cannot measure anything well unless it measures consistently. According to J. Charles Alderson, Caroline Clapham and Dianne Wall, “a test cannot be valid unless it is reliable”.30
Reliability or stability of a language test is concerned with the degree to
which it can be trusted to produce the same result upon repeated administration to
the same individual or to give consistent information about the value of a learning
variable being measured.31 Therefore, to be considered reliable, a language test
must obtain consistent results and give consistent information.
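As an illustration only, consistency upon repeated administration can be estimated by correlating the scores from two administrations of the same test to the same students. The sketch below assumes this test-retest approach and the Pearson correlation as the estimate; the source does not name a formula. The function name retest_reliability and the scores are hypothetical.

```python
from statistics import mean, pstdev


def retest_reliability(first, second):
    """Correlate scores from two administrations of the same test."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists of at least 2 scores")
    m1, m2 = mean(first), mean(second)
    s1, s2 = pstdev(first), pstdev(second)
    if s1 == 0 or s2 == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    # Population covariance divided by the product of the
    # population standard deviations gives the Pearson coefficient.
    cov = mean([(a - m1) * (b - m2) for a, b in zip(first, second)])
    return cov / (s1 * s2)


# Invented scores of five students on two occasions (assumption).
first_administration = [60, 72, 85, 90, 55]
second_administration = [62, 70, 88, 91, 53]

print(round(retest_reliability(first_administration, second_administration), 2))
```

Under these assumptions, a coefficient close to 1 would indicate that the test gives nearly the same ranking of students on both occasions, which is what is meant here by consistent results.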
Based on the definitions above, the writer concludes that if a test is administered to the same students on different occasions and there is no difference in the results, the test can be said to be reliable. In the writer’s opinion, the reliability of a test is very important so that the test can really fulfill its purpose of measuring someone’s capability as a whole.
d. Practicality
The fourth characteristic of a good test is practicality.
29 Ibid., pp. 154-155
30 J. Charles Alderson, Caroline Clapham and Dianne Wall, Language Test Construction and Evaluation, (Cambridge: Cambridge University Press, 1995), p. 187
31 Mary Finocchiaro & Sydney Sako, Foreign Language … p. 28

Practicality means that a test should be practical to administer. The criteria for practicality are normally based upon such factors as economy, scorability, and administrability. Economy means that the test should be as economical as possible in cost. Scorability means that the test can be scored easily and effectively, without causing confusion or taking too much time. Administrability means that the test should be easy to administer to the examinees. Several factors contribute to administrability. First, there should be a training session for test administrators, because it will facilitate the operation and save time and effort later on. Second, test instructions should be clear and concise, and yet totally comprehensible and complete.32
Based on the definitions above, the writer concludes that ease of administration and scoring means that the test administrator can perform his task quickly and efficiently. In the writer’s opinion, the practicality of a test is important so that the test can be administered well.

7. Technique to measure a good test
After administering the test, the teacher should evaluate or reexamine the test results, especially the test items. This procedure is usually called item analysis.
a. Item analysis
J. Stanley Ahmann and Marvin D. Glock explained that “item analysis is reexamining each test item to discover its strengths and flaws”.33 Furthermore, Anthony J. Nitko stated that “item analysis refers to the process of collecting, summarizing, and using information about individual test items, especially information about pupil