An Analysis on the Difficulty Level of English Summative Test for Second Grade of Junior High School at Odd Semester 2010/2011
(A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)
A “Skripsi”
Presented to the Faculty of Tarbiya and Teachers’ Training
in Partial Fulfillment of the Requirements for the Degree of S.Pd. (Bachelor of Arts) in English Language Education
Written By:
Andrian Dwi Prayoga 107014000882
ENGLISH EDUCATION DEPARTMENT
FACULTY OF TARBIYA AND TEACHERS’ TRAINING
“SYARIF HIDAYATULLAH” STATE ISLAMIC UNIVERSITY JAKARTA
(A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)
A "Skripsi"
Presented to the Faculty of Tarbiyah and Teachers' Training
in Partial Fulfillment of the Requirements for the Degree of S.Pd. (Bachelor of Arts) in English Language Education
ANDRIAN DWI PRAYOGA
NIM: 107014000882
Approved by:
Advisor,
H. Muhammad Farkhan
NIP. 19571005 198703 1 003
DEPARTMENT OF ENGLISH EDUCATION
FACULTY OF TARBIYAH AND TEACHERS' TRAINING
SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY
JAKARTA
2011
DIFFICULTY LEVEL OF ENGLISH SUMMATIVE TEST FOR SECOND GRADE OF JUNIOR HIGH SCHOOL AT ODD SEMESTER 2010/2011 (A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)", written by Andrian Dwi Prayoga, student's registration number 107014000882, was examined by the committee on December 15th, 2011, and was declared to have passed and, therefore, to have fulfilled one of the requirements for the academic title of 'S.Pd.' (Bachelor of Arts) in English Language Education at the Department of English Education.
Jakarta, December 15th, 2011
EXAMINATION COMMITTEE
CHAIRMAN : Drs. Syauki, M.Pd. NIP. 19641212 199103 1 002
SECRETARY : Neneng Sunengsih, M.Pd. NIP. 19730625 199903 2 001
EXAMINERS : 1. Drs. A.M. Zaenuri, M.Pd. NIP. 19530304 197903 1 001
2. Nida Husna, M.Pd., M.A. TESOL NIP. 19720705 200312 2 002
Acknowledged by:
Dean of Tarbiya and Teachers' Training Faculty
Nurlena Rifa'i, M.A., Ph.D. NIP. 19591020 198603 2 001
I, the undersigned,
Name: Andrian Dwi Prayoga
Place/Date of Birth: Jakarta, January 2, 1990
NIM: 107014000882
Department / Study Program:
Skripsi Title: AN ANALYSIS ON THE DIFFICULTY LEVEL OF ENGLISH SUMMATIVE TEST FOR SECOND GRADE OF JUNIOR HIGH SCHOOL AT ODD SEMESTER 2010/2011 (A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)
Advisor:
hereby declare that the skripsi I have written is truly my own work and that I take academic responsibility for what I have written.
Semester 2010/2011 (A Case Study at the Second Grade of SMPN 13 South Tangerang), Skripsi, English Education Department, Faculty of Tarbiya and Teachers’ Training, Syarif Hidayatullah State Islamic University Jakarta.
Key words: Item Difficulty Level, Summative Test
This study aims to measure the difficulty level of the items in the English summative test administered to the second grade of SMPN 13 South Tangerang in the odd semester of the 2010/2011 academic year. Through this study, it can be determined which test items are easy, moderate, or difficult.
This study is quantitative research because the researcher uses numerical data that are analyzed statistically. It is also categorized as descriptive analysis because it is intended to describe the objective condition of the difficulty level of the English summative test for the second grade of SMPN 13 South Tangerang in the odd semester of the 2010/2011 academic year.
The findings show that moderate items have the highest percentage at 66.7%, followed by difficult items at 20% and easy items at 13.3%. Overall, the difficulty level of the test is moderate, at 0.50. Therefore, this test has a good difficulty level.
The results of this item analysis can be used by teachers to revise the test items categorized as either easy or difficult. They can also indicate which material or problems teachers should focus on more in the classroom in order to prepare the students for the next exam.
Semester 2010/2011 (A Case Study at the Second Grade of SMPN 13 South Tangerang), Skripsi, English Education Department, Faculty of Tarbiya and Teachers’ Training, Syarif Hidayatullah State Islamic University Jakarta.
Keywords: Item Difficulty Level, Summative Test
This research aims to measure the difficulty level of the items of the English summative test administered to the second grade of SMPN 13 Tangerang Selatan in the odd semester of the 2010/2011 academic year. Through this research, it can be determined which items are too easy, moderate, and difficult.
This research is quantitative research because the researcher uses numerical data that are analyzed statistically. It is also categorized as descriptive analysis because it describes the objective condition of the difficulty level of the English summative test for the second grade of SMPN 13 Tangerang Selatan in the odd semester of the 2010/2011 academic year.
The results of this research are that moderate items have the highest percentage at 66.7%, followed by difficult items at 20% and easy items at 13.3%. Overall, the difficulty level of the test is at the moderate level, at 0.50. Therefore, this test has a good difficulty level.
The results of this item analysis research can be used by teachers to improve the items categorized as easy and difficult. This research can also provide information on which materials or problems need more attention from teachers in the teaching and learning process in order to prepare students for the next examination.
and Blessing to the writer, so that this “Skripsi” could be completed. Peace and salutation be upon our prophet Muhammad, his family, his companions, and his followers.
The writer would like to express his gratitude to Dr. H. Muhammad Farkhan, M.Pd., the writer’s advisor, who kindly spent his time giving valuable advice, guidance, corrections, and suggestions in composing this “Skripsi.”
Also, on this occasion, the writer would like to express his greatest appreciation, honor, gratitude, and love to his beloved mother, Mrs. Tri Hastuti, S.Pd., who has been a great motivator in every condition, and to his father, Mr. Juraid Umar, M.Pd., who has given him much inspiration. He thanks them for their prayers, guidance, patience, and encouragement, which motivated the writer to finish his study.
The writer would like to express his highest appreciation and gratitude to all lecturers of the English Education Department for teaching precious knowledge, sharing the values of life, and giving unforgettable study experiences.
The writer extends many thanks to Mr. Rohman, S.Pd., the Headmaster of “SMPN” 13 South Tangerang, who gave the writer permission to do the research there. His gratitude also goes to Ms. Dahlia Muflikhati, S.Pd., one of the English teachers at “SMPN” 13 South Tangerang, who gave the writer great contribution and cooperation while he was doing this research.
His gratitude also goes to Drs. Syauki, M.Pd., the Head of the English Education Department, and Ms. Neneng Sunengsih, S.Pd., the Secretary of the English Education Department. His thanks are also given to the staff of the English Education Department, especially Ms. Aida Ainul Wardah, S.Pd., who always gives excellent service and contribution to the writer.
The writer would like to express his thanks and love to all his beloved friends, especially for Ayu Lestari, Marinta, Eva Nur, Maya K., Syifa F., Fera P., Wilda
while studying together.
Finally, the writer realizes that this “Skripsi” is still far from perfect. Constructive criticism and suggestions would be welcome to make it better.
Jakarta, November 2011
ABSTRAK
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF TABLES
CHAPTER I : INTRODUCTION
A. Background of the Study
B. Limitation of the Study
C. Statement of the Problem
D. Objective of the Study
E. Significance of the Study
F. Method of the Study
CHAPTER II : THEORETICAL FRAMEWORK
A. Test
1. Definition of Test
2. Types of Test
a. Achievement Test
1. Placement Test
2. Formative Test
3. Diagnostic Test
4. Summative Test
b. Proficiency Test
c. Progress Test
d. Aptitude Test
B. Categories of Good Test
1. Validity
a. Content Validity
2. Reliability
3. Practicality
C. Types of Test Item
1. Objective Test
a. Selection-Type Test Item
1. Multiple Choice
2. True-False
3. Matching
4. Rearrangement
b. Supply-Type Test Item
1. Short-Answer
2. Fill-in
2. Essay Test
D. Item Analysis
1. Definition of Item Analysis
2. Kinds of Item Analysis
a. Level of Difficulty
b. Discriminating Power
c. The Effectiveness of Distractors
E. The Importance of Item Analysis
CHAPTER III : THE IMPLEMENTATION OF THE RESEARCH
A. Research Methodology
1. Purpose of the Study
2. Place and Time of the Study
3. Population and Sample
4. Method of the Study
1. Data Description
2. Data Analysis
3. Data Interpretation
CHAPTER IV : CONCLUSION AND SUGGESTIONS
A. Conclusion
B. Suggestions
BIBLIOGRAPHY
APPENDIXES
Table 2. The Group Position Based on the Test Result
Table 3. Format of Item Analysis of the English Summative Test
Table 4. Classification of Items Based on the Proportion of Difficulty Level
Appendix
Table 5. Students’ Answer in the Upper Group (Multiple-Choice Items)
Table 6. Students’ Answer in the Lower Group (Multiple-Choice Items)
CHAPTER I
INTRODUCTION
This chapter presents the background of the study, limitation of the study, statement of the problem, objective of the study, significance of the study, and method of the study.
A. Background of the Study
Evaluation plays an important role in every stage of education. It is integrated into the school program, so it contributes directly to the teaching and learning process. According to Norman E. Gronlund, “Carefully collected evaluation data help teachers understand the learners, plan learning experiences for them, and determine the extent to which the instructional objectives are being achieved.”1
Evaluation refers to the process of drawing conclusions from gathered data in order to make value judgments about students’ performance. Lyle F. Bachman states that “evaluation can be defined as the systematic gathering of information for the purpose of making decisions.”2 In summary, evaluation plays a very important role because teachers must always be concerned with the quality of their instructional process and with whether students have reached the instructional goals stated beforehand.
1 Norman E. Gronlund, Measurement and Evaluation in Teaching, (New York: Macmillan Publishing Co., Inc., 1981), 4th Ed., p. 3.
2 Lyle F. Bachman, Fundamental Considerations in Language Testing, (Oxford: Oxford
There are many ways of collecting data as information in the process of evaluation. One of them is by using a test. A test is a set of questions, each of which has a correct answer, which examinees usually answer orally or in writing.3 There are several types of tests. One of them is the achievement test, which is designed to find out how successfully students have mastered the knowledge, abilities, and skills taught in past learning activities.
According to Wilmar Tinambunan, there are four types of achievement test that are commonly used. First, a placement test is given at the beginning of learning to find out students’ initial performance. Next, a formative test is used to monitor students’ progress during the learning process. Third, a diagnostic test is intended to detect students’ weaknesses during instruction. Finally, a summative test is used to show the standard that students have reached in relation to other students at the same stage.4 In this research, the test that the writer would like to analyze is the summative test.
As one of the methods of measuring students’ achievement in the learning process, a test should be well constructed. A well-constructed test should have three main characteristics: validity, reliability, and practicality. Validity in language testing means that the test really measures what it is intended to measure. Reliability means that a test has to be consistent and reproducible. Practicality is concerned with a wide range of factors of economy, convenience, and interpretability.5
Constructing a good test is the teachers’ responsibility because they are the ones who know their students’ capability and the instructional objectives. However, it is not an easy job, and some tests are built carelessly. As stated by J. Stanley Ahmann and Marvin D. Glock, “Classroom tests are tests constructed by classroom teacher for use in his particular classes. More of these tests are administered than any other kind. Unfortunately, they are carelessly constructed and interpreted.”6
3 Wilmar Tinambunan, Evaluation of Students Achievement, (Jakarta: Depdikbud, 1988), p. 3.
4 Wilmar Tinambunan, Evaluation of Students ..., pp. 7-9.
5 Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation in Psychology and
Based on the explanation above, teachers need to evaluate the effectiveness of the test items because they need to know whether the items work well or not. Harold S. Madsen explains that “the selections of appropriate language items are not enough by itself to ensure a good test. Each question needs to function properly; otherwise, it can weaken the exam. Fortunately, there are some rather simple statistical ways of checking individual items. This procedure is called as item analysis.”7 This is done by analyzing the students’ responses to each item.
Item analysis of a test can be a valuable activity that improves the test’s reliability and validity. Item analysis procedures provide information for evaluating the functional effectiveness of each item and for detecting weaknesses that should be corrected. This information is useful when reviewing the test, and it is indispensable when building a set of high-quality items for the next test.
Item analysis has three main components: level of difficulty, discriminating power, and effectiveness of the distracters. The difficulty level shows the percentage of students who answer an item correctly. Discriminating power indicates whether an item can discriminate between students of different ability. The effectiveness of the distracters indicates whether all the alternatives of an item function well.
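As a rough illustration of the difficulty level described above, the proportion of students who answer each item correctly can be computed from scored answer sheets. The sketch below is only an example under stated assumptions: the function name `difficulty_index`, the 1/0 scoring layout, and the sample scores are hypothetical and are not taken from the test analyzed in this study.

```python
def difficulty_index(responses):
    """Return the proportion of correct answers for each item.

    `responses` is a list of per-student lists in which 1 marks a correct
    answer and 0 marks an incorrect one (a hypothetical data layout).
    """
    if not responses:
        raise ValueError("at least one student is required")
    n_students = len(responses)
    n_items = len(responses[0])
    indices = []
    for item in range(n_items):
        # Count how many students answered this item correctly.
        correct = sum(student[item] for student in responses)
        indices.append(correct / n_students)
    return indices


# Hypothetical scores of four students on three items (not real data).
sample = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
]
print(difficulty_index(sample))  # [0.75, 0.75, 0.25]
```

A value close to 1 means that most students answered the item correctly, while a value close to 0 means that few did.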
The writer limits the problem of the study; he focuses only on the difficulty level of the test. The test should be classified, in terms of difficulty, as an easy, moderate, or difficult test. In addition, he needs to determine what percentage of the items are easy, moderate, and difficult. Moreover, a test with an appropriate difficulty level is able to distinguish between the students who have studied well and those who have not.
6 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth Principles of Tests and Measurement, (Boston: Allyn and Bacon, Inc, 1967), p. 17.
7 Harold S. Madsen, Techniques in Testing, (New York: Oxford University Press, 1983), p. 180.
The writer intends to analyze the difficulty level of the English summative test because he found some problems at the second grade of SMP Negeri 13 Tangerang Selatan. First, some students commented that the test was too difficult or too easy. Second, and more importantly, many students got low scores. The writer therefore investigated this problem to find out how difficult the test is.
Based on the description above, the writer would like to perform item analysis of the English Summative Test items for the second grade of SMP Negeri 13 Tangerang Selatan. The writer did the research under the title “AN ANALYSIS ON THE DIFFICULTY LEVEL OF ENGLISH SUMMATIVE TEST FOR SECOND GRADE OF JUNIOR HIGH SCHOOL AT ODD SEMESTER 2010/2011 (A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)”.
B. Limitation of the Study
To make this study easier to understand, the writer limits the study as follows:
1. The research focuses only on the difficulty level of the English Summative Test at the odd semester 2010/2011.
2. The test analyzed is the English Summative Test for the second grade at the odd semester of the 2010/2011 academic year.
3. The research focuses only on the second grade students of SMP Negeri 13 Tangerang Selatan.
C. Statement of the Problem
From the limitation of the problem explained above, the writer formulates the statement of the problem in this research as follows:
“Does the English Summative Test for the second grade of SMP Negeri 13 Tangerang Selatan at the odd semester 2010/2011 fulfill the criteria of a good test in terms of difficulty level?”
D. Objective of the Study
In line with the limitation of the problem, the objective of the study is to measure the quality of the English Summative Test for the second grade of SMP Negeri 13 Tangerang Selatan at the odd semester 2010/2011 and to determine the difficulty level of each item.
E. Significance of the Study
The result of this study is expected to bring some benefits to English teaching. It offers suggestions to test makers or classroom teachers when they find a test item with a high or low difficulty level. They can review which items make the test too easy or too difficult and follow up by revising the test. Thus, this study can provide useful input and feedback as a basis for improving the English Summative Test.
Besides the purpose above, the study fulfills the writer’s final assignment for his bachelor’s degree. Finally, other researchers who are interested in analyzing difficulty level can get basic information from this study for further research.
F. Method of the Study
The methods used in the research are descriptive analysis and quantitative analysis. The writer took the English Summative Test paper and the students’ answer sheets, then analyzed the difficulty level of each item. Quantitatively, the writer used numerical data that were analyzed statistically. The writer also did library research by studying a number of references related to the topic to support the theoretical aspect of the investigation.
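The quantitative step described above, in which each item is labeled as easy, moderate, or difficult and the share of each category is reported, might be sketched as follows. This is an illustration only: the cut-off values 0.30 and 0.70 are assumed here because this chapter does not state the thresholds used, and the names `classify_item` and `category_percentages` and the sample indices are hypothetical.

```python
from collections import Counter

# Assumed cut-off points; the thresholds actually used in the study
# are not stated in this chapter.
EASY_ABOVE = 0.70
DIFFICULT_BELOW = 0.30


def classify_item(p):
    """Label one item from its difficulty index p (proportion correct)."""
    if p > EASY_ABOVE:
        return "easy"
    if p < DIFFICULT_BELOW:
        return "difficult"
    return "moderate"


def category_percentages(indices):
    """Return the percentage of items that fall into each category."""
    if not indices:
        raise ValueError("at least one item is required")
    counts = Counter(classify_item(p) for p in indices)
    total = len(indices)
    return {
        label: round(100 * counts[label] / total, 1)
        for label in ("easy", "moderate", "difficult")
    }


# Hypothetical difficulty indices for five items (not real data).
sample_indices = [0.85, 0.55, 0.40, 0.20, 0.65]
print(category_percentages(sample_indices))
# {'easy': 20.0, 'moderate': 60.0, 'difficult': 20.0}
```

Changing the two constants is enough to apply a different classification scheme to the same indices.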
CHAPTER II
THEORETICAL FRAMEWORK
In this chapter, the writer describes the theoretical framework, which covers the definition and types of test, the characteristics of a good test, the types of test item, the definition and types of item analysis, and the importance of item analysis.
A. Test
1. Definition of Test
In the process of evaluation, one of the methods that can be used to gather data is a test. Many experts have offered definitions of a test. In his book, Educational Test and Measurement, an Introduction, Anthony J. Nitko writes, “Test is a systematic procedure for observing and describing one or more characteristics of a person with the aid of either a numerical scale or category system.”1
Another opinion states that a test is a technique consisting of questions, statements, or tasks that are given to students in order to measure their performance or behavior.2 Victor H. Noll also writes that a test usually involves the use of an instrument or set of instruments to determine a specific quality or trait.3
1 Anthony J. Nitko, Educational Test and Measurement, an Introduction, (New York: Harcourt Brace Jovanovich, Inc., 1983), p. 6.
2
Moreover, Jum C. Nunnally states that, “A test is a standardized situation that provides an individual with a score.”4
Based on the definitions above, it can be concluded that a test is a method of measuring the behavior or performance of individuals, and it consists of systematic procedures for gathering data about their achievement. It is usually carried out under standardized conditions in the teaching and learning process.
2. Types of Test
There are many types of tests used to measure students’ achievement. However, there are four basic types of language tests: achievement tests, proficiency tests, progress tests, and aptitude tests.5
a. Achievement Test
In his book, Language Testing, Tim McNamara writes, “Achievement tests accumulate evidence during, or at the end of, a course of study in order to see whether and where progress has been made in terms of the goals of learning. They relate to the past in that they measure what language the students have learned as a result of teaching.”6
Furthermore, Nunnally states that, “The purpose of achievement test is to measure progress in school up to a particular point in time. Achievement test is based on the core educational objectives shared by the educators across the country.”7
3 Victor H. Noll, Educational Measurement, (Boston: Houghton Mifflin Company, 1965), 2nd Ed., p. 13.
4 Jum C. Nunnally, Educational Measurement and Evaluation, (New York: McGraw-Hill, Inc., 1964), p. 6.
5 Rebecca M. Valette, Modern Language Testing, (New York: Harcourt Brace Jovanovich Inc., 1977), 2nd Ed., p. 5.
6 Tim McNamara, Language Testing, (New York: Oxford University Press, 2000), p. 6.
7
In addition, according to Rebecca M. Valette, “achievement tests are usually not built around one set of teaching materials but are designed for use with students from a variety of different schools and programs.”8
In the writer’s opinion, an achievement test is a test designed to find out how successfully students have mastered the materials of a long period of instruction and whether they have achieved the educational objectives. Thus, an achievement test makes it possible to compare the progress of individual students, classes, and schools with that of others across the country.
According to Wilmar Tinambunan, there are four types of achievement test: placement, formative, diagnostic, and summative tests.9
1. Placement Test
“Placement tests are designed to assess students’ level of language ability so that they can be placed in the appropriate course or class. Such tests may be based on aspects of the syllabus taught at the institution concerned, or may be based on unrelated material. In some language centres students are placed according to their rank in the test results so that, for example, the students with the top eight scores might go into the top class. In other centres the students’ ability in different skills such as reading and writing may need to be identified. In such a centre a student could conceivably be placed in the top reading class, but in the bottom writing class, or some other combination. In yet other centres placement test may have the purpose of deciding whether students need any further tuition at all.”10
Also, James Dean Brown, in his book Testing in Language Programs, states that the purpose of this test is to form groups of students who are at the same level of ability so that teachers can concentrate on the problems or learning points suitable for that level.11
8 Rebecca M. Valette, Modern Language ..., p. 5.
9 Wilmar Tinambunan, Evaluation of Students Achievement, (Jakarta: Depdikbud, 1988), p. 7.
10 J. Charles Alderson, et al., Language Test Construction and Evaluation, (Cambridge: Cambridge University Press, 1995), pp. 11-12.
Moreover, placement tests provide information that helps to place students in the part of the learning program most appropriate to their level of ability. They are most successful when they are constructed for particular situations.12 Most placement tests constructed by classroom teachers are pretests, which serve to determine students’ readiness to begin instruction and to place students in the part of the learning activity with the proper instruction.
2. Formative Test
Norman E. Gronlund writes that “formative tests are given periodically during instruction to monitor pupil learning progress and to provide ongoing feedback to pupils and teachers.”13 A formative test usually covers a part of instruction, such as a unit or a chapter.
In line with the opinion above, formative tests are carried out while instruction is ongoing to identify the learning progress students have made and to give continuous feedback on the strengths and weaknesses of the learning activity.14 Furthermore, “the formative test is given during the course of instruction; its purpose is to show which aspects of the chapter the student has mastered and where remedial work is necessary.”15
11 James Dean Brown, Testing in Language Programs, (New Jersey: Prentice Hall Regents, 1996), p. 11.
12 Arthur Hughes, Testing for Language Teachers, (Cambridge: Cambridge University Press, 2003), 2nd Ed., pp. 16-17.
13 Norman E. Gronlund, Measurement and Evaluation in Teaching, (New York: Macmillan Publishing Co., Inc., 1981), 4th Ed., p. 125.
14 Wilmar Tinambunan, Evaluation of Students..., p. 8.
15
Its result shows how well students have mastered a particular material and provides them with immediate feedback. With this feedback, students can identify their learning errors or weaknesses and then correct them with or without the teacher’s help.
Thus, in the writer’s opinion, a formative test is designed to check students’ progress during instruction in mastering one particular learning point and to give students direct feedback.
3. Diagnostic Test
The result of a diagnostic test is intended to show specific weaknesses and strengths in a particular material or skill.16 It can be said to be more comprehensive and detailed because it identifies the major causes of learning difficulties and then helps prepare a plan for remedial activity.
In his book, Testing for Language Teachers, Arthur Hughes states that, “Diagnostic tests are used to identify learners’ strengths and weaknesses. They are intended primarily to ascertain what learning still needs to take place.”17 In addition, “a diagnostic test is designed to determine the degree to which the specific instructional objectives of the course have been accomplished.”18
Therefore, by using diagnostic tests, the teacher knows what students have mastered and in what areas a student needs further help. The test is given while students are learning the language, so diagnostic tests are typically delivered at the beginning or in the middle of a language course.
16 Robert Lado, Language Testing, The Construction and Use of Foreign Language Tests, (London: Longman Group Limited, 1961), p. 369.
17 Arthur Hughes, Testing for Language ..., p. 15.
18
4. Summative Test
According to Wilmar Tinambunan, “the summative test is intended to show the standard which the students have now reached in relation to other students at the same stage. It typically comes at the end of a course or unit of instruction.”19
To support the opinion above, summative assessment methods are used to determine what a student has accomplished at the beginning or the end of a language course, so that teachers can give students a final mark.20 Moreover, Rebecca M. Valette states that, “the summative test is usually given at the end of a marking period and measures the “sum” total of the material covered.”21
In conclusion, the summative test is a test that is usually administered at the end of a language course, a semester, or an academic year to find out how successfully students have mastered a wide range of material within a certain period. On this type of test, students are usually ranked and graded.
b. Proficiency Test
James Dean Brown writes, “a proficiency test assesses the general knowledge or skills commonly required or prerequisite to entry into a group of similar institutions. Such tests are very general in nature and cannot be related to the goals and objectives of any particular language program.”22
Furthermore, Arthur Hughes states: “Proficiency tests are designed to measure people’s ability in a language regardless of any training they may have had in that language. The content of a proficiency test, therefore, is not based on the content or objectives of language courses that people taking the test may have followed. Rather, it is based on a specification of what candidates have to be able to do in the language in order to be considered proficient.”23
19 Wilmar Tinambunan, Evaluation of Students ..., p. 9.
20 Julie Cotton, The Complete Guide to Learning and Assessment, (New Delhi: Crest Publishing House, 2004), p. 24.
21 Rebecca M. Valette, Modern Language ..., p. 11.
22
To sum up, proficiency tests measure someone’s general ability in a language, and they are not related to any previous course of instruction. Proficiency tests usually consist of standardized multiple-choice items on grammar, vocabulary, reading comprehension, aural comprehension, and sometimes writing.
c. Progress Test
According to the book Language Test Construction and Evaluation, “progress tests are given at various stages throughout a language course to see what the students have learnt.”24
Meanwhile, another opinion states that, “the progress test measures how much the student has learned in a specific course of instruction. The tests that the classroom teacher prepares for administration at the end of a unit or end of a semester are progress tests.”25
Thus, a progress test is used to check students’ progress in learning one particular lesson, and the teacher can administer it at any time during a language course.
d. Aptitude Test
According to Robert Lado, “aptitude tests are designed to predict the degree of success that individual students will have in studying a foreign language.”26
In addition, an aptitude test is typically used to predict how successful students will be in the learning activity they are going to undertake.27
The opinions above are supported by Howard B. Lyman, who writes in his book, Test Scores and What They Mean, that “All aptitude tests imply prediction. They give us a basis for predicting future level of performance.”28
23 Arthur Hughes, Testing for Language ..., p. 9.
24 J. Charles Alderson, et al., Language Test ..., p. 12.
25 Rebecca M. Valette, Modern Language ..., p. 5.
26 Robert Lado, Language Testing, ..., p. 370.
27
Because it measures the potential capacity of an individual, an aptitude test can be used to estimate how long students will need to master a foreign language sufficiently. It is also often used in selecting individuals for language training, jobs, scholarships, and many other purposes.
B. Categories of Good Test
As an instrument for obtaining information, a test should be of good quality. The quality of a test influences the result of the test itself. When a test is of good quality, the right information is obtained and can be used to make accurate decisions about students’ achievement.
According to David P. Harris, “all good tests possess three qualities: validity, reliability, and practicality.”29
1. Validity
In the book Evaluating Pupil Growth Principles of Tests and Measurement, “validity is often defined as the degree to which a measuring instrument actually serves the purposes for which it is intended.”30 Also, Norman E. Gronlund writes that, “validity refers to the extent to which the results of an evaluating procedure serve the particular uses for which they are intended.”31
So, the validity of a test means that the test really measures what it is supposed to measure. According to some experts, three types of validity have been identified and are commonly used in educational measurement.
28
Howard B. Lyman, Test Scores and What They Mean, (Boston: Allyn and Bacon, 1998),
6th Ed., p. 22.
29
David P. Harris, Testing English as a Second Language, (New York: McGraw-Hill Inc., 1969), p. 13.
30
J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth: Principles of Tests and
Measurements, (Boston: Allyn and Bacon, 1967), 3rd Ed., p. 285.
31
a. Content Validity
A test can be said to have content validity if it is built with a representative sample of the language skills, structures, etc. with which it is meant to be concerned.32 In line with that, Anthony J. Nitko writes
that, “content validity is the extent the items on a test are representative
of the domain or universe that they are supposed to represent.”33
Thus, the degree of content validity in a test relates to how well the test measures the content of the subject matter that students have studied. Therefore, it is important to make sure that the test covers all the areas of material that are supposed to be assessed. For example, a grammar test should be made up of items relating to the knowledge of grammar.
b. Construct Validity
This type of validity relates to any underlying ability that is
formulated in a theory of language ability. Construct validity is “the
extent that a test measures the trait, attribute, or mental process it should measure, and whether descriptions of persons in terms of such
constructs can follow using the scores from that test.”34
Moreover, Arthur Hughes writes that, “it is a matter of empirical research to establish whether or not such a distinct ability exists, can
be measured, and is indeed measured in that test.”35
In other words, a test has construct validity if it is able to measure certain specific characteristics in a way consistent with a theory of language and behavior in learning.
c. Criterion-Related Validity
Criterion-related validity relates to the extent to which the results of the test agree with those of another independent
32
Arthur Hughes, Testing for Language ..., p. 26.
33
Anthony J. Nitko, Educational Test ..., p. 413
34
Anthony J. Nitko, Educational Test ..., p. 413.
35
and trustworthy assessment of student’s competence.36
In addition, in his book, Educational Tests and Measurement, An Introduction,
Anthony J. Nitko states that, “criterion-related validity questions
concern the extent to which scores on a test permit inferences about
examinees’ likely standing on another measure called a criterion.”37
This type of validity can be divided into two types, namely concurrent validity and predictive validity.
1. Concurrent Validity
According to J. Stanley Ahmann and Marvin D. Glock, this
validity is “designed to estimate present status with respect to a
characteristic different from the test.”38 In other words, it tries to
determine a student’s present standing indirectly.
Concurrent validation is carried out by comparing an individual’s test scores with the results of another assessment taken at about the same time.
2. Predictive Validity
Predictive validity is intended to predict how well someone
will perform in the future. This is supported by the statement that “predictive
validity concerns the degree to which a test can predict candidates’
future performance.”39
To carry out this validation, students’ earlier test scores are correlated with a later measure of performance, for example the grades obtained at the end of the first semester.
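The correlation described above can be sketched as follows. This is only an illustration: the Pearson coefficient is assumed here as the correlation measure (the text does not name one), and the function name and both score lists are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least two scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one entry per student, in the same order in both lists
earlier_test_scores = [55, 60, 65, 70, 80, 90]
end_of_semester_grades = [58, 62, 70, 68, 85, 88]

validity_coefficient = pearson_r(earlier_test_scores, end_of_semester_grades)
print(f"predictive validity coefficient: {validity_coefficient:.2f}")
```

A coefficient close to 1 would suggest that the earlier test predicts the later grades well, while a coefficient close to 0 would suggest little predictive value.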
2. Reliability
Consistent measurement is a necessary condition for high quality educational testing. This consistency of a test is called reliability.
36
Arthur Hughes, Testing for Language ..., p. 27.
37
Anthony J. Nitko, Educational Test ..., p. 422.
38
J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil ..., p. 288.
39
“Reliability refers to the consistency of measurement – that is, to how consistent test scores or other evaluation results are from one measurement
to another.”40
According to Desmond Allison, “the reliability of a test concerns the
accuracy and trustworthiness of its results. Reliable test results will
accurately reflect each student’s understanding of whatever is being
tested.”41
To sum up, a test is reliable if it consistently produces the same, or nearly the same, result or rank for the same individual taking the test several times on different occasions.
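One way to check whether students keep nearly the same rank across two administrations is a rank-order correlation. The sketch below assumes Spearman's rank-difference formula, which the text does not prescribe; the function names and scores are hypothetical, and tied scores are deliberately not handled.

```python
def ranks(scores):
    """Rank scores from highest (rank 1) to lowest; tied scores are not supported."""
    if len(set(scores)) != len(scores):
        raise ValueError("this simple sketch does not handle tied scores")
    ordered = sorted(scores, reverse=True)
    return [ordered.index(score) + 1 for score in scores]


def spearman_rho(first, second):
    """Spearman rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two lists of the same length with at least two scores")
    # d is the difference between a student's rank on the two occasions
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(first), ranks(second)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores of the same five students on two occasions
first_occasion = [62, 75, 81, 90, 55]
second_occasion = [65, 73, 84, 88, 58]

print(f"rank consistency: {spearman_rho(first_occasion, second_occasion):.2f}")
```

A value near 1 indicates that the students are ordered in almost the same way on both occasions, which is what the definition above treats as a reliable result.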
3. Practicality
The last quality that a good test should have is practicality or usability. In selecting a test and other instruments, practical considerations cannot be neglected. These are some factors relevant to the practicality when selecting tests:42
a. Ease of Administration
“The administrability of evaluation devices refers to the ease and accuracy with which the directions to pupils and evaluator can be
followed.”43
In addition, ease of administration involves simple and clear directions, a minimum number of subtests, and easy timing.
b. Time Required for Administration
The length of a test is directly related to its reliability, so enough time should be made available. “A safe procedure is to
40
Norman E. Gronlund, Measurement and Evaluation ..., p. 93.
41
Desmond Allison, Language Testing ..., p. 85.
42
Norman E. Gronlund and Robert L. Linn, Measurement and Evaluation in Teaching, (New
York: Macmillan Publishing Company, 1990), 6th Ed., p. 102-103.
43
H. H. Remmers, et. al., A Practical Introduction to Measurement and Evaluation, (New York: Harper & Brothers, 1960), p. 126.
allot as much time as is necessary to obtain valid and reliable
results.”44
c. Ease of Interpretation and Application
If the test is interpreted correctly and applied effectively, the teacher can make accurate educational decisions about students’ performance.
d. Availability of Equivalent or Comparable Forms
Equivalent forms of a test measure the same aspect and are alike in content, level of difficulty, and other characteristics. They are useful if the teacher wants to remove the factor of memory when retesting students on the same domain. Comparable forms are especially useful in measuring progress in the basic skills.
e. Cost of Testing
The cost factor is actually not very important in selecting a test, since testing is relatively inexpensive. However, the test should still be as economical as possible.
C. Types of Test Item
An item is the basic unit of language testing. According to James Dean
Brown, the definition of the item “is the smallest unit that produces distinctive
and meaningful information on a test or rating scale.”45
The items used in classroom tests are commonly divided into two broad categories: (1) the objective item, and (2) the essay test.
1. Objective Test
In constructing an achievement test, the test maker may choose from a variety of item types. One of them is referred to as the objective item. This
kind of item can be scored objectively. Furthermore, “equally
competent scorers can score them independently and obtain the same
44
Wilmar Tinambunan, Evaluation of Students ..., p. 23.
45
results.”46 In addition, Rebecca M. Valette defines objective test as “any
item for which there is a single predictable correct answer.”47
Thus, when this test is scored, any subjective judgement from the scorer is set aside because every item in the test has only one right answer. So, even if the test is scored at several different times or by different scorers, it will yield the same result.
The objective item can be classified into two types: the selection-type test item and the supply-type test item.
a. Selection-Type Test Item
1. Multiple Choice
According to Anthony J. Nitko, “a multiple choice item
consists of one or more introductory sentences followed by a list of two or more suggested responses from which the examinee
chooses one as the correct answer.”48
The other responses, which
are incorrect answers, function to distract students’ attention
away from the correct answer in case they are uncertain of it.
In line with that quote, “multiple choice items are made up of
an item stem, or the main part of the item at the top, a correct answer, which is obviously the choice that will be counted correct, and the distractors, which are those choices that will be counted as
incorrect.”49
For example:
Budi has been here ____________ half an hour.
a. during c. while
b. for d. since
46
Norman E. Gronlund, Constructing Achievement Test, (New Jersey: Prentice-Hall, Inc., 1982), 3rd Ed., p. 36.
47
Rebecca M. Valette, Modern Language ..., p. 8
48
Anthony J. Nitko, Educational Test ..., p. 190.
49
The multiple choice item is commonly recognized as the most applicable and useful type of objective test item. It can be used to measure both knowledge outcomes and many types of skills. In addition, it can measure a variety of learning outcomes from simple to complex material.
The multiple choice item is a kind of discrete point test. A discrete point test takes language skills apart. Oller states that,
“discrete items attempt to test knowledge of language one bit at a
time.”50
It means that language knowledge can be divided into a number of components, such as grammar, vocabulary, spelling, punctuation, pronunciation, intonation, and stress. This test measures the knowledge of language in only one particular component.
Actually, it is not too difficult for a test maker or teacher to construct multiple choice items. However, there are some suggestions that they should consider in constructing this type of test item:51
a. The stem of the item should be meaningful by itself and should show a specific problem.
b. The item stem should include as much of the item as possible and should be free of irrelevant material.
c. A negatively stated item stem can be used only when significant outcomes need it.
d. All of the alternatives should be grammatically consistent with the stem.
e. An item should contain only one clearly correct answer.
f. Items used to measure understanding should contain some novelty, but beware of too much.
g. All distracters should be plausible.
h. Verbal associations between the stem and the correct answer should be avoided.
i. The relative length of the alternatives should not provide a clue to the answer.
j. The correct answer should appear in each of the alternative positions and in equal number but in random order.
50
John W. Oller, Language Tests ..., p. 37.
51
k. The special alternatives such as “none of the above” or “all
of the above” can be used sparingly.
l. Do not use multiple choice item when other item types are more appropriate.
Although it can be said as the most applicable and useful type of test item, multiple choice item has some limitations, such as:52
a. The technique tests only recognition knowledge. A multiple
choice item gives a quite inaccurate result of students’
ability in productive and receptive skills.
b. Guessing may have a considerable but unknowable effect
on test scores. We never know what part of any individual’s
score comes through guessing. So, we cannot identify the
students’ true competence or ability.
c. The technique severely restricts what can be tested. The basic problem here is that it requires distractors, and they are not always available.
d. It is very difficult to write successful items. The common faults fall on some areas such as more than one correct answer, no correct answer, the obvious clues in the options, ineffective distractors.
e. Backwash may be harmful. Practice at multiple choice items will not usually be the best way for students to improve their command of a language.
f. Cheating may be facilitated. Because responding to a multiple choice item is so simple, it is easy for students to communicate answers to each other non-verbally.
Besides its limitations, the multiple choice item also has some advantages. Wilmar Tinambunan lists the advantages of the multiple choice item as follows:53
a. The multiple choice item can be used for subject matter content at many different levels of behaviour, such as the ability to reason, discriminate, interpret, analyze, infer, and solve problems.
b. It gives students less chance of guessing the right answer than the true-false item does because it offers four or five alternatives.
52
Arthur Hughes, Testing for Language ..., p. 76-78.
53
c. One advantage of the multiple choice item over the true-false item is that students must also know what is correct rather than only knowing that a statement is incorrect.
In the writer’s opinion, a multiple choice item includes at least
three components, which are the stem, the distractors, and the correct answer. The stem can be a direct question or an incomplete statement to which students have to respond. The distractors are presented to draw students who have not studied well away from the correct answer. This type is especially useful for measuring learning outcomes that require the understanding, application, or interpretation of factual information.
2. True-False
In the book, Criterion-Referenced Language Testing, true-false
item “requires student to respond to the language by selecting one
of two choices, for instance, between true and false or between
correct and incorrect.”54
In line with that opinion, Norman E. Gronlund defines the true-false item as follows:
“True-false item is simply a declarative statement that the student must judge as true or false. There are modifications of
this basic form in which the student must respond “yes” or “no,” “agree” or “disagree,” “right” or “wrong,” “fact” or
“opinion,” and the like. Such variations are usually given the
more general name of alternative-response items. In any event this item type is characterized by the fact that only two
responses are possible.”55
For example:
Direction: Read each of the following statements. If the statement is grammatically true, circle the T. If the statement is grammatically false, circle the F!
54
James Dean Brown and Thom Hudson, Criterion-Referenced Language Testing,
(Cambridge: Cambridge University Press, 2002), p. 66.
55
T F 1. Toni usually help her mother in cooking.
T F 2. Every student must bring their own book.
T F 3. If I had much money, I would buy a house.
T F 4. She is smarter in our classroom.
T F 5. The men are gathered in a conference room.
The most common use of the true-false item is to measure the ability to identify the correctness of statements of fact, definitions of terms, principles, etc., and to distinguish fact from opinion.56 Because it is used to measure such relatively simple learning outcomes, a single declarative statement is provided with one of several methods of responding.
Therefore, to make it more effective in measuring students’
understanding, there are some rules which should be followed in constructing true-false items:57
a. Include only one central, significant idea in each statement
b. Word the statement so precisely that it can be judged true or false unequivocally
c. Keep the statement short, and use simple language structure
d. Use negative statements sparingly, and avoid double negatives
e. Statements of opinion should be attributed to some source
f. Avoid extraneous clues to the answer
Moreover, Anthony J. Nitko states that this item type has some advantages and criticisms.58 Here they are:
Advantages:
a. Certain aspects of the subject matter lend themselves to verbal propositions that can be judged true or false
b. Such items are relatively easy to write
c. They can be scored easily and objectively
d. They can cover a wide range of content with a relatively short period of testing
56
Wilmar Tinambunan, Evaluation of Students ..., p. 70.
57
Norman E. Gronlund, Constructing Achievement ..., p. 55-56
58
Criticisms:
a. They are often used only to test specific, frequently trivial, facts
b. They can be ambiguously worded
c. They can be answered correctly by blind guessing
d. They may encourage students to study and accept only oversimplified statements of truth and factual details
Thus, the true-false item is the item type that contains a single written statement which students must judge to be true or false. It is constructed to check and measure whether a simple particular point has been comprehended or not.
3. Matching
“The matching item consists of two parallel columns with
each word, number, or symbol in one column being matched to a word, sentence, or phrase in the other column. The items in the column for which a match is sought are called premises and the items in the column from which the selection is made are called responses. They are useful in measuring students’ ability to make associations, discern relationship, make
interpretations or measure knowledge of a series of facts.”59
In other words, this item type presents students with two columns of information in which they have to match the correct option or response to each premise. It is typically used to measure factual information or knowledge based on simple relationships. Therefore, when learning outcomes concern the ability to identify the relationship between two things, the matching item should be the most appropriate.
For example:
Match the following words on the left with their synonyms on the right!
1. ( ) Receive a. Carry
2. ( ) Achieve b. Emerge
3. ( ) Bring c. Increase
59
4. ( ) Appear d. Accept
5. ( ) Improve e. Accomplish
Furthermore, James Dean Brown formulates three guidelines that teachers should apply in constructing matching items:60
a. More responses should be supplied than premises so that students cannot narrow down the choices as they go along by simply keeping track of the options that they have already used.
b. The responses should usually be shorter than the premises because most students will read a premise and then search through the options for the correct match.
c. The premises and responses should be logically related to one central theme that is obvious to the students.
Moreover, the matching item has some advantages
in testing. The first advantage is “its compact form, which
makes it possible to measure a large amount of related factual
material in a relatively short time.”61 Secondly, “the effects of
guessing is reduced since the student will have one chance out of a
number of responses available of guessing correctly.”62
Finally, it is easy to construct.
4. Rearrangement
“Rearrangement items require the pupil to put into some
specified order a series of randomly presented material.”63
In the book, Measurement and Evaluation in the Schools, Louis J. Karmel states that any kind of specified order may be called for, such as chronology, order of difficulty, order of importance, length, weight, logic, and so on.64
60
James Dean Brown, Testing in Language ..., p. 57.
61
Norman E. Gronlund and Robert L. Linn, Measurement and Evaluation ..., p. 159.
62
Wilmar Tinambunan, Evaluation of Students ..., p. 65.
63
H. H. Remmers, et. al., A Practical Introduction ..., p. 243.
64
Louis J. Karmel, Measurement and Evaluation in the Schools, (London: The Macmillan Company, 1970), p. 382.
For example:
Rearrange the following sentences into a good paragraph!
1. Suddenly, it was getting dark and he realized that he got lost
2. Once upon a time, there was a bee named Bumbee
3. Bumbee could get home and gathered with his family happily
4. One day, he felt so happy and flew alone in the forest
5. Fortunately, a butterfly appeared and she liked to help him
b. Supply-Type Test Item
1. Short-Answer
In his book, Constructing Achievement Test, Norman E. Gronlund states that “the short answer (or completion)
item is the only objective item type that requires the examinee to
supply, rather than select, the answer.”65
In line with that opinion,
this item type “generally requires the students to examine a
statement or question then respond to it with a phrase or two, or a
sentence or two, in the space provided.”66
Both short answer item and completion item can be answered by a word, phrase, sentence, number, or symbol. In the short answer item, the question is presented as a direct question:
For example:
a. What is the capital city of West Java? (Bandung)
b. Who invented the lightbulb? (Thomas Alva Edison)
In contrast, the completion item requires the student to supply the answer in an incomplete statement.
For example:
a. The capital city of West Java is ... (Bandung)
b. The name of the man who invented the lightbulb is ... (Thomas Alva Edison)
65
Norman E. Gronlund, Constructing Achievement ..., p. 57.
66
It seems obvious that the short answer or completion item measures performance at the lowest level of the cognitive domain. Moreover, it is suitable for measuring a wide variety of relatively simple learning outcomes.
To make this item type effective for measuring simple learning outcomes, there are some suggestions that should be followed so that the items are not written carelessly:67
a. Require short, definite, clear-cut answers
b. If several correct answers (synonyms) are possible, count each one correct or change the item to restrict the correct answer
c. Decide whether spelling should be disregarded or given a separate score
d. Minimize the use of textbook expressions or stereotyped language in phrasing the questions
e. Specify the terms in which the response is to be given
f. In testing for a knowledge and understanding of definitions, it is often better to provide the term and require a definition than to provide a definition and require the term
g. Direct questions are probably preferable to incomplete declarative sentences
h. Hints concerning the correct answer, in the form of the first letter of a word, or a number indicating the number of letters in a word, should generally not be given
i. The space for the response should usually be at the right of the question
j. Allow enough space for the responses to permit legible writing
k. Arranging the answer spaces in a column at the right-hand margin of the page makes scoring more convenient
Furthermore, the short answer item has some advantages and disadvantages, as Arthur Hughes writes in his book, Testing for Language Teachers:68
a. Advantages:
1. Guessing will (or should) contribute less to test scores
2. The technique is not restricted by the need for distractors (compared with multiple choice)
67
H. H. Remmers, et. al., A Practical Introduction ..., p. 223-225.
68
3. Cheating is likely to be more difficult
4. Though great care must be taken, items should be easier to write
b. Disadvantages:
1. Responses may take longer and so reduce the possible number of items
2. The test taker has to produce language in order to respond
3. Scoring may be invalid or unreliable, if judgement is required
4. Scoring may take longer
2. Fill-in
This item type provides a sentence or a passage from which some parts have been removed. Students are then asked to fill in the blank spaces. As James Dean Brown and Thom Hudson write,
“this format provides a language context of some sort and then removes part of the context and replaces it with a blank. The
student’s job is to fill in that blank.”69
For example:
1. He failed another exam, __________ he had studied very hard.
2. She does not come today. She __________ be sick.
3. Once upon a __________, there was a farmer living in a small village in England. His __________ was Jack. He was a kind and wise man. He liked to help his neighbors. Jack __________ a mill machine. People came to his place to __________ their grain. Jack served them happily. However, his wife was a very __________ woman. She often complained. She __________ angry every time Jack __________ some food to the neighbors.70
69
James Dean Brown and Thom Hudson, Criterion-Referenced ..., p. 73.
70
Ali Akhmadi and Ida Safrida, Smart Steps Grade VII, (Bandung: Ganeca Exact, 2005), p. 133.
In addition, the fill-in item measures the student’s ability to produce language, even if only a small amount. However, for the fill-in item to yield valid data, it is important to tell students clearly that only one word can be put in each blank or gap.
Furthermore, in order to use the fill-in item in an efficient
way for measuring students’ performance, there are five
considerations issued by James Dean Brown that teachers should remember:71
a. Teachers should check to make sure that each item has one very concise correct answer
b. Teachers should make sure that enough context has been provided that the purpose, or intent, of the item is clear to those students who know the answer
c. All the blanks in a fill-in test should be the same length
d. Teachers should also consider putting the main body of the item before the blank in most of the items so that the students have the information necessary to answer the item once they encounter the blank
e. In situations, where the blanks may be very difficult and frustrating for the students, teachers might consider supplying a list of responses from which the students can choose in filling in the blanks
Furthermore, as one type of test item, the fill-in item has some advantages and limitations:72
Advantages:
a. It is relatively easy to construct
b. It is flexible to use from a test writer’s point of view
c. It requires a short amount of time to administer
Limitations:
a. It is generally very narrowly focused on testing a single word or short phrase at most
b. It may have a number of possible answers
71
James Dean Brown, Testing in Language ..., p. 58-59.
72
2. Essay Test
According to J. Stanley Ahmann and Marvin D. Glock in the book
Evaluating Pupil Growth: Principles of Tests and Measurements, “an
essay test item demands a response composed by the pupil, usually in one or more sentences, of a nature that no single response or pattern of responses can be listed as correct, and the accuracy and quality of which can be judged subjectively only by one skilled and informed in
the subject, customarily the classroom teacher.”73
In addition, the major characteristic of essay test is the freedom of response it provides. It means that students have to produce their own answer.74 To support the opinion above, Wilmar Tinambunan states that,
“the essay-type question requires the examinee to read the question,
formulate his response and express the response in his own words.”75
Essay questions can be classified into two types, which are:
a. Restricted Response Type
The student is not given complete freedom in making his response.
“It usually limits both the content and the response. The content is
usually restricted by the scope of topic to be discussed. Limitations of response are commonly indicated in the question.”76
For example:
1. State the main differences between the objective test and the subjective test according to Norman E. Gronlund!
2. Explain two advantages and two disadvantages of using the multiple choice item in testing English as a foreign language!
b. Extended Response Type
In this type, the student is given complete freedom in composing
his response. “It allows pupils to select any factual information that they
think is pertinent, to organize the answer in accordance with their best
73
J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil ..., p. 157.
74
Norman E. Gronlund, Constructing Achievement ..., p. 71.
75
Wilmar Tinambunan, Evaluation of Students ..., p. 56.
76
judgment, and to integrate and evaluate ideas as they deem
appropriate.”77
For example:
1. Why is English so important nowadays?
2. Describe the roles of the teacher in language testing!
Moreover, building the essay item as a measurement of complex learning outcomes should be done in a proper and careful way. Here are some suggestions to construct a good essay item:78
1. Make definite provisions for preparing students for taking essay examinations
2. Make sure that questions are carefully focused
3. Structure the content and length of questions
4. Have a colleague review and critique the essay questions
5. Avoid the use of optional questions, except when one is assessing writing ability where a choice of questions is desirable
6. Restrict the use of the essay as an achievement test to those objectives for which it is best
As a method to measure the complex learning outcomes, essay item has several advantages and weaknesses.
Advantages:79
1. It measures complex learning outcomes that cannot be measured by other means
2. It emphasizes the integration and application of thinking and problem-solving skills
3. It is regarded as a device for improving writing skills
4. It has ease of construction. Most teachers can formulate several essay questions in a matter of minutes
Weaknesses:80
1. The sampling of achievement is limited because only a small number of questions can be included in an essay test
77
Norman E. Gronlund and Robert L. Linn, Measurement and Evaluation ..., p. 213.
78
Kenneth D. Hopkins, et. al., Educational and Psychological Measurement and Evaluation,
(Englewood Cliffs, New Jersey: Prentice Hall Inc., 1990), 7th Ed., p. 216.
79
Norman E. Gronlund and Robert L. Linn, Measurement and Evaluation ..., p. 216.
80
2. Scoring the essay test is influenced by the student’s writing ability. Poor expression and errors in punctuation, spelling, and grammar usually lower the score
3. While scoring essay test, the standards can be shifted because of variations in the content of the answers from paper to paper
4. It requires much time to score the answers81
Thus, in an essay item, students are asked to demonstrate their ability to select, organize, integrate, and review ideas in order to respond to the question freely. In addition, this item type is scored subjectively since it may yield different results when it is scored by different people. The people who are assigned to score the answers are typically influenced by their own judgment or opinion.
To sum up, based on the previous explanation, an essay test is used to
measure students’ comprehension of certain knowledge; students are
asked to answer in their own words, expressing themselves effectively and organizing their own ideas, using information from their own background and knowledge.
D. Item Analysis
1. Definition of Item Analysis
Obtaining valid data is very valuable for making a
clear judgment about students’ performance in evaluation activity. For
this reason, the test should be of good quality and every item should function properly. The teacher or test maker should find out whether the test is a good test by evaluating every item in it. This activity is called item analysis.
According to Anthony J. Nitko, “item analysis refers to the process of
collecting, summarizing, and using information about individual test
items, especially information about pupil’s response to item.”82
81
In addition, “item analysis as a whole will be defined here as the
systematic statistical evaluation of the effectiveness of individual test items. Item analysis is usually done for purposes of selecting which items will remain on future revised and improved versions of the test. Sometimes, however, item analysis is performed simply to investigate how well the items on a test are working with a particular group of students, or to study which items match the language domain of
interest.”83
Moreover, Arthur Hughes proposes the purpose of item analysis
which is “to examine the contribution that each item is making to the test.
Items that are identified as faulty or inefficient can be modified or
rejected.”84
Although item analysis is done primarily for response-choice items, teachers can use several of the techniques described with any items that are scored dichotomously (simply as correct or incorrect).85
In the writer’s opinion, item analysis is a statistical evaluation of
the quality of a test that identifies whether every item on the test works appropriately. It is done by collecting students’ responses to each item so that it can be known which items are good and which items weaken the test. It is very useful for teachers to perform item analysis since it can be a device for test improvement.
2. Kinds of Item Analysis
Item analysis usually concentrates on three vital features: level of difficulty, discriminating power, and the effectiveness of each alternative. “Thus, item analysis can tell us if an item was too difficult or too easy, how well it discriminated between high and low scores on the test, and whether all the alternatives functioned as intended.”86
82
Anthony J. Nitko, Educational Test ..., p. 284.
83
James Dean Brown and Thom Hudson, Criterion-Referenced ..., p. 113.
84
Arthur Hughes, Testing for Language ..., p. 225.
85
Anthony J. Nitko, Educational Test ..., p. 286.
86
a. Level of Difficulty
The first area of item analysis is level of difficulty, which concerns how easy or difficult each item is. According to Kathleen M. Bailey, difficulty level is “an index of how easy an individual item was for the people who took it. It is typically printed as a decimal, ranging from 0.0 to 1.0. It represents the proportion of people who got the item right.”87
Furthermore, according to the book Language Tests at School, “difficulty level (or item facility) has to do with how easy (or difficult) an item is from the viewpoint of the group of students or examinees taking the test of which that item is a part.”88
In the writer’s opinion, level of difficulty deals with the percentage of students who respond to an item correctly and of those who respond incorrectly. By analyzing the difficulty level of each item, it can be inferred whether an item is easy, moderate, or difficult.
Level of difficulty is interpreted as a percentage. The larger the percentage of correct answers, the easier the item is; the fewer the students who answer correctly, the more difficult the item is.
Accordingly, every test item has a level of difficulty: easy, moderate, or difficult. An effective and good test should have items that belong to the moderate level. Items that are too easy or too difficult potentially weaken the quality of the test, so that valid information about students’ achievement will not be acquired.
In addition, level-of-difficulty analysis can be applied to either a large group of students or a small one.
87
Kathleen M. Bailey, Learning about Language Assessment: Dilemmas, Decisions, and Directions, (New York: Heinle & Heinle Publishers, 1998), p. 132.
88
Lyle F. Bachman states that “to conduct an item analysis, we first arrange the scored test papers or answer sheets in order from the highest score to the lowest score. Next, we separate the papers into upper and lower groups, according to their total test scores. For large groups, we would choose the upper and lower 27 percent, while for smaller groups, we would typically choose the upper and lower one-third.”89
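The grouping procedure in the quotation can be sketched in Python as follows. This is only an illustration: the function name, the dictionary input, and the threshold of 30 students that separates a "large" group from a "small" one are assumptions, since the quotation does not state where a large group begins.

```python
def split_upper_lower(total_scores, large_group_threshold=30):
    """Return (upper, lower) lists of student IDs ranked by total score.

    total_scores maps a student ID to that student's total test score.
    large_group_threshold is an assumed cutoff, not taken from the source.
    """
    # Arrange the papers from the highest score to the lowest score.
    ranked = sorted(total_scores, key=total_scores.get, reverse=True)
    n = len(ranked)
    # Upper and lower 27 percent for large groups, one-third for smaller ones.
    fraction = 0.27 if n >= large_group_threshold else 1 / 3
    size = max(1, round(n * fraction))
    return ranked[:size], ranked[-size:]


# Hypothetical class of nine students with total scores 90, 80, ..., 10.
scores = {f"s{i}": 100 - 10 * i for i in range(1, 10)}
upper, lower = split_upper_lower(scores)
print(upper, lower)
```

With nine students the sketch treats the class as a small group, so each group holds one-third of the papers (three students).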
The formula used for analyzing the difficulty level of each item in a large group is stated below:
$$TK = \frac{U + L}{T}$$
In which:
TK : Index of difficulty
U : The number of students in the upper group who answer the item correctly
L : The number of students in the lower group who answer the item correctly
T : The number of students in upper and lower group90
89
Lyle F. Bachman, Statistical Analyses for Language Assessment, (Cambridge: Cambridge University Press, 2004), p. 123.
90
M. Ngalim Purwanto, Prinsip-Prinsip dan Teknik Evaluasi Pengajaran, (Bandung: Remadja Karya, 1986), p. 153.
Next, for the small group, the teacher or test maker can easily evaluate an item by using all the students’ answer sheets. Then, the formula is:
$$P = \frac{B}{JS}$$
In which:
P : Index of difficulty
B : The total number of students who got the item correct
JS : The number of students who took a test91
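As a rough illustration of the two indices, TK = (U + L) / T for the upper and lower groups and P = B / JS for all answer sheets, the following Python sketch computes both. The function names and the sample numbers are hypothetical and are not taken from the data of this study.

```python
def difficulty_index_groups(upper_correct, lower_correct, group_total):
    """TK = (U + L) / T for the upper/lower-group method."""
    if group_total <= 0:
        raise ValueError("group_total must be positive")
    return (upper_correct + lower_correct) / group_total


def difficulty_index_all(correct, test_takers):
    """P = B / JS, using every student's answer sheet."""
    if test_takers <= 0:
        raise ValueError("test_takers must be positive")
    return correct / test_takers


# Made-up numbers: 27 students in each group, 20 upper and 10 lower correct.
tk = difficulty_index_groups(20, 10, 54)
# Made-up numbers: 30 test takers, 18 of whom answered the item correctly.
p = difficulty_index_all(18, 30)
print(round(tk, 2), round(p, 2))  # prints: 0.56 0.6
```

Both results fall between 0.0 and 1.0, and a larger value means that more students answered the item correctly, that is, an easier item.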
The formula above is commonly used for multiple-choice items. For short-answer items, Zainal Arifin gives the following formulas:92
$$\text{Mean} = \frac{\text{The total of students' scores for each item}}{\text{The number of students}}$$
$$\text{Index of difficulty} = \frac{\text{Mean}}{\text{Maximum score of each item}}$$
91
Suharsimi Arikunto, Dasar-Dasar Evaluasi Pendidikan, (Jakarta: PT. Bumi Aksara, 2006), p. 208.
92
Zainal Arifin, Evaluasi Pembelajaran ..., p. 135.
After analyzing each item and obtaining its difficulty level, the next step is to find the difficulty level of the test as a whole. It is calculated by using the following formula:
$$P = \frac{\sum b}{N}$$
In which:
P : Difficulty level for whole items
b : Difficulty level of each item
Σ : Sigma (Total)
N : Total number of test items93
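The short-answer index (the mean score on an item divided by the maximum score of that item) and the difficulty level for the whole test (the sum of the item indices divided by the number of items) can be sketched in Python as below. The function names and the sample scores are hypothetical and serve only as an illustration.

```python
def short_answer_difficulty(item_scores, max_score):
    """Index of difficulty = mean score on the item / maximum score of the item."""
    if not item_scores or max_score <= 0:
        raise ValueError("need at least one score and a positive max_score")
    mean = sum(item_scores) / len(item_scores)
    return mean / max_score


def overall_difficulty(item_indices):
    """P = (sum of b) / N, the average of the per-item difficulty levels."""
    if not item_indices:
        raise ValueError("need at least one item index")
    return sum(item_indices) / len(item_indices)


# Made-up scores of four students on one short-answer item worth 2 points.
print(short_answer_difficulty([2, 1, 0, 2], max_score=2))  # prints: 0.625
# Made-up difficulty levels of three items.
print(overall_difficulty([0.5, 0.75, 0.25]))  # prints: 0.5
```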
Text for No. 34-36
Last week, my parents, sister, brother and I went to the zoo. We went there for recreation. We left at 06.00 a.m and arrived there at 08.00 a.m. The zoo is about a hundred kilometers from my house.
There were a lot of people watching a giant snake. The snake was there for about a week. It was 9 metres long. I thought it was the biggest snake I had ever seen. After going around and watching various animals, we went home.
34. The writer went to the zoo with her ....
A. parents B. sister C. brother D. family
35. What is special in the zoo?
A. wild animals. B. a giant snake. C. various animals D. animal's attraction
36. "After going around watching various animal, we went home." (paragraph 2) The underlined word means ....
A. different B. similar C. wild D. Mean
Text for No. 37-38
2nd December 2010
To: Sherly
Jalan Melati number 3
Dear Sherly,
Thank you very much for inviting me to your 14th birthday party on 12 December 2010. I was really looking forward to it, but sadly I will not be able to come. I am scheduled to see the dentist. I hope everybody has a great time.
Your sincerely,
Lina
37. Why can't Lina come to Sherly's birthday?
A. She must see the dentist
B. She will pick up the dentist
C. She is disappointed
D. She has her 14th birthday party
38. "I am scheduled to see the dentist." The underlined word is similar to ....
A. Forced B. Asked C. ordered D. planned
Text for No. 39-40
My Cat
Spot is a regular house cat. He is an adorable cat. He has orange fur with white and black spots. I like to cuddle him because his fur feels soft. Every morning I give Spot milk. Spot does not like rice, so I give him cat food.
Spot is an active animal. He likes to run around the house. He likes to chase everyone in my house. When he feels tired or sleepy, Spot usually sleeps on the sofa in the living room or sometimes under the table.
39. Which statement is not true according to the text?
A. Spot dislikes rice
B. Spot has orange spots
C. Spot likes to run around the house
D. Spot always sleeps on the sofa in the living room
40. "He likes to chase everyone in my house." The synonym of the underlined word is ....
A. Trick B. Catch C. Hunt D. Cheat
II. ESSAY
41. Arrange the jumble words into a good sentence!
in - Asia - is - largest - lake - southeast - lake - Toba - the
42. Arrange the following sentence to make a good paragraph!
o My family and I went to Park
o Yesterday was a holiday
o In the afternoon, we went home
o We played around and had lunch there
o It was full of fun
43. This animal eats grass or leaves. It has two strong horns on his head. It usually lives in a group. Its body is as big as a cow. It is smaller than an elephant.
This animal is ....
44. Change the verbs in the brackets into right form!
On Sunday I (go) ... to the village to see my grandma. I (leave) ... at eight in the morning. I (wait) ... for the bus at the bus stop. Five minutes later a bus (come) ....
45. This mammal is a sea animal that looks like a large fish with a pointed mouth. It is very intelligent and friendly towards human.
This animal is ....
No. : Special
Subject : Submission of Skripsi Title
Attachment : 1 (one) file
To:
The Head of the English Education Department
FITK UIN Jakarta
Assalamu'alaikum Wr. Wb.
I, the undersigned:
Name : Andrian Dwi Prayoga
NIM : 107014000882
Department : English Education
Faculty : Tarbiyah and Teachers' Training
intend to submit a skripsi title as one of the requirements for completing the S-1 (Strata 1) program at UIN Syarif Hidayatullah Jakarta.
The proposed title is:
"AN ANALYSIS ON THE DIFFICULTY LEVEL OF ENGLISH SUMMATIVE TEST FOR SECOND GRADE OF JUNIOR HIGH SCHOOL AT ODD SEMESTER 2010/2011"
(A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)
I enclose one proposal file consisting of:
1. Abstract
2. Outline
3. Temporary references
This letter of submission is hereby presented. Thank you for your consideration.
Wassalamu'alaikum Wr. Wb.
Tangerang, 30 March 2011
Applicant,
(Andrian Dwi Prayoga)
NIM: 107014000882
MINISTRY OF RELIGIOUS AFFAIRS
UIN JAKARTA
FITK
Jl. Ir. H. Juanda No 95 Ciputat 15412 Indonesia
FORM (FR)
Document No. : FITK-FR-AKD-081
Date of Issue : 1 March 2010
SKRIPSI SUPERVISION LETTER
Number : Un.01/F.1/KM.01.3/[illegible]/2011
Attachment : -
Subject : Skripsi Supervision
Jakarta, 04 April 2011
To:
Dr. H. Muhammad Farkhan, M.Pd
Skripsi Advisor
Faculty of Tarbiyah and Teachers' Training, UIN Syarif Hidayatullah Jakarta
Assalamu'alaikum wr.wb.
You are hereby requested to serve as advisor I/II (content/technical) for the skripsi of the following student:
Name : Andrian Dwi Prayoga
NIM : 107014000882
Department : English Education
Semester : VIII
Skripsi Title : AN ANALYSIS ON THE DIFFICULTY LEVEL OF ENGLISH SUMMATIVE TEST FOR SECOND GRADE OF JUNIOR HIGH SCHOOL AT ODD SEMESTER 2010/2011 (A Case Study at Second Grade of SMP Negeri 13 Tangerang Selatan)
The title was approved by the Department concerned on 31 March 2011; the abstract/outline is attached. You may make editorial changes to the title. If substantial changes are considered necessary, the advisor is asked to contact the Department first.
The supervision of this skripsi is expected to be completed within 6 (six) months and may be extended for a further 6 (six) months without a letter of extension.
Thank you for your attention and cooperation.
Wassalamu'alaikum wr.wb.
Head of the English Education Department,
[...]ki, M.Pd
NIP [...]1212 199103 [...]
Copies to:
1. Dean of FITK
2. The student concerned
DEPARTMENT OF RELIGIOUS AFFAIRS
UIN JAKARTA
FITK
Jl. Ir. H. Juanda No 95 Ciputat 15412 Indonesia
FORM (FR)
Document No. : FITK-FR-AKD-082
Date of Issue : 1 March 2010
Revision No. : 01
Page : 1/1
RESEARCH PERMISSION REQUEST LETTER
Number : Un.01/F.1/KM.01.3/[illegible]/2011
Attachment : Outline/Proposal
Subject : Request for Research Permission
Jakarta, 25 April 2011
To:
The Principal of SMP Negeri 13 Tangerang Selatan
Assalamu'alaikum wr.wb.
We respectfully inform you that:
Name : Andrian Dwi Prayoga
NIM : 107014000882
Department : English Education
Semester : VIII (Eight)
Skripsi Title : "AN ANALYSIS ON THE DIFFICULTY LEVEL OF ENGLISH SUMMATIVE TEST FOR SECOND GRADE OF JUNIOR HIGH SCHOOL AT ODD SEMESTER 2010/2011" (A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)
is indeed a student of the Faculty of Tarbiyah and Teachers' Training, UIN Jakarta, who is currently writing a skripsi and will conduct research at the institution/school/madrasah that you lead.
We therefore ask that you permit the student to carry out this research.
Thank you for your attention and cooperation.
Wassalamu'alaikum wr.wb.
On behalf of the Dean,
Head of the English Education Department
Drs. [...]
NIP. [...]
Copies to:
1. Dean of FITK
2. Vice Dean for Academic Affairs
3. The student concerned
GOVERNMENT OF TANGERANG SELATAN CITY
EDUCATION OFFICE
SMP NEGERI 13 KOTA TANGERANG SELATAN
Jl. Beruang II Peladen, Pd. Ranji, Ciputat Timur, Tangerang Selatan 15412, Telp./Fax. 021-7354472
Website: www.smpnSciputat.com E-mail: smpn13_tangsel@yahoo[...]
STATEMENT LETTER
No.: 423.4/965/SMPN13TANGSEL/2011
The undersigned:
Name : Rohman, S.Pd.
NIP : 19580811 198003 1 012
Rank/Grade : Pembina/IV a
Position : Principal of SMP Negeri 13 Tangerang Selatan
certifies that:
Name : Andrian Dwi Prayoga
NIM : 107014000882
Department : English Education
Faculty : Tarbiyah and Teachers' Training, UIN Jakarta
Semester : VIII (Eight)
Research Title : "AN ANALYSIS ON THE DIFFICULTY LEVEL OF ENGLISH SUMMATIVE TEST FOR SECOND GRADE OF JUNIOR HIGH SCHOOL AT ODD SEMESTER 2010/2011 (A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)"
The above-named person has indeed conducted research at SMP Negeri 13 Tangerang Selatan on 23 - 30 May 2011.