An Analysis on the Difficulty Level of English Summative Test for Second Grade of Junior High School at Odd Semester 2010/2011

(A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)

A “Skripsi”

Presented to the Faculty of Tarbiya and Teachers’ Training

in Partial Fulfillment of the Requirements for the Degree of S.Pd. (Bachelor of Arts) in English Language Education

Written By:

Andrian Dwi Prayoga 107014000882

ENGLISH EDUCATION DEPARTMENT

FACULTY OF TARBIYA AND TEACHERS’ TRAINING

“SYARIF HIDAYATULLAH” STATE ISLAMIC UNIVERSITY JAKARTA

(A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)

A "Skripsi"

Presented to the Faculty of Tarbiyah and Teachers' Training in Partial Fulfillment of the Requirements for the Degree of S.Pd. (Bachelor of Arts) in English Language Education

ANDRIAN DWI PRAYOGA

NIM: 107014000882

Approved by:

Advisor,

H. Muhammad Farkhan

NIP. 19571005 198703 1 003

DEPARTMENT OF ENGLISH EDUCATION

FACULTY OF TARBIYAH AND TEACHERS' TRAINING

SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY

JAKARTA

2011

The "Skripsi" entitled "AN ANALYSIS ON THE DIFFICULTY LEVEL OF ENGLISH SUMMATIVE TEST FOR SECOND GRADE OF JUNIOR HIGH SCHOOL AT ODD SEMESTER 2010/2011 (A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)", written by Andrian Dwi Prayoga, student's registration number 107014000882, was examined by the committee on December 15th, 2011, and was declared to have passed and, therefore, to have fulfilled one of the requirements for the academic title of 'S.Pd.' (Bachelor of Arts) in English Language Education at the Department of English Education.

Jakarta, December 15th, 2011

EXAMINATION COMMITTEE

CHAIRMAN : Drs. Syauki, M.Pd.

NIP. 19641212 199103 1 002

SECRETARY : Neneng Sunengsih, M.Pd.

NIP. 19730625 199903 2 001

EXAMINERS : 1. Drs. A.M. Zaenuri, M.Pd.

NIP. 19530304 197903 1 001

2. Nida Husna, M.Pd., M.A. TESOL

NIP. 19720705 200312 2 002

Acknowledged by:

Dean of Tarbiya and Teachers' Training Faculty

Nurlena Rifa'i, M.A., Ph.D.

NIP. 19591020 198603 2 001

I, the undersigned,

Name : Andrian Dwi Prayoga

Place/Date of Birth : Jakarta, 2 January 1990

NIM : 107014000882

Department / Study Program :

Title of Skripsi : AN ANALYSIS ON THE DIFFICULTY LEVEL OF ENGLISH SUMMATIVE TEST FOR SECOND GRADE OF JUNIOR HIGH SCHOOL AT ODD SEMESTER 2010/2011 (A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)

Advisor :

hereby declare that this skripsi is truly my own work and that I am academically responsible for what I have written.

Semester 2010/2011 (A Case Study at the Second Grade of SMPN 13 South Tangerang), Skripsi, English Education Department, Faculty of Tarbiya and Teachers’ Training, Syarif Hidayatullah State Islamic University Jakarta.

Key words: Item Difficulty Level, Summative Test

This study aims to measure the difficulty level of the English summative test items administered to the second grade of SMPN 13 South Tangerang in the odd semester of the 2010/2011 academic year. The study identifies which test items are too easy, moderate, or difficult.

This study is quantitative research because the researcher uses numerical data that are analyzed statistically. It is also categorized as descriptive analysis because it is intended to describe the objective condition of the difficulty level of the English summative test for the second grade of SMPN 13 South Tangerang in the odd semester of the 2010/2011 academic year.

The findings of this study are that moderate items have the highest percentage at 66.7%, followed by difficult items at 20% and easy items at 13.3%. Overall, the difficulty level of the test is moderate, at 0.50. Therefore, this test has a good difficulty level.

The result of this item analysis can be used by teachers to revise the test items categorized as either easy or difficult. It can also show which material or problems teachers should focus on more in the classroom in order to prepare students for the next exam.

Semester 2010/2011 (A Case Study at the Second Grade of SMPN 13 South Tangerang), Skripsi, Department of English Education, Faculty of Tarbiyah and Teachers' Training, Syarif Hidayatullah State Islamic University Jakarta.

Keywords: Item Difficulty Level, Summative Test

This research aims to measure the difficulty level of the items in the English summative test administered to the second grade of SMPN 13 South Tangerang in the odd semester of the 2010/2011 academic year. Through this research, it can be determined which items are too easy, moderate, or difficult.

This research is quantitative because the researcher uses numerical data that are analyzed statistically. It is also categorized as descriptive analysis because it describes the objective condition of the difficulty level of the English summative test for the second grade of SMPN 13 South Tangerang in the odd semester of the 2010/2011 academic year.

The results show that moderate items have the highest percentage at 66.7%, followed by difficult items at 20% and easy items at 13.3%. Overall, the difficulty level of the test is moderate, at 0.50. Therefore, the test has a good difficulty level.

The results of this item analysis can be used by teachers to improve the items categorized as easy or difficult. The research can also provide information about which materials or problems need more attention from teachers in the teaching and learning process in order to prepare students for the next examination.

and Blessing to the writer, so that this “Skripsi” could be completed. Peace and Salutation be upon our prophet Muhammad, his family, his companions, and his followers.

The writer would like to express his gratitude to Dr. H. Muhammad Farkhan, M.Pd., the writer’s advisor, who kindly spent his time giving valuable advice, guidance, corrections, and suggestions in the composing of this “Skripsi.”

Also, on this occasion, the writer would like to express his greatest appreciation, honor, gratitude, and love to his beloved mother, Mrs. Tri Hastuti, S.Pd., who has been a great motivator in every condition, and to his father, Mr. Juraid Umar, M.Pd., who has given him much inspiration. He thanks them for their prayers, guidance, patience, and encouragement, which motivated the writer to finish his study.

The writer would like to express his highest appreciation and gratitude to all lecturers of the English Education Department for teaching precious knowledge, sharing the values of life, and giving unforgettable study experiences.

The writer gives many thanks to Mr. Rohman, S.Pd., the Headmaster of SMPN 13 South Tangerang, who gave the writer permission to do the research there. His gratitude also goes to Ms. Dahlia Muflikhati, S.Pd., one of the English teachers at SMPN 13 South Tangerang, who gave the writer great contribution and cooperation while he was doing this research.

His gratitude also goes to Drs. Syauki, M.Pd., the Head of the English Education Department, and Ms. Neneng Sunengsih, S.Pd., the Secretary of the English Education Department. His thanks also go to the staff of the English Education Department, especially Ms. Aida Ainul Wardah, S.Pd., who always gives excellent service and contribution to the writer.

The writer would like to express his thanks and love to all his beloved friends, especially Ayu Lestari, Marinta, Eva Nur, Maya K., Syifa F., Fera P., Wilda

while studying together.

Finally, the writer realizes that this “Skripsi” is still far from being perfect. Constructive criticism and suggestions are welcome to make it better.

Jakarta, November 2011

ABSTRAK (ABSTRACT)
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF TABLES
CHAPTER I : INTRODUCTION
    A. Background of the Study
    B. Limitation of the Study
    C. Statement of the Problem
    D. Objective of the Study
    E. Significance of the Study
    F. Method of the Study
CHAPTER II : THEORETICAL FRAMEWORK
    A. Test
        1. Definition of Test
        2. Types of Test
            a. Achievement Test
                1. Placement Test
                2. Formative Test
                3. Diagnostic Test
                4. Summative Test
            b. Proficiency Test
            c. Progress Test
            d. Aptitude Test
    B. Categories of Good Test
        1. Validity
            a. Content Validity
        2. Reliability
        3. Practicality
    C. Types of Test Item
        1. Objective Test
            a. Selection-Type Test Item
                1. Multiple Choice
                2. True-False
                3. Matching
                4. Rearrangement
            b. Supply-Type Test Item
                1. Short-Answer
                2. Fill-in
        2. Essay Test
    D. Item Analysis
        1. Definition of Item Analysis
        2. Kinds of Item Analysis
            a. Level of Difficulty
            b. Discriminating Power
            c. The Effectiveness of Distractors
    E. The Importance of Item Analysis
CHAPTER III : THE IMPLEMENTATION OF THE RESEARCH
    A. Research Methodology
        1. Purpose of the Study
        2. Place and Time of the Study
        3. Population and Sample
        4. Method of the Study
        1. Data Description
        2. Data Analysis
        3. Data Interpretation
CHAPTER IV : CONCLUSION AND SUGGESTIONS
    A. Conclusion
    B. Suggestions
BIBLIOGRAPHY
APPENDIXES

Table 2. The Group Position Based on the Test Result

Table 3. Format of Item Analysis of the English Summative Test

Table 4. Classification of Items Based on the Proportion of Difficulty Level

Appendix

Table 5. Students’ Answer in the Upper Group (Multiple-Choice Items)

Table 6. Students’ Answer in the Lower Group (Multiple-Choice Items)

CHAPTER I

INTRODUCTION

This chapter presents the background of the study, the limitation of the study, the statement of the problem, the objective of the study, the significance of the study, and the method of the study.

A. Background of the Study

Evaluation plays an important role in every stage of education. It is integrated in the school program so it contributes directly to the teaching and learning process. According to Norman E. Gronlund, “Carefully collected evaluation data help teachers understand the learners, plan learning experiences for them, and determine the extent to which the instructional objectives are being achieved.”1

Evaluation refers to the process of drawing conclusions from gathered data in order to make value judgments about students’ performance. Lyle F. Bachman states that “evaluation can be defined as the systematic gathering of information for the purpose of making decisions.”2 In summary, evaluation plays a very important role because teachers must always be concerned with the quality of their instructional process and with whether students have reached the instructional goals stated beforehand.

1 Norman E. Gronlund, Measurement and Evaluation in Teaching, (New York: Macmillan Publishing Co., Inc., 1981), 4th Ed., p. 3.

2 Lyle F. Bachman, Fundamental Considerations in Language Testing, (Oxford: Oxford

There are many ways of collecting data as information in the process of evaluation. One of them is by using a test. A test is a set of questions, each of which has a correct answer, which examinees usually answer orally or in writing.3 There are several types of test. One of them is the achievement test, which is designed to find out how successfully students have mastered the knowledge, abilities, and skills taught in past learning activities.

According to Wilmar Tinambunan, there are four commonly used types of achievement test. First, a placement test is given at the beginning of learning to determine students’ initial performance. Next, a formative test is used to monitor students’ progress during the learning process. Third, a diagnostic test is intended to detect students’ weaknesses during instruction. Finally, a summative test is used to show the standard that students have reached in relation to other students at the same stage.4 In this research, the test that the writer analyzes is the summative test.

As one method of measuring students’ achievement in the learning process, a test should be well constructed. A well-constructed test should have three main characteristics: validity, reliability, and practicality. Validity in language testing means that the test really measures what it is intended to measure. Reliability means that a test has to be consistent and reproducible. Practicality is concerned with a wide range of factors of economy, convenience, and interpretability.5

3 Wilmar Tinambunan, Evaluation of Students Achievement, (Jakarta: Depdikbud, 1988), p. 3.

4 Wilmar Tinambunan, Evaluation of Students ..., p. 7-9.

5 Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation in Psychology and

Making a well-constructed test is the teachers’ responsibility because they are the ones who know their students’ capability and the instructional objectives. However, it is not an easy job, and some tests are built carelessly. As stated by J. Stanley Ahmann and Marvin D. Glock, “Classroom tests are tests constructed by classroom teacher for use in his particular classes. More of these tests are administrated than any other kind. Unfortunately, they are carelessly constructed and interpreted.”6

Based on the explanation above, teachers need to evaluate the effectiveness of the test items in order to know whether the items work well or not. Harold S. Madsen explains that “the selections of appropriate language items are not enough by itself to ensure a good test. Each question needs to function properly; otherwise, it can weaken the exam. Fortunately, there are some rather simple statistical ways of checking individuals’ items. This procedure is called as item analysis.”7 This is done by analyzing the students’ responses to each item.

Item analysis can be a valuable activity that improves a test’s reliability and validity. Item analysis procedures provide information for evaluating the functional effectiveness of each item and for detecting weaknesses that should be corrected. This information is useful when reviewing the test, and it is indispensable when building a set of high-quality items for the next test.

Item analysis has three main components: level of difficulty, discriminating power, and effectiveness of the distractors. The difficulty level shows the percentage of students who answer an item correctly. Discriminating power indicates whether the test can discriminate between levels of students’ ability. The effectiveness of the distractors indicates whether all the alternatives of an item function well.
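The difficulty level described above, the percentage of students who answer an item correctly, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `difficulty_index` and the sample answers are hypothetical and are not taken from the data of this study.

```python
from typing import List


def difficulty_index(responses: List[List[int]]) -> List[float]:
    """Return, for each item, the proportion of students who answered correctly.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    """
    if not responses:
        raise ValueError("at least one student is required")
    n_students = len(responses)
    n_items = len(responses[0])
    if any(len(row) != n_items for row in responses):
        raise ValueError("every student must have an answer for every item")
    # Sum the correct answers per item and divide by the number of students.
    return [
        sum(row[i] for row in responses) / n_students
        for i in range(n_items)
    ]


# Hypothetical answers of four students on three items (1 = correct).
sample = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]
print(difficulty_index(sample))  # prints [1.0, 0.5, 0.25]
```

In this made-up sample, every student answers the first item correctly (1.0), half answer the second (0.5), and only one in four answers the third (0.25), so a lower value indicates a more difficult item.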

The writer limits the problem of the study: he focuses only on the difficulty level of the test. The test should be classified by difficulty level as easy, moderate, or difficult. In addition, he needs to determine the percentage of items that are easy, moderate, and difficult. Moreover, the test should be able to distinguish between the students who have studied well and those who have not.
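As a rough illustration of how the percentages of easy, moderate, and difficult items could be tallied, the sketch below classifies a list of per-item difficulty values (each the proportion of students answering the item correctly) and reports the share of each category. The cut-off values 0.30 and 0.70, the function names, and the sample values are assumptions made for this example only; the criteria actually applied in this study may differ.

```python
from collections import Counter
from typing import Dict, List

# Assumed cut-offs, chosen for illustration only; the study's own
# classification criteria may differ.
DIFFICULT_BELOW = 0.30
EASY_ABOVE = 0.70


def classify(p: float) -> str:
    """Label one item from its proportion of correct answers."""
    if p < DIFFICULT_BELOW:
        return "difficult"
    if p > EASY_ABOVE:
        return "easy"
    return "moderate"


def category_shares(p_values: List[float]) -> Dict[str, float]:
    """Return the percentage of items falling in each category."""
    if not p_values:
        raise ValueError("at least one item is required")
    counts = Counter(classify(p) for p in p_values)
    total = len(p_values)
    return {
        label: 100 * counts[label] / total
        for label in ("easy", "moderate", "difficult")
    }


# Hypothetical difficulty values for five items.
print(category_shares([0.9, 0.5, 0.45, 0.6, 0.2]))
# prints {'easy': 20.0, 'moderate': 60.0, 'difficult': 20.0}
```

Keeping the cut-offs as named constants makes it easy to substitute whichever classification criteria a particular reference recommends.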

6 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth: Principles of Tests and Measurement, (Boston: Allyn and Bacon, Inc, 1967), p. 17.

7 Harold S. Madsen, Techniques in Testing, (New York: Oxford University Press, 1983), p. 180.

The writer intends to analyze the difficulty level of the English summative test because he found some problems at the second grade of SMP Negeri 13 Tangerang Selatan. First, some students commented that the test was too difficult or too easy. Second, and more importantly, many students got low scores. The writer therefore investigates this problem to find out how difficult the test is.

Based on the description above, the writer performs an item analysis of the English Summative Test items for the second grade of SMP Negeri 13 Tangerang Selatan. The writer did the research under the title “AN ANALYSIS ON THE DIFFICULTY LEVEL OF ENGLISH SUMMATIVE TEST FOR SECOND GRADE OF JUNIOR HIGH SCHOOL AT ODD SEMESTER 2010/2011 (A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)”.

B. Limitation of the Study

To make this study easier to understand, the writer limits the study as follows:

1. The research focuses only on the difficulty level of the English Summative Test in the odd semester of 2010/2011.

2. The test analyzed is the English Summative Test for the second grade in the odd semester of the 2010/2011 academic year.

3. The research focuses only on the second-grade students of SMP Negeri 13 Tangerang Selatan.

C. Statement of the Problem

Based on the limitation of the study explained above, the writer formulates the statement of the problem in this research as follows:

“Does the English Summative Test for the second grade of SMP Negeri 13 Tangerang Selatan in the odd semester of 2010/2011 fulfill the criteria of a good test in terms of difficulty level?”

D. Objective of the Study

In line with the limitation of the study, the objective of the study is to measure the quality of the English Summative Test for the second grade of SMP Negeri 13 Tangerang Selatan in the odd semester of 2010/2011 and to determine the difficulty level of each item.

E. Significance of the Study

The result of this study is expected to benefit English teaching. It informs test makers and classroom teachers when a test item has a high or low difficulty level. They can review which items make the test too easy or too difficult and follow up by revising the test. Thus, this study can provide useful input and feedback as a basis for improving the English Summative Test.

In addition, the study fulfills the writer’s final assignment for his bachelor’s degree. Finally, other researchers who are interested in analyzing difficulty levels can obtain basic information from this study for further research.

F. Method of the Study

The method used in this research is quantitative descriptive analysis. The writer collected the English Summative Test paper and the students’ answer sheets, then analyzed the difficulty level of each item. Quantitatively, the writer used numerical data that were analyzed statistically. The writer also did library research by studying a number of references related to the topic to support the theoretical aspect of the investigation.

CHAPTER II

THEORETICAL FRAMEWORK

In this chapter, the writer describes the theoretical framework, which covers the definition and types of test, the characteristics of a good test, the types of test item, the definition and types of item analysis, and the importance of item analysis.

A. Test

1. Definition of Test

In the process of evaluation, one of the methods that can be used to gather data is a test. Many experts have offered definitions of a test. In his book, Educational Test and Measurement, an Introduction, Anthony J. Nitko writes, “Test is a systematic procedure for observing and describing one or more characteristics of a person with the aid of either a numerical scale or category system.”1

1 Anthony J. Nitko, Educational Test and Measurement, an Introduction, (New York: Harcourt Brace Jovanovich, Inc., 1983), p. 6.

Another opinion states that a test is a technique consisting of questions, statements, or tasks that are given to students in order to measure their performance or behavior.2 Victor H. Noll also writes that a test usually involves the use of a certain instrument or set of instruments to determine a specific quality or trait.3

Moreover, Jum C. Nunnally states that, “A test is a standardized situation that provides an individual with a score.”4

Based on the definitions above, it can be concluded that a test is a method of measuring the behavior or performance of individuals, consisting of systematic procedures for gathering data about their achievement. It is usually carried out under standardized conditions in the teaching and learning process.

2. Types of Test

There are many types of test used to measure students’ achievement. However, there are four basic types of language tests: achievement tests, proficiency tests, progress tests, and aptitude tests.5

a. Achievement Test

In his book, Language Testing, Tim McNamara writes, “Achievement tests accumulate evidence during, or at the end of, a course of study in order to see whether and where progress has been made in terms of the goals of learning. They relate to the past in that they measure what language the students have learned as a result of teaching.”6

Furthermore, Nunnally states that, “The purpose of achievement test is to measure progress in school up to a particular point in time. Achievement test is based on the core educational objectives shared by the educators across the country.”7

3 Victor H. Noll, Educational Measurement, (Boston: Houghton Mifflin Company, 1965), 2nd Ed., p. 13.

4 Jum C. Nunnally, Educational Measurement and Evaluation, (New York: McGraw-Hill, Inc., 1964), p. 6.

5 Rebecca M. Valette, Modern Language Testing, (New York: Harcourt Brace Jovanovich Inc., 1977), 2nd Ed., p. 5.

6 Tim McNamara, Language Testing, (New York: Oxford University Press, 2000), p. 6.

In addition, according to Rebecca M. Valette, “achievement tests are usually not built around one set of teaching materials but are designed for use with students from a variety of different schools and programs.”8

In the writer’s opinion, an achievement test is designed to find out how successfully students have mastered the materials of a long course period and whether they have achieved the educational objectives. Thus, an achievement test makes it possible to compare the progress of individual students, classes, and schools with others across the country.

According to Wilmar Tinambunan, there are four types of achievement test: placement, formative, diagnostic, and summative test.9

1. Placement Test

“Placement tests are designed to assess students’ level of language ability so that they can be placed in the appropriate course or class. Such tests may be based on aspects of the syllabus taught at the institution concerned, or may be based on unrelated material. In some language centres students are placed according to their rank in the test results so that, for example, the students with the top eight scores might go into the top class. In other centres the students’ ability in different skills such as reading and writing may need to be identified. In such a centre a student could conceivably be placed in the top reading class, but in the bottom writing class, or some other combination. In yet other centres placement test may have the purpose of deciding whether students need any further tuition at all.”10

8 Rebecca M. Valette, Modern Language ..., p. 5.

9 Wilmar Tinambunan, Evaluation of Students Achievement, (Jakarta: Depdikbud, 1988), p. 7.

10 J. Charles Alderson, et al., Language Test Construction and Evaluation, (Cambridge: Cambridge University Press, 1995), p. 11-12.

Also, James Dean Brown, in his book Testing in Language Programs, states that the purpose of this test is to form groups of students at the same level of ability so that teachers can concentrate on the problems or learning points suitable for that level.11

Moreover, placement tests provide information that helps to place students in the part of the learning program most appropriate to their level of ability. They are most successful when they are constructed for particular situations.12 Most placement tests constructed by classroom teachers are pretests, which serve to determine students’ readiness to begin instruction and to place them in the part of the learning activity with the proper instruction.

2. Formative Test

Norman E. Gronlund writes that “formative tests are given periodically during instruction to monitor pupil learning progress and to provide ongoing feedback to pupils and teachers.”13 A formative test usually covers a part of instruction, such as a unit or a chapter.

In line with the opinion above, formative tests are carried out while instruction is ongoing to identify the learning progress students have made and to give continuous feedback on the strengths and weaknesses of the learning activity.14 Furthermore, “the formative test is given during the course of instruction; its purpose to show which aspects of the chapter the student has mastered and where remedial work is necessary.”15

11 James Dean Brown, Testing in Language Programs, (New Jersey: Prentice Hall Regents, 1996), p. 11.

12 Arthur Hughes, Testing for Language Teachers, (Cambridge: Cambridge University Press, 2003), 2nd Ed., p. 16-17.

13 Norman E. Gronlund, Measurement and Evaluation in Teaching, (New York: Macmillan Publishing Co., Inc., 1981), 4th Ed., p. 125.

14 Wilmar Tinambunan, Evaluation of Students ..., p. 8.

The result of a formative test shows how well students have mastered a particular material and gives them immediate feedback. With this feedback, students can identify their learning errors or weaknesses and then correct them with or without the teacher’s help.

Thus, in the writer’s opinion, a formative test is designed to check students’ progress in mastering one particular learning point during instruction and to give students direct feedback.

3. Diagnostic Test

A diagnostic test is intended to show specific weaknesses and strengths in a particular material or skill.16 It is more comprehensive and detailed because it identifies the major causes of learning difficulties and then helps prepare a plan for remedial activity.

In his book, Testing for Language Teachers, Arthur Hughes states that, “Diagnostic tests are used to identify learners’ strengths and weaknesses. They are intended primarily to ascertain what learning still needs to take place.”17 In addition, “a diagnostic test is designed to determine the degree to which the specific instructional objectives of the course have been accomplished.”18

Therefore, by using diagnostic tests, the teacher knows what students have mastered and in which areas a student needs further help. A diagnostic test is given while students are learning the language, so it is typically delivered at the beginning or in the middle of a language course.

16 Robert Lado, Language Testing, The Construction and Use of Foreign Language Tests, (London: Longman Group Limited, 1961), p. 369.

17 Arthur Hughes, Testing for Language ..., p. 15.

4. Summative Test

According to Wilmar Tinambunan, “the summative test is intended to show the standard which the students have now reached in relation to other students at the same stage. It typically comes at the end of a course or unit of instruction.”19

In support of the opinion above, summative assessment methods are used to determine what a student has accomplished at the beginning or the end of a language course, so that teachers can give students a final mark.20 Moreover, Rebecca M. Valette states that “the summative test is usually given at the end of a marking period and measures the “sum” total of the material covered.”21

In conclusion, the summative test is a test that is usually administered at the end of a language course, a semester, or an academic year to find out how successfully students have mastered a wide range of material within a certain period. On this type of test, students are usually ranked and graded.

b. Proficiency Test

James Dean Brown writes, “a proficiency test assesses the general knowledge or skills commonly required or prerequisite to entry into a group of similar institutions. Such tests are very general in nature and cannot be related to the goals and objectives of any particular language program.”22

19 Wilmar Tinambunan, Evaluation of Students ..., p. 9.

20 Julie Cotton, The Complete Guide to Learning and Assessment, (New Delhi: Crest Publishing House, 2004), p. 24.

21 Rebecca M. Valette, Modern Language ..., p. 11.

Furthermore, Arthur Hughes states:

“Proficiency tests are designed to measure people’s ability in a language regardless of any training they may have had in that language. The content of a proficiency test, therefore, is not based on the content or objectives of language courses that people taking the test may have followed. Rather, it is based on a specification of what candidates have to be able to do in the language in order to be considered proficient.”23

To sum up, proficiency tests measure someone’s general ability in a language and are not related to any previous course of instruction. Proficiency tests usually consist of standardized multiple-choice items on grammar, vocabulary, reading comprehension, aural comprehension, and sometimes writing.

c. Progress Test

According to the book Language Test Construction and Evaluation, “progress tests are given at various stages throughout a language course to see what the students have learnt.”24

Meanwhile, another opinion states that, “the progress test measures how much the student has learned in a specific course of instruction. The tests that the classroom teacher prepares for administration at the end of a unit or end of a semester are progress tests.”25

Thus, a progress test is used to check students’ progress in learning one particular lesson, and the teacher can administer it at any time during a language course.

d. Aptitude Test

According to Robert Lado, “aptitude tests are designed to predict the degree of success that individual students will have in studying a foreign language.”26

In addition, an aptitude test is typically used to predict how successful students will be in the learning activity they are about to undertake.27

23 Arthur Hughes, Testing for Language ..., p. 9.

24 J. Charles Alderson, et al., Language Test ..., p. 12.

25 Rebecca M. Valette, Modern Language ..., p. 5.

26 Robert Lado, Language Testing, ..., p. 370.

The opinions above are supported by Howard B. Lyman, who writes in his book, Test Scores and What They Mean, that “All aptitude tests imply prediction. They give us a basis for predicting future level of performance.”28

Because it measures the potential capacity of an individual, an aptitude test can be used to estimate how long students will need to master a foreign language sufficiently. It is also often used in selecting individuals for language training, jobs, scholarships, and many other purposes.

B. Categories of Good Test

A test, as an instrument for obtaining information, should be of good quality. The quality of a test influences its results. When the test is of good quality, the right information is obtained and can be used to make accurate decisions about students’ achievement.

According to David P. Harris, “all good tests possess three qualities: validity, reliability, and practicality.”29

1. Validity

In the book Evaluating Pupil Growth: Principles of Tests and Measurement, “validity is often defined as the degree to which a measuring instrument actually serves the purposes for which it is intended.”30 Also, Norman E. Gronlund writes that, “validity refers to the extent to which the results of an evaluating procedure serve the particular uses for which they are intended.”31

So, validity of a test means that the test really measures what it is supposed to measure. According to some experts, three types of validity have been identified and are commonly used in educational measurement.

Howard B. Lyman, Test Scores and What They Mean, (Boston: Allyn and Bacon, 1998), 6th Ed., p. 22.

David P. Harris, Testing English as a Second Language, (New York: McGraw-Hill Inc., 1969), p. 13.

J. Stanley Ahmann and Marvin D. Glock, Educating Pupil Growth Principles of Tests and Measurement, (Boston: Allyn and Bacon, 1967), 3rd Ed., p. 285.

a. Content Validity

A test can be said to have content validity if it is built with a representative sample of the language skills, structures, etc. with which it is meant to be concerned.32 In line with that, Anthony J. Nitko writes that, “content validity is the extent the items on a test are representative of the domain or universe that they are supposed to represent.”33

Thus, the degree of content validity in a test relates to how well the test measures the subject matter that students have studied. Therefore, it is important to make sure that the test covers all the areas of material that are supposed to be assessed. For example, a grammar test should be made up of items relating to the knowledge of grammar.

b. Construct Validity

This type of validity relates to any underlying ability that is formulated in a theory of language ability. Construct validity is “the extent that a test measures the trait, attribute, or mental process it should measure, and whether descriptions of persons in terms of such constructs can follow using the scores from that test.”34

Moreover, Arthur Hughes writes that, “it is a matter of empirical research to establish whether or not such a distinct ability exists, can be measured, and is indeed measured in that test.”35

In other words, a test has construct validity if it is able to measure certain specific characteristics in a way that is consistent with a theory of language and learning behavior.

c. Criterion-Related Validity

Criterion-related validity relates to the extent to which the results of the test agree with the results of another independent and trustworthy assessment of the student’s competence.36

Arthur Hughes, Testing for Language ..., p. 26.

Anthony J. Nitko, Educational Test ..., p. 413.

Anthony J. Nitko, Educational Test ..., p. 413.

In addition, in his book, Educational Tests and Measurement, An Introduction, Anthony J. Nitko states that, “criterion-related validity questions concern the extent to which scores on a test permit inferences about examinees’ likely standing on another measure called a criterion.”37

This type of validity can be divided into two types, namely concurrent validity and predictive validity.

1. Concurrent Validity

According to J. Stanley Ahmann and Marvin D. Glock, this validity is “designed to estimate present status with respect to a characteristic different from the test.”38 In other words, it tries to determine a student’s present standing indirectly.

Concurrent validation is carried out by comparing an individual’s test scores with another assessment of the same individual taken at about the same time.

2. Predictive Validity

Predictive validity is intended to predict how well someone will perform in the future. It is supported by a quote, “predictive validity concerns the degree to which a test can predict candidates’ future performance.”39

To carry out this validation, the earlier test scores of individual students are correlated with the grades they earn at the end of the first semester.
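The correlation described above can be illustrated with a minimal sketch. It assumes the Pearson product-moment coefficient is the correlation intended, which the source does not specify; the function name and the score lists are hypothetical and serve only as an illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / (spread_x * spread_y)

# Hypothetical data: scores on an earlier test and end-of-semester grades
# for the same six students, listed in the same order.
earlier_test_scores = [55, 60, 65, 70, 80, 90]
semester_grades = [58, 62, 70, 68, 85, 88]

r = pearson_r(earlier_test_scores, semester_grades)
print(f"r = {r:.2f}")
```

A coefficient close to 1 would suggest that the earlier test predicts later grades well, while a coefficient close to 0 would suggest little predictive value.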

2. Reliability

Consistent measurement is a necessary condition for high quality educational testing. This consistency of a test is called reliability.

Arthur Hughes, Testing for Language ..., p. 27.

Anthony J. Nitko, Educational Test ..., p. 422.

J. Stanley Ahmann and Marvin D. Glock, Educating Pupil ..., p. 288.


“Reliability refers to the consistency of measurement – that is, to how consistent test scores or other evaluation results are from one measurement to another.”40

According to Desmond Allison, “the reliability of a test concerns the accuracy and trustworthiness of its results. Reliable test results will accurately reflect each student’s understanding of whatever is being tested.”41

To sum up, a test is reliable if it consistently produces the same, or nearly the same, result or rank for the same individual taking the test several times on different occasions.

3. Practicality

The last quality that a good test should have is practicality or usability. In selecting a test and other instruments, practical considerations cannot be neglected. These are some factors relevant to the practicality when selecting tests:42

a. Ease of Administration

“The administrability of evaluation devices refers to the ease and accuracy with which the directions to pupils and evaluator can be followed.”43

In addition, ease of administration involves simple and clear directions, a minimum number of subtests, and easy timing.

b. Time Required for Administration

The length of a test is directly related to its reliability, so enough time should be made available. “A safe procedure is to allot as much time as is necessary to obtain valid and reliable results.”44

Norman E. Gronlund, Measurement and Evaluation ..., p. 93.

Desmond Allison, Language Testing ..., p. 85.

Norman E. Gronlund and Robert L. Linn, Measurement and Evaluation in Teaching, (New York: Macmillan Publishing Company, 1990), 6th Ed., p. 102-103.

H. H. Remmers, et. al., A Practical Introduction to Measurement and Evaluation, (New York: Harper & Brothers, 1960), p. 126.

c. Ease of Interpretation and Application

If the test is interpreted correctly and applied effectively, the teacher can make accurate educational decisions about students’ performance.

d. Availability of Equivalent or Comparable Forms

Equivalent forms of a test measure the same aspect and are alike in content, level of difficulty, and other characteristics. They are useful if the teacher wants to remove the factor of memory when retesting students on the same domain. Comparable forms are especially useful in measuring progress in the basic skills.

e. Cost of Testing

Cost is not really a major factor in selecting a test, since testing is relatively inexpensive. However, the test should still be as economical as possible.

C. Types of Test Item

An item is the basic unit of language testing. According to James Dean Brown, an item “is the smallest unit that produces distinctive and meaningful information on a test or rating scale.”45

The items used in classroom tests are commonly divided into two broad categories: (1) the objective item, and (2) the essay test.

1. Objective Test

In constructing an achievement test, the test maker may choose from a variety of item types. One of them is referred to as the objective item. This kind of item can be scored objectively. Furthermore, “equally competent scorers can score them independently and obtain the same results.”46 In addition, Rebecca M. Valette defines an objective test item as “any item for which there is a single predictable correct answer.”47

Wilmar Tinambunan, Evaluation of Students ..., p. 23.

Thus, when scoring this test, any subjective judgement from the scorer is set aside because every item in the test has only one right answer. So, even if the test is scored several times by one scorer or another, it will yield the same result.

The objective item can be classified into two types: the selection-type test item and the supply-type test item.

a. Selection-Type Test Item

1. Multiple Choice

According to Anthony J. Nitko, “a multiple choice item consists of one or more introductory sentences followed by a list of two or more suggested responses from which the examinee chooses one as the correct answer.”48 The other responses, which are incorrect answers, function to distract students’ attention away from the correct answer in case they are uncertain of it.

In line with that quote, “multiple choice items are made up of an item stem, or the main part of the item at the top, a correct answer, which is obviously the choice that will be counted correct, and the distractors, which are those choices that will be counted as incorrect.”49

For example:

Budi has been here ____________ half an hour.

a. during c. while

b. for d. since

Norman E. Gronlund, Constructing Achievement Test, (New Jersey: Prentice-Hall, Inc., 1982), 3rd Ed., p. 36.

Rebecca M. Valette, Modern Language ..., p. 8.

Anthony J. Nitko, Educational Test ..., p. 190.


The multiple choice item is commonly recognized as the most applicable and useful type of objective test item. It can be used to measure both knowledge outcomes and many types of skills. In addition, it can measure a variety of learning outcomes from simple to complex material.

The multiple choice item belongs to the discrete point test, which takes language skills apart. Oller states that, “discrete items attempt to test knowledge of language one bit at a time.”50

It means that language knowledge can be divided into a number of components, such as grammar, vocabulary, spelling, punctuation, pronunciation, intonation, and stress. A discrete point test measures knowledge of language in only one particular component.

Actually, it is not too difficult for a test maker or teacher to construct multiple choice items. However, there are some suggestions that they should consider in constructing this type of test item:51

a. The stem of the item should be meaningful by itself and should show a specific problem.

b. The item stem should include as much of the item as possible and should be free of irrelevant material.

c. A negatively stated item stem can be used only when significant outcomes need it.

d. All of the alternatives should be grammatically consistent with the stem.

e. An item should contain only one clearly correct answer.

f. Items used to measure understanding should contain some novelty, but beware of too much.

g. All distracters should be plausible.

h. Verbal associations between the stem and the correct answer should be avoided.

i. The relative length of the alternatives should not provide a clue to the answer.

j. The correct answer should appear in each of the alternative positions and in equal number but in random order.

John W. Oller, Language Tests ..., p. 37.

k. The special alternatives such as “none of the above” or “all of the above” can be used sparingly.

l. Do not use multiple choice item when other item types are more appropriate.

Although it is regarded as the most applicable and useful type of test item, the multiple choice item has some limitations, such as:52

a. The technique tests only recognition knowledge. A multiple choice item gives a rather inaccurate picture of students’ ability in productive and receptive skills.

b. Guessing may have a considerable but unknowable effect on test scores. We never know what part of any individual’s score comes through guessing. So, we cannot identify students’ true competence or ability.

c. The technique severely restricts what can be tested. The basic problem here is that it requires distractors, and they are not always available.

d. It is very difficult to write successful items. Common faults include more than one correct answer, no correct answer, obvious clues in the options, and ineffective distractors.

e. Backwash may be harmful. Practice at multiple choice items will not usually be the best way for students to improve their command of a language.

f. Cheating may be facilitated. Because responding to a multiple choice item is so simple, it is easy for students to communicate answers to each other non-verbally.

Besides its limitations, the multiple choice item also has some advantages. Wilmar Tinambunan lists them as follows:53

a. The multiple choice item can be used for subject matter content at many different levels of behaviour, such as the ability to reason, discriminate, interpret, analyze, infer, and solve problems.

b. It gives students less chance of guessing the right answer than the true-false item does because it offers four or five alternatives.

Arthur Hughes, Testing for Language ..., p. 76-78.


c. One advantage of the multiple choice item over the true-false item is that students must also know what is correct rather than only knowing that a statement is incorrect.

In the writer’s opinion, a multiple choice item includes at least three components: the stem, the distractors, and the correct answer. The stem can be a direct question or an incomplete statement to which students have to respond. The distractors are presented to draw students who have not studied well away from the correct answer. This type is especially useful for measuring learning outcomes that require the understanding, application, or interpretation of factual information.

2. True-False

In the book, Criterion-Referenced Language Testing, the true-false item “requires student to respond to the language by selecting one of two choices, for instance, between true and false or between correct and incorrect.”54

In line with that opinion, Norman E. Gronlund defines the true-false item as follows:

“True-false item is simply a declarative statement that the student must judge as true or false. There are modifications of this basic form in which the student must respond “yes” or “no,” “agree” or “disagree,” “right” or “wrong,” “fact” or “opinion,” and the like. Such variations are usually given the more general name of alternative-response items. In any event this item type is characterized by the fact that only two responses are possible.”55

For example:

Direction: Read each of the following statements. If the statement is grammatically correct, circle the T. If the statement is grammatically incorrect, circle the F!

James Dean Brown and Thom Hudson, Criterion-Referenced Language Testing, (Cambridge: Cambridge University Press, 2002), p. 66.

T F 1. Toni usually help her mother in cooking.

T F 2. Every student must bring their own book.

T F 3. If I had much money, I would buy a house.

T F 4. She is smarter in our classroom.

T F 5. The men are gathered in a conference room.

The most common use of the true-false item is to measure the ability to identify the correctness of statements of fact, definitions of terms, principles, etc., and to distinguish fact from opinion.56 Because it is used to measure such relatively simple learning outcomes, a single declarative statement is presented with one of several methods of responding.

Therefore, to make it more effective in measuring students’ understanding, there are some rules that should be observed in constructing true-false items:57

a. Include only one central, significant idea in each statement

b. Word the statement so precisely that it can be judged true or false unequivocally

c. Keep the statement short, and use simple language structure

d. Use negative statements sparingly, and avoid double negatives

e. Statements of opinion should be attributed to some source

f. Avoid extraneous clues to the answer

Moreover, Anthony J. Nitko states that this item type has some advantages and criticisms.58 Here they are:

Advantages:

a. Certain aspects of the subject matter lend themselves to verbal propositions that can be judged true or false

b. Such items are relatively easy to write

c. They can be scored easily and objectively

d. They can cover a wide range of content with a relatively short period of testing

Wilmar Tinambunan, Evaluation of Students ..., p. 70.

Norman E. Gronlund, Constructing Achievement ..., p. 55-56


Criticisms:

a. They are often used only to test specific, frequently trivial, facts

b. They can be ambiguously worded

c. They can be answered correctly by blind guessing

d. They may encourage students to study and accept only oversimplified statements of truth and factual details

Thus, the true-false item is an item type that contains a single written statement which students must judge as true or false. It is constructed to check whether a simple, particular point has been comprehended or not.

3. Matching

“The matching item consists of two parallel columns with each word, number, or symbol in one column being matched to a word, sentence, or phrase in the other column. The items in the column for which a match is sought are called premises and the items in the column from which the selection is made are called responses. They are useful in measuring students’ ability to make associations, discern relationship, make interpretations or measure knowledge of a series of facts.”59

In other words, this item type presents students with two columns of information in which they have to match the correct option or response to each premise. It is typically used to measure factual information or knowledge based on simple relationships. Therefore, when learning outcomes concern the ability to identify the relationship between two things, the matching item should be the most appropriate.

For example:

Match the following words on the left with their synonyms on the right!

1. ( ) Receive a. Carry

2. ( ) Achieve b. Emerge

3. ( ) Bring c. Increase


4. ( ) Appear d. Accept

5. ( ) Improve e. Accomplish

Furthermore, James Dean Brown formulates three guidelines that teachers should apply in constructing matching items:60

a. More responses should be supplied than premises so that students cannot narrow down the choices as they go along by simply keeping track of the options that they have already used.

b. The responses should usually be shorter than the premises because most students will read a premise and then search through the options for the correct match.

c. The premises and responses should be logically related to one central theme that is obvious to the students.

Moreover, the matching item has some advantages in testing. The first advantage is “its compact form, which makes it possible to measure a large amount of related factual material in a relatively short time.”61 Secondly, “the effects of guessing is reduced since the student will have one chance out of a number of responses available of guessing correctly.”62 Finally, it is easy to construct.

4. Rearrangement

“Rearrangement items require the pupil to put into some specified order a series of randomly presented material.”63

In the book, Measurement and Evaluation in the Schools, Louis J. Karmel states that any kind of specified order may be called for, such as chronology, order of difficulty, order of importance, length, weight, logic, and so on.64

James Dean Brown, Testing in Language ..., p. 57.

Norman E. Gronlund and Robert L. Linn, Measurement and Evaluation ..., p. 159.

Wilmar Tinambunan, Evaluation of Students ..., p. 65.

H. H. Remmers, et. al., A Practical Introduction ..., p. 243.

Louis J. Karmel, Measurement and Evaluation in the Schools, (London: The Macmillan Company, 1970), p. 382.


For example:

Rearrange the following sentences into a good paragraph!

1. Suddenly, it was getting dark and he realized that he got lost

2. Once upon a time, there was a bee named Bumbee

3. Bumbee could get home and gathered with his family happily

4. One day, he felt so happy and flew alone in the forest

5. Fortunately, a butterfly appeared and she liked to help him

b. Supply-Type Test Item

1. Short-Answer

In his book, Constructing Achievement Test, Norman E. Gronlund states that, “the short answer (or completion) item is the only objective item type that requires the examinee to supply, rather than select, the answer.”65 In line with that opinion, this item type “generally requires the students to examine a statement or question then respond to it with a phrase or two, or a sentence or two, in the space provided.”66

Both the short answer item and the completion item can be answered by a word, phrase, sentence, number, or symbol. In the short answer item, the problem is presented as a direct question.

For example:

a. What is the capital city of West Java? (Bandung)

b. Who invented the lightbulb? (Thomas Alva Edison)

In contrast, the completion item requires the student to supply the answer to an incomplete statement.

For example:

a. The capital city of West Java is ... (Bandung)

b. The name of the man who invented the lightbulb is ... (Thomas Alva Edison)

Norman E. Gronlund, Constructing Achievement ..., p. 57.


It seems obvious that the short answer or completion item measures performance at the lowest level of the cognitive domain. Moreover, it is suitable for measuring a wide variety of relatively simple learning outcomes.

To make this item type effective in measuring simple learning outcomes, there are some suggestions that should be followed so that the items are not constructed carelessly:67

a. Require short, definite, clean-cut answers

b. If several correct answers (synonyms) are possible, count each one correct or change the item to restrict the correct answer

c. Decide whether spelling should be disregarded or given a separate score

d. Minimize the use of textbook expressions or stereotyped language in phrasing the questions

e. Specify the terms in which the response is to be given

f. In testing for a knowledge and understanding of definitions, it is often better to provide the term and require a definition than to provide a definition and require the term

g. Direct questions are probably preferable to incomplete declarative sentences

h. Hints concerning the correct answer, in the form of the first letter of a word, or a number indicating the number of letters in a word, should generally not be given

i. The space for the response should usually be at the right of the question

j. Allow enough space for the responses to permit legible writing

k. Arranging the answer spaces in a column at the right-hand margin of the page makes scoring more convenient

Furthermore, the short answer item has some advantages and disadvantages, as Arthur Hughes writes in his book, Testing for Language Teachers:68

a. Advantages:

1. Guessing will (or should) contribute less to test scores

2. The technique is not restricted by the need for distractors (compared with multiple choice)

H. H. Remmers, et. al., A Practical Introduction ..., p. 223-225.


3. Cheating is likely to be more difficult

4. Though great care must be taken, items should be easier to write

b. Disadvantages:

1. Responses may take longer and so reduce the possible number of items

2. The test taker has to produce language in order to respond

3. Scoring may be invalid or unreliable, if judgement is required

4. Scoring may take longer

2. Fill-in

This item type provides a sentence or a passage from which some parts have been removed. Students are then asked to fill in those blank spaces. As James Dean Brown and Thom Hudson write, “this format provides a language context of some sort and then removes part of the context and replaces it with a blank. The student’s job is to fill in that blank.”69

For example:

1. He failed another exam, __________ he had studied very hard.

2. She does not come today. She __________ be sick.

3. Once upon a __________, there was a farmer living in a small village in England. His __________ was Jack. He was a kind and wise man. He liked to help his neighbors. Jack __________ a mill machine. People came to his place to __________ their grain. Jack served them happily. However, his wife was a very __________ woman. She often complained. She __________ angry every time Jack __________ some food to the neighbors.70

James Dean Brown and Thom Hudson, Criterion-Referenced ..., p. 73.

Ali Akhmadi and Ida Safrida, Smart Steps Grade VII, (Bandung: Ganeca Exact, 2005), p. 133.


In addition, the fill-in item measures the student’s ability to produce language, even if only a small amount. However, for the fill-in item to yield valid data, it is important to tell students clearly that only one word can be put in each blank or gap.

Furthermore, in order to use the fill-in item efficiently for measuring students’ performance, there are five considerations proposed by James Dean Brown that teachers should remember:71

a. Teachers should check to make sure that each item has one very concise correct answer

b. Teachers should make sure that enough context has been provided that the purpose, or intent, of the item is clear to those students who know the answer

c. All the blanks in a fill-in test should be the same length

d. Teachers should also consider putting the main body of the item before the blank in most of the items so that the students have the information necessary to answer the item once they encounter the blank

e. In situations where the blanks may be very difficult and frustrating for the students, teachers might consider supplying a list of responses from which the students can choose in filling in the blanks

Furthermore, as one type of test item, the fill-in item has some advantages and limitations:72

Advantages:

a. It is relatively easy to construct

b. It is flexible to use from a test writer’s point of view

c. It requires a short amount of time to administer

Limitations:

a. It is generally very narrowly focused on testing a single word or short phrase at most

b. It may have a number of possible answers

James Dean Brown, Testing in Language ..., p. 58-59.


2. Essay Test

According to J. Stanley Ahmann and Marvin D. Glock in the book, Educating Pupil Growth Principles of Test and Measurements, “an essay test item demands a response composed by the pupil, usually in one or more sentences, of a nature that no single response or pattern of responses can be listed as correct, and the accuracy and quality of which can be judged subjectively only by one skilled and informed in the subject, customarily the classroom teacher.”73

In addition, the major characteristic of the essay test is the freedom of response it provides. It means that students have to produce their own answer.74 To support the opinion above, Wilmar Tinambunan states that, “the essay-type question requires the examinee to read the question, formulate his response and express the response in his own words.”75

Essay questions can be classified into two types, which are:

a. Restricted Response Type

The student is not given complete freedom in making his response. “It usually limits both the content and the response. The content is usually restricted by the scope of topic to be discussed. Limitations of response are commonly indicated in the question.”76

For example:

1. State the main differences between the objective test and the subjective test according to Norman E. Gronlund!

2. Explain two advantages and two disadvantages of using the multiple choice item in testing English as a foreign language!

b. Extended Response Type

In this type, the student is given complete freedom in composing his response. “It allows pupils to select any factual information that they think is pertinent, to organize the answer in accordance with their best judgment, and to integrate and evaluate ideas as they deem appropriate.”77

J. Stanley Ahmann and Marvin D. Glock, Educating Pupil ..., p. 157.

Norman E. Gronlund, Constructing Achievement ..., p. 71.

Wilmar Tinambunan, Evaluation of Students ..., p. 56.

For example:

1. Why is English so important nowadays?

2. Describe the roles of the teacher in language testing!

Moreover, an essay item intended to measure complex learning outcomes should be constructed properly and carefully. Here are some suggestions for constructing a good essay item:78

1. Make definite provisions for preparing students for taking essay examinations

2. Make sure that questions are carefully focused

3. Structure the content and length of questions

4. Have a colleague review and critique the essay questions

5. Avoid the use of optional questions, except when one is assessing writing ability where a choice of questions is desirable

6. Restrict the use of the essay as an achievement test to those objectives for which it is best

As a method of measuring complex learning outcomes, the essay item has several advantages and weaknesses.

Advantages:79

1. It measures complex learning outcomes that cannot be measured by other means

2. It emphasizes the integration and application of thinking and problem-solving skills

3. It is regarded as a device for improving writing skills

4. It has ease of construction. Most teachers can formulate several essay questions in a matter of minutes

Weaknesses:80

1. The sampling of achievement is limited because only a small number of questions can be included in an essay test

Norman E. Gronlund and Robert L. Linn, Measurement and Evaluation ..., p. 213.

Kenneth D. Hopkins, et. al., Educational and Psychological Measurement and Evaluation, (Englewood Cliffs, New Jersey: Prentice Hall Inc., 1990), 7th Ed., p. 216.

Norman E. Gronlund and Robert L. Linn, Measurement and Evaluation ..., p. 216.

2. Scoring the essay test is influenced by the student’s writing ability. Poor expression and errors in punctuation, spelling, and grammar usually lower the student’s score

3. While scoring an essay test, the standards can shift because of variations in the content of the answers from paper to paper

4. It requires much time to score the answers81

Thus, in an essay item, students are asked to demonstrate their ability to select, organize, integrate, and review ideas in order to respond to the question freely. In addition, this item type is scored subjectively, since it may yield different results when it is scored by different people. The people who are assigned to score the answers are typically influenced by their own judgment or opinion.

To sum up, based on the previous explanation, an essay test is used to measure students’ comprehension of certain knowledge. Students are asked to answer by expressing themselves effectively in their own words and organizing their own ideas, using information from their own background and knowledge.

D. Item Analysis

1. Definition of Item Analysis

Obtaining valid data is very valuable for making a clear judgment about students’ performance in evaluation activities. For that reason, the test should be of good quality and every item should function properly. The teacher or test maker should find out whether the test can be regarded as a good test or not by evaluating every item in it. This activity is called item analysis.

According to Anthony J. Nitko, “item analysis refers to the process of collecting, summarizing, and using information about individual test items, especially information about pupil’s response to item.”82

In addition, “item analysis as a whole will be defined here as the systematic statistical evaluation of the effectiveness of individual test items. Item analysis is usually done for purposes of selecting which items will remain on future revised and improved versions of the test. Sometimes, however, item analysis is performed simply to investigate how well the items on a test are working with a particular group of students, or to study which items match the language domain of interest.”83

Moreover, Arthur Hughes proposes the purpose of item analysis, which is “to examine the contribution that each item is making to the test. Items that are identified as faulty or inefficient can be modified or rejected.”84

Although item analysis is done primarily for response-choice items, teachers can use several of the techniques described with any items that are scored dichotomously (simply as correct or incorrect).85

In the writer’s opinion, item analysis is a statistical evaluation of the quality of a test that identifies whether every item on the test works appropriately. It is done by collecting students’ responses to each item, so that it becomes clear which items are good and which items weaken the test. It is very useful for teachers to perform item analysis, since it can serve as a device for test improvement.

2. Kinds of Item Analysis

Item analysis usually concentrates on three vital features: level of difficulty, discriminating power, and the effectiveness of each alternative. “Thus, item analysis can tell us if an item was too difficult or too easy, how well it discriminated between high and low scores on the test, and whether all the alternatives functioned as intended.”86

82 Anthony J. Nitko, Educational Test ..., p. 284.

83 James Dean Brown and Thom Hudson, Criterion-Referenced ..., p. 113.

84 Arthur Hughes, Testing for Language ..., p. 225.

85 Anthony J. Nitko, Educational Test ..., p. 286.


a. Level of Difficulty

The first area of item analysis is the level of difficulty, which concerns how easy or difficult each item is. According to Kathleen M. Bailey, difficulty level is “an index of how easy an individual item was for the people who took it. It is typically printed as a decimal, ranging from 0.0 to 1.0. It represents the proportion of people who got the item right.”87

Furthermore, according to the book Language Tests at School, “difficulty level (or item facility) has to do with how easy (or difficult) an item is from the viewpoint of the group of students or examinees taking the test of which that item is a part.”88

In the writer’s opinion, the level of difficulty concerns the percentage of students who answer an item correctly and the percentage who answer it incorrectly. By analyzing the difficulty level of each item, one can infer whether the item is easy, moderate, or difficult.

The level of difficulty is interpreted as the proportion, or percentage, of correct answers. The larger the percentage of correct answers, the easier the item is; conversely, the fewer the students who answer correctly, the more difficult the item is.

Hence, every test item has a level of difficulty, which may be easy, moderate, or difficult. An effective, good test should have items that belong to the moderate level. An item that is too easy or too difficult can weaken the quality of the test, so that valid information about students’ achievement will not be obtained.

In addition, level of difficulty analysis can be applied to either a large group of students or a small one.

87 Kathleen M. Bailey, Learning about Language Assessment: Dilemmas, Decisions, and Directions, (New York: Heinle & Heinle Publishers, 1998), p. 132.


Lyle F. Bachman describes the procedure as follows: “to conduct an item analysis, we first arrange the scored test papers or answer sheets in order from the highest score to the lowest score. Next, we separate the papers into upper and lower groups, according to their total test scores. For large groups, we would choose the upper and lower 27 percent, while for smaller groups, we would typically choose the upper and lower one-third.”89
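The grouping procedure in the quotation can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `split_upper_lower`, the `(student, total_score)` pair layout, the sample scores, and the rounding of the group size to the nearest whole student are choices made here and do not come from the source. The source also does not say where the boundary between a large and a small group lies, so the fraction is left as a parameter.

```python
def split_upper_lower(scores, fraction=0.27):
    """Return (upper, lower) groups from (student, total_score) pairs.

    fraction is the share of test takers placed in each group: 0.27 for
    large groups or 1/3 for smaller groups, as in the quoted procedure.
    Rounding the group size is an assumption of this sketch.
    """
    # Arrange the papers from the highest total score to the lowest.
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    # Number of students in each group (at least one).
    size = max(1, round(len(ranked) * fraction))
    return ranked[:size], ranked[-size:]


# Hypothetical data: nine students and their total test scores.
papers = [("A", 38), ("B", 25), ("C", 31), ("D", 12), ("E", 40),
          ("F", 19), ("G", 27), ("H", 33), ("I", 15)]
upper, lower = split_upper_lower(papers, fraction=1 / 3)
print([name for name, _ in upper])  # three highest scorers
print([name for name, _ in lower])  # three lowest scorers
```

With nine students and a fraction of one-third, each group holds three papers; the middle three are set aside.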

The formula used for analyzing the difficulty level of each item in a large group is stated below:

$$TK = \frac{U + L}{T}$$

In which:

TK : Index of difficulty

U : The number of students in the upper group who answer the item correctly

L : The number of students in the lower group who answer the item correctly

T : The number of students in the upper and lower groups90

89 Lyle F. Bachman, Statistical Analyses for Language Assessment, (Cambridge: Cambridge University Press, 2004), p. 123.

90 M. Ngalim Purwanto, Prinsip-Prinsip dan Teknik Evaluasi Pengajaran, (Bandung: Remadja Karya, 1986), p. 153.

Next, for a small group, the teacher or test maker can easily evaluate an item by using all the students’ answer sheets. Then, the formula is:

$$P = \frac{B}{JS}$$

In which:

P : Index of difficulty

B : The total number of students who got the item correct

JS : The number of students who took the test91
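As a rough illustration of the two indices defined above, the sketch below computes TK = (U + L) / T for the upper/lower-group method and P = B / JS for the whole-group method. The function names and all the numbers are hypothetical and are not data from this study.

```python
def difficulty_index_groups(upper_correct, lower_correct, group_total):
    """TK = (U + L) / T, using only the upper and lower groups."""
    if group_total <= 0:
        raise ValueError("group_total must be a positive number of students")
    return (upper_correct + lower_correct) / group_total


def difficulty_index_all(correct, test_takers):
    """P = B / JS, using every student's answer sheet."""
    if test_takers <= 0:
        raise ValueError("test_takers must be a positive number of students")
    return correct / test_takers


# Hypothetical large group: 27 students in each of the upper and lower groups
# (T = 54); 20 upper-group and 10 lower-group students answered correctly.
tk = difficulty_index_groups(upper_correct=20, lower_correct=10, group_total=54)
print(f"TK = {tk:.2f}")  # 30 / 54

# Hypothetical small group: 24 of 40 students answered the item correctly.
p = difficulty_index_all(correct=24, test_takers=40)
print(f"P = {p:.2f}")  # 24 / 40
```

Both functions return a value between 0.0 and 1.0, where a larger value indicates an easier item.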

The formula above is commonly used for multiple-choice items. For short-answer items, Zainal Arifin states as follows:92

$$\text{Mean} = \frac{\text{The total of students' scores for each item}}{\text{The number of students}}$$

$$\text{Index of difficulty} = \frac{\text{Mean}}{\text{Maximum score of each item}}$$

91 Suharsimi Arikunto, Dasar-Dasar Evaluasi Pendidikan, (Jakarta: PT. Bumi Aksara, 2006), p. 208.

92 Zainal Arifin, Evaluasi Pembelajaran ..., p. 135.

After analyzing each item and obtaining its difficulty level, the next thing to do is to find the difficulty level for the whole set of items in the test. It is computed by using the following formula:

$$P = \frac{\sum b}{N}$$

In which:

P : Difficulty level for whole items

b : Difficulty level of each item

Σ : Sigma (Total)

N : Total number of test items93
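The short-answer index (mean item score divided by the maximum score of the item) and the whole-test difficulty level (the sum of the item indices b divided by the number of items N) can be sketched as below. This is a minimal illustration only; the function names and the sample numbers are hypothetical and are not results from this study.

```python
def short_answer_difficulty(item_scores, max_score):
    """Index of difficulty = mean item score / maximum score of the item."""
    if not item_scores or max_score <= 0:
        raise ValueError("need at least one score and a positive max_score")
    mean = sum(item_scores) / len(item_scores)
    return mean / max_score


def whole_test_difficulty(item_indices):
    """P = (sum of b) / N over all N items of the test."""
    if not item_indices:
        raise ValueError("need at least one item index")
    return sum(item_indices) / len(item_indices)


# Hypothetical short-answer item with maximum score 5, taken by five students.
index = short_answer_difficulty([4, 2, 3, 5, 1], max_score=5)
print(f"Index of difficulty = {index:.2f}")

# Hypothetical difficulty indices (b) of a four-item test.
overall = whole_test_difficulty([0.60, 0.25, 0.80, 0.35])
print(f"P = {overall:.2f}")
```

Because the short-answer index divides by the maximum score, it stays on the same 0.0 to 1.0 scale as the multiple-choice index, so both kinds of item can be averaged together in the whole-test figure.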

(1)

Text for No. 34-36

Last week, my parents, sister, brother and I went to the zoo. We went there for recreation. We left at 06.00 a.m. and arrived there at 08.00 a.m. The zoo is about a hundred kilometers from my house.

There were a lot of people watching a giant snake. The snake was there for about a week. It was 9 metres long. I thought it was the biggest snake I had ever seen. After going around and watching various animals, we went home.

34. The writer went to the zoo with her ....

A. parents B. sister C. brother D. family

35. What is special in the zoo?

A. wild animals B. a giant snake C. various animals D. animal's attraction

36. "After going around watching various animal, we went home." (paragraph 2) The underlined word means ....

A. different B. similar C. wild D. mean

Text for No. 37-38

2nd December 2010

To: Sherly

Jalan Melati number 3

Dear Sherly,

Thank you very much for inviting me to your 14th birthday party on 12 December 2010. I was really looking forward to it, but sadly I will not be able to come. I am scheduled to see the dentist. I hope everybody has a great time.

Yours sincerely,

Lina

37. Why can't Lina come to Sherly's birthday?

A. She must see the dentist B. She will pick up the dentist C. She is disappointed D. She has the 14th birthday party

38. "I am scheduled to see the dentist." The underlined word is similar to ....

A. forced B. asked C. ordered D. planned

(2)

Text for No. 39-40

My Cat

Spot is a regular house cat. He is an adorable cat. He has orange fur with white and black spots. I like to cuddle him because his fur feels soft. Every morning I give Spot milk. Spot does not like rice, so I give him cat food.

Spot is an active animal. He likes to run around the house. He likes to chase everyone in my house. When he feels tired or sleepy, Spot usually sleeps on the sofa in the living room or sometimes under the table.

39. Which statement is not true according to the text?

A. Spot dislikes rice B. Spot has orange spots C. Spot likes to run around the house D. Spot always sleeps on the sofa in the living room

40. "He likes to chase everyone in my house." The synonym of the underlined word is ....

A. Trick B. Catch C. Hunt D. Cheat

II. ESSAY

41. Arrange the jumble words into a good sentence!

in - Asia - is - largest - lake - southeast - lake - Toba - the

42. Arrange the following sentence to make a good paragraph!

• My family and I went to Park

• Yesterday was a holiday

• In the afternoon, we went home

• We played around and had lunch there

• It was full of fun

43. This animal eats grass or leaves. It has two strong horns on his head. It usually lives in a group. Its body is as big as a cow. It is smaller than an elephant.

This animal is ....

44. Change the verbs in the brackets into right form!

On Sunday I (go) ... to the village to see my grandma. I (leave) ... at eight in the morning. (wait) ... for the bus at the bus stop. Five minutes later a bus (come) ....

45. This mammal is a sea animal that looks like a large fish with a pointed mouth. It is very intelligent and friendly towards human.

This animal is ....

(3)

No. : Special

Subject : Submission of Skripsi Title

Attachment : 1 (one) file

Tangerang, 30 March 2011

To:

The Head of the Department of English Education

FITK UIN Jakarta

at his office

Assalamu'alaikum Wr. Wb.

I, the undersigned:

Name : Andrian Dwi Prayoga

NIM : 107014000882

Department : English Education

Faculty : Tarbiyah and Teachers' Training

intend to submit a skripsi title as one of the requirements for completing the S-1 (strata 1) program at UIN Syarif Hidayatullah Jakarta. The proposed title is:

"AN ANALYSIS ON THE DIFFICULTY LEVEL OF ENGLISH SUMMATIVE TEST FOR SECOND GRADE OF JUNIOR HIGH SCHOOL AT ODD SEMESTER 2010/2011"

(A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)

Enclosed with this letter is one proposal file consisting of:

1. Abstract

2. Outline

3. Temporary references

This letter of submission is presented accordingly. Thank you for your consideration.

Wassalamu'alaikum Wr. Wb.

Applicant,

(Andrian Dwi Prayoga)

NIM: 107014000882

(4)

MINISTRY OF RELIGIOUS AFFAIRS

UIN JAKARTA

FITK

Jl. Ir. H. Juanda No 95 Ciputat 15412 Indonesia

FORM (FR)

Document No. : FITK-FR-AKD-081 Date of Issue : 1 March 2010

SKRIPSI SUPERVISION LETTER

Number : Un.01/F.1/KM.01.3/[illegible]/2011

Attachment : -

Subject : Skripsi Supervision

Jakarta, 04 April 2011

To:

Dr. H. Muhammad Farkhan, M.Pd

Skripsi Advisor

Faculty of Tarbiyah and Teachers' Training, UIN Syarif Hidayatullah Jakarta.

Assalamu'alaikum wr.wb.

We hereby request your willingness to serve as advisor I/II (content/technical) for the skripsi of the following student:

Name : Andrian Dwi Prayoga

NIM : 107014000882

Department : English Education

Semester : VIII

Skripsi Title : AN ANALYSIS ON THE DIFFICULTY LEVEL OF ENGLISH SUMMATIVE TEST FOR SECOND GRADE OF JUNIOR HIGH SCHOOL AT ODD SEMESTER 2010/2011 (A Case Study at Second Grade of SMP Negeri 13 Tangerang Selatan)

The title was approved by the Department concerned on 31 March 2011; the abstract/outline is attached. You may make editorial changes to the title. If a substantial change is considered necessary, the advisor is requested to contact the Department first.

This skripsi supervision is expected to be completed within 6 (six) months and may be extended for a further 6 (six) months without an extension letter.

Thank you for your attention and cooperation.

Wassalamu'alaikum wr.wb.

... Inggris

...ki, M.Pd

NIP. ...1212 199103 ...

Copies to:

1. Dean of FITK

2. The student concerned

(5)

DEPARTMENT OF RELIGIOUS AFFAIRS

UIN JAKARTA

FITK

Jl. Ir. H. Juanda No 95 Ciputat 15412 Indonesia

FORM (FR)

Document No. : FITK-FR-AKD-082 Date of Issue : 1 March 2010

Revision No. : 01

Page 1/1

RESEARCH PERMIT REQUEST LETTER

Number : Un.01/F.1/KM.01.3/[illegible]/2011

Attachment : Outline/Proposal

Subject : Request for Research Permit

Jakarta, 25 April 2011

To:

The Principal of SMP Negeri 13 Tangerang Selatan

at his office

Assalamu'alaikum wr.wb.

We respectfully inform you that:

Name : Andrian Dwi Prayoga

NIM : 107014000882

Department : English Education

Semester : VIII (Eight)

Skripsi Title : "AN ANALYSIS ON THE DIFFICULTY LEVEL OF ENGLISH SUMMATIVE TEST FOR SECOND GRADE OF JUNIOR HIGH SCHOOL AT ODD SEMESTER 2010/2011" (A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)

is indeed a student of the Faculty of Tarbiyah and Teachers' Training, UIN Jakarta, who is currently writing a skripsi and will conduct research at the institution/school/madrasah that you lead.

We therefore ask that you permit the student to carry out the research concerned.

Thank you for your attention and cooperation.

Wassalamu'alaikum wr.wb.

On behalf of the Dean,

Head of the Department of English Education

Drs. ...

NIP. ...

Copies to:

1. Dean of FITK

2. Vice Dean for Academic Affairs

3. The student concerned

(6)

GOVERNMENT OF TANGERANG SELATAN CITY

EDUCATION OFFICE

SMP NEGERI 13 KOTA TANGERANG SELATAN

Jl. Beruang II Feladen Pd. Ranji Ciputat Timur Tangerang Selatan 15412, Telp./Fax. 021-7354472

Website: www.smpnSciputat.com E-mail: smpn13_tangsel@yahoo[illegible]

STATEMENT LETTER

No.: 423.4/965/SMPN13TANGSEL/2011

The undersigned:

Name : Rohman, S.Pd.

NIP : 19580811 198003 1 012

Rank/Grade : Pembina/IV a

Position : Principal of SMP Negeri 13 Tangerang Selatan

certifies that:

Name : Andrian Dwi Prayoga

NIM : 107014000882

Department : English Education

Faculty : Tarbiyah and Teachers' Training, UIN Jakarta

Semester : VIII (Eight)

Research Title : "AN ANALYSIS ON THE DIFFICULTY LEVEL OF ENGLISH SUMMATIVE TEST FOR SECOND GRADE OF JUNIOR HIGH SCHOOL AT ODD SEMESTER 2010/2011 (A Case Study at the Second Grade of SMP Negeri 13 Tangerang Selatan)"

The person named above has indeed conducted research at SMP Negeri 13 Tangerang Selatan on 23 - 30 May 2011.