OUTLINE
CHAPTER I INTRODUCTION
A. Background of Study
B. Significance of the Study
C. Limitation of Problem
D. Formulation of Problem
E. Research Methodology
F. Organization of Writing
CHAPTER II THEORETICAL FRAMEWORK
A. Evaluation
B. The definition of test
C. Testing role
D. Types of test
a. Function
1. The placement test
2. The diagnostic test
3. The achievement test
4. The proficiency test
b. Way of scoring
1. Objective test
2. Subjective test
E. The Characteristic of a Good Test
1. Validity
2. Reliability
3. Practicality
F. Item Analysis
1. Level of difficulty
2. Discriminating power
3. Distracter
G. The importance of item analysis
CHAPTER III PROFILE OF SCHOOL
A. History of School
B. Vision and Mission of School
C. Facilities of School
D. Organization Structure of School
E. Teachers, Staff, and Students
CHAPTER IV RESEARCH FINDINGS
A. Population and Sample
B. Time of Research
C. The Data Description
D. The Data Analysis
E. The Data Interpretation
CHAPTER V CONCLUSION AND SUGGESTION
A. Conclusion
B. Suggestion
CHAPTER I INTRODUCTION
A. Background of Study
Evaluation is an important part of every teaching and learning experience. It contributes greatly to teaching and provides information about the students’ progress, which teachers can use to manage learning tasks and students. As stated by Pauline Rea-Dickins and Kevin Germaine: “Evaluation is important for the teacher because it provides a wealth of information to use for the future direction of classroom practice, for the planning of courses and for the management of learning tasks and students.”
1
Evaluation can also be described as the process of making sound decisions about teaching and learning based on information that has been collected, synthesized, and reflected on. Lyle F. Bachman states: “Evaluation can be defined as the systematic gathering of information for the purpose of making decision”.
2
Depending upon the decision being made and the information a teacher needs in order to inform that decision, testing often contributes to the process as an implementation of evaluation. Indeed, a test is one kind of evaluation instrument for collecting data. “A test is defined as a systematic procedure for observing and describing one or more characteristics of a person with the aid of either a numerical scale or category system”.
3
In other words, a test measures a person’s ability or knowledge through a number of tasks or questions.
According to Henning, “. . . tests in general is to pinpoint strengths and
1
Pauline Rea-Dickins and Kevin Germaine, Evaluation, New York: Oxford University Press, 1992, p. 3
2
Lyle F. Bachman, Fundamental Considerations in Language Testing, Oxford: Oxford University Press, 1990, p. 22
3
Anthony J. Nitko, Educational Tests and Measurement: An Introduction, New York: Harcourt Brace Jovanovich, Inc., 1983, p. 6
weakness in the learned abilities of students”.
4
Teachers need to administer tests because, through them, they can find out the students’ achievement in mastering the lessons that have been taught and evaluate the effectiveness of the method and the teaching material used. Rebecca M. Valette states,
“…through tests the teacher can evaluate the effectiveness of a new teaching method, of a different approach to a difficult pattern, or of a new materials”.
5
To measure the students’ learning progress at school, a teacher commonly administers two kinds of test: the formative test and the summative test. The former is held earlier than the latter, which is held at the end of the semester. Through both tests, a teacher can measure the students’ achievement level and the degree to which they have accomplished the instructional objectives. In this regard, Gronlund states that:
“Formative test is used to monitor learning progress during instruction. Its purpose to provide continuous feedback to both pupil and teacher
concerning learning successes and failures ………..And summative test typically comes at the end of a course of instruction. It is designed to
determine the extent to which the instructional objectives have been achieved and is used primarily for assigning course grades or for
certifying pupil mastery of the intended learning outcomes”.
6
To obtain accurate measures, a test must be of good quality, because a good test influences not only the students’ learning but also the teachers’ efforts to improve the teaching and learning process. J.B. Heaton supports this: “Test may be constructed primary as device to reinforce learning and to
motivate the students’ performance in language”
7
. In addition, Lyle F. Bachman also states that “Tests are often used for pedagogical purposes, either
4
Grant Henning, A Guide to Language Testing, U.S.A: Newbury House Publishers, 1987, p. 1
5
Rebecca M. Valette, Modern Language Testing, U.S.A.: Harcourt Brace Jovanovich, 1977, p. 5
6
Norman E. Gronlund, Measurement and Evaluation in Teaching, 4th edition, Macmillan Publishing Company, 1976, p. 18
7
J.B. Heaton, Writing English Language Tests, New Delhi: Tata McGraw-Hill Publishing Company, 1998, p. 13
as a means of motivating students to study or as means of reviewing material taught”.
8
As the accuracy of a test result influences the students’ motivation to learn, the test administered must be a good test. A good test is one that meets the criteria of validity, reliability, and practicality. Besides that, it must have appropriate discriminating power and difficulty level.
9
A test is valid if it measures what it is supposed to measure. It is reliable if its results stay the same even when it is administered again to students of the same level at a later time. And it is practical if it is easy to take and administer.
A matter often forgotten by teachers is the follow-up of the test implementation pertaining to the test items themselves. In practice, teachers do not examine whether or not all items have fulfilled the criteria above. Therefore, an analysis of the test items, namely “item analysis”, is really required. By analyzing test items, the teacher can identify good and poor items and differentiate between students who have done well and those who have done poorly. According to J. Stanley Ahmann and Marvin D. Glock, the purpose of doing
item analysis is: “Re-examining each test item to discover its strengths and flaws is known
as item analysis. Item analysis usually concentrates on two vital features; level of difficulty and discriminating power. The former means the
percentage of pupils who answer correctly each item; the latter the ability of the test item to differentiate between pupils who have done
well and those who have done poorly”.
10
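The two features named in the quotation above can be illustrated with a short computation. The following is only a minimal sketch in Python: the response data are invented for illustration, and the upper and lower groups are taken as the top and bottom 27% of pupils by total score, which is a common convention assumed here rather than a procedure stated in the sources cited.

```python
def difficulty_level(item_responses):
    """Proportion of pupils who answered the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_responses) / len(item_responses)


def discriminating_power(item_responses, total_scores, fraction=0.27):
    """Upper-lower index: correct answers in the upper group minus correct
    answers in the lower group, divided by the group size.
    The 27% grouping is an assumption made for this sketch."""
    # Rank pupils from highest to lowest total score, keeping their item response.
    ranked = sorted(zip(total_scores, item_responses),
                    key=lambda pair: pair[0], reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    upper = [correct for _, correct in ranked[:group_size]]
    lower = [correct for _, correct in ranked[-group_size:]]
    return (sum(upper) - sum(lower)) / group_size


# Hypothetical data: ten pupils' responses to one item and their total test scores.
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
totals = [38, 35, 33, 30, 28, 25, 22, 20, 18, 15]

p = difficulty_level(item)              # 6 of 10 pupils answered correctly
d = discriminating_power(item, totals)  # upper 3 pupils vs. lower 3 pupils
```

With these invented data, six of the ten pupils answer the item correctly, and the item separates the three highest scorers (all correct) from the three lowest scorers (one correct).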
8
Lyle F. Bachman, Fundamental Considerations in Language Testing, Toronto: Oxford University Press, 1990, p. 22
9
J.B. Heaton, Writing English Language Tests, New Delhi: Tata McGraw-Hill Publishing Company, 1998, pp. 152-156
10
J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth: Principles of Tests and Measurements, Boston: Allyn and Bacon, Inc., 1967, p. 184
In addition, Ngalim Purwanto states: “The specific purpose of item analysis is to find out which test items are good and which are not, and why an item is said to be good or not good.”
11
The latest English summative test at MTs. Darul Ma’arif was held on June 19, 2009. According to the pre-survey conducted during teaching practice at MTs. Darul Ma’arif, the writer was informed that in the second semester the English teacher had never analyzed the test items, so it is difficult to say whether the test is a good one or not. In addition, the test results show that the students’ scores are poor.
Considering this fact, the writer is interested in carrying out an item analysis of the English summative test at MTs. Darul Ma’arif Jakarta in the second term of the 2008/2009 academic year.
B. Limitation of the Problem
The writer limits the study to an item analysis of the English summative test administered to the second-year students of MTs. Darul Ma’arif Jakarta in the 2008/2009 academic year, focusing on the aspect of difficulty level, or facility value.
C. Formulation of Problem
Based on the background of study described above, the writer seeks to answer the following question: “Are the English summative test items for the second-year students of MTs. Darul Ma’arif Jakarta of good quality in terms of difficulty level?”
11
Drs. Ngalim Purwanto, Prinsip-prinsip dan Tehnik Evaluasi Pengajaran, Bandung; Remaja Rosda Karya, 1991, p. 118
D. Significance of the Study
Firstly, this study provides feedback to the writer in particular, and to the English teacher, on how to analyze test items in terms of difficulty level.
Secondly, it informs the English teacher about the quality of the test items in terms of difficulty level. Through this research, the English teacher can identify good items for future use and the students’ achievement in mastering the materials taught, in order to evaluate the teacher’s competence in teaching.
E. Organization of Writing
In discussing the topic, the writer divides this study into five chapters, as follows.
Chapter one is the introduction, covering the background of study, limitation of problem, formulation of problem, significance of study, and organization of writing.
Chapter two is the theoretical framework, which discusses evaluation, the test and its types, the criteria of a good test, and item analysis.
Chapter three discusses the research methodology, which includes the objective of research, the method of study, the time and place, the population and sample, the instrument, and the procedure of the research.
Chapter four presents the research findings, which consist of the data description and the data analysis.
Chapter five is devoted to the conclusion of what has been discussed and analyzed in the preceding chapters, and also the writer’s suggestions arising from the research.
CHAPTER II THEORETICAL FRAMEWORK
A. The Definition of Evaluation
Evaluation is important for any process that has been carried out, because through evaluation we can find the weaknesses that should be revised and the strengths that should be improved. Likewise, in the teaching-learning process, evaluation plays an important role in providing information for making judgments about what is good or desirable, in order to improve the students’ knowledge in learning and the teacher’s competence in teaching. This is in line with Peter W. Airasian’s definition: “Evaluation is the process of judging the quality or value of a performance or a course of action”.
1
In the same sense, Lyle F. Bachman states: “Evaluation can be defined as the systematic gathering of information for the purpose of making decision”.
2
Evaluation also includes “the making judgments about the value, for some purpose, of ideas, works, solutions, methods, materials, etc”.
3
Hence, Benjamin S. Bloom et al. state that “Evaluation is a system of quality control in which it may be determined at each step in the teaching-learning process whether the process is effective or not, and if not what changes must be made to ensure its effectiveness before it is too late”.
4
Basically, the purpose of evaluation is to judge the worth of a program or procedure, usually in terms of how well it has achieved its objectives, and
1
Peter W. Airasian, Classroom Assessment: Concepts and Applications, 5th edition, New York: McGraw-Hill, 2005, p. 9
2
Lyle F. Bachman, Fundamental Considerations..., p. 22
3
Julian C. Stanley, Measurement in Today’s Schools, Englewood Cliffs: Prentice-Hall, Inc., 1964, p. 16
4
Benjamin S. Bloom, Handbook on Formative and Summative Evaluation of Student Learning, London: Longman, 1971, p. 8
for this purpose all appropriate techniques of gathering evidence may be used.
5
“Evaluation goes beyond the statement of how much to concern itself with the question of what value. It seeks to answer the pupil’s and teacher’s question, ‘What progress am I making?’”.
6
Richard I. Arends states that “An important purpose of testing and evaluation is to provide students with
feedback on how they are doing”.
7
Finally, considering all the opinions above, the writer can summarize that evaluation is a systematic process of providing information in order to make judgments and sound decisions about whether the objectives are suitable for, or in line with, the curriculum used, and to find out the students’ improvement in the teaching-learning process, the teacher’s competence in teaching, and the classroom climate.
B. The Definition of Test
When people hear the words assessment and evaluation, they often think right away of tests, because a test is one of the instruments of evaluation for collecting data. A test is a formal, systematic, usually paper-and-pencil procedure for gathering information about pupils’ performance.
8
Paper-and-pencil tests are thus one important tool for gathering assessment information.
A test is composed of a number of tasks or questions for students to respond to. By analyzing the responses, the teacher can measure the students’ achievement in the teaching-learning process. Lyle F. Bachman states that “A test is a procedure designed to elicit certain behavior from which one
can make inferences about certain characteristics of an individual”.
9
While
5
Victor H. Noll, Introduction to Educational Measurement, 2nd edition, Boston: Houghton Mifflin Company, 1965, p. 14
6
H.H. Remmers, N.L. Gage, J. Francis Rummel, A Practical Introduction to Measurement and Evaluation, USA: Harper and Brothers Publishers, 1960, p. 7
7
Richard I. Arends, Learning to Teach, New York: McGraw-Hill International Edition, 1989, p. 312
8
Peter W. Airasian, Classroom Assessment..., p. 9
9
Lyle F. Bachman, Fundamental Considerations..., p. 20
Wilmar states that “A test is a set of questions, each of which has a correct answer, that examinees usually answer orally or in writing”.
10
From those views, it can be concluded that a test can be an instrument, technique, or procedure for eliciting students’ responses through tasks or performance in the form of a set of questions that must be answered in order to achieve the teaching-learning objectives. In short, a test is a measurement instrument designed to assess a specific sample of an individual’s behavior.
A test is also a way to deliver information, which is very useful for many education practitioners. “A test is a formal systematic procedure for gathering information”.
11
Therefore, the test as an educational device is necessary in the teaching process, since testing and teaching cannot be separated. Heaton states that “both testing and teaching are so closely interrelated that it is virtually impossible to work in either field without being constantly concerned with the
other”.
12
The reason for this interrelation between testing and teaching is that the material tested must be based on the material taught in order to find out how far the students have comprehended it.
C. Type of Tests
There are many kinds of tests that can be used in an evaluation process to measure students’ achievement. Tests can be classified in two ways, namely by function and by way of scoring.
1. Function
According to Andrew Harrison, tests can be categorized by function into four types: placement test, diagnostic test, achievement test, and proficiency test.
10
Wilmar Tinambunan, Evaluation of Students Achievement, Jakarta: Depdikbud, 1988, p. 310
11
Julian C. Stanley, Measurement in Today’s..., p.3
12
J.B. Heaton, Writing English..., p.1
a. The Placement test
A placement test is used to place a student at the appropriate level or section of a language curriculum or school. It usually takes place at the beginning of a course. According to Wilmar Tinambunan: “A placement test is designed to determine pupil performance at the beginning of instruction. Thus, it is designed to sort new students into teaching groups, so that they can start a course at approximately the same level as the other students in the class. It is concerned with the student’s present standing, and so relates to general ability rather than specific points of learning. As a rule the results are needed quickly so that the teaching may begin.”
13
b. The Diagnostic Test
A diagnostic test is designed to diagnose a particular aspect of a language. “Diagnostic tests are also achievement tests, but they are
characterized by one distinctive feature, namely that they are designed to show specific weakness and strengths within the skills or elements
measured”.
14
It can also be used to check the students’ progress in learning particular elements of the course. It is used, for example, at the end of a unit in the course book or after a lesson designed to teach one particular point.
15
“A diagnostic test is designed to determine the degree to which the specific instructional objectives of the course have been
accomplished”.
16
J.B. Heaton states that “Although the term diagnostic test is widely used, few tests are constructed solely as diagnostic tests. Note that diagnostic testing is frequently carried out for groups of students rather than for individuals”.
17
13
Wilmar Tinambunan, Evaluation of Students..., p. 7
14
Robert Lado, Language Testing, Hong Kong: Wing Tai Cheung Printing Co. Ltd., 1961, p. 369
15
Andrew Harrison, A Language Testing Handbook, London: Macmillan Press, 1983, p. 6
16
James Dean Brown, Testing in Language Programs, New Jersey: Prentice Hall Regents, 1996, p. 15
17
J.B. Heaton, Writing English... , p.173
Thus, a diagnostic test is more comprehensive and detailed because it searches for the underlying causes of learning difficulties and then formulates a plan for remedial action.
c. Achievement Test
These tests are used to find out what students have actually learnt or what has actually been taught. “Achievement tests are designed to measure relative accomplishment in specified areas of work”.
18
The purpose of achievement tests, as the name reflects, is to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives.
19
From another point of view, Wilmar says that an achievement test “is designed to indicate degree of students’ success in some past learning activities”.
20
In addition, “Achievement tests relate to the past in that they measure what language the students have learned as a result of teaching”.
21
Based on the arguments above, the writer can conclude that achievement tests are intended to measure how effectively students have mastered the lesson and how far they have reached the instructional objectives. Thus, an achievement test must be designed with very specific reference to a particular course. This link with a specific program usually means that achievement tests will be directly based on the course objectives and will therefore be criterion-referenced. Such tests will typically be administered at the end of a course to determine how effectively students have mastered the instructional objectives.
At the implementation level, the achievement test appears in two forms: the formative test and the summative test.
18
H.H. Remmers, N.L. Gage, J. Francis Rummel, A Practical Introduction..., p. 19
19
Arthur Hughes, Testing for Language Teachers, Cambridge: Cambridge University Press, 2003, p. 13
20
Wilmar Tinambunan, Evaluation of Students..., p. 19
21
Tim McNamara, Language Testing, Hong Kong: Oxford University Press, 2000, p. 7
1 Formative test
A formative test is administered by the teacher during the learning process with the aim of using the results to improve instruction and to provide continuous feedback to both students and teacher. Rebecca M. Valette states “The formative test is given
during the course of instruction; its purpose is to show which aspects of the chapter the student has mastered and where remedial
work is necessary”.
22
Hence, the formative test is part of the instructional process. When incorporated into classroom practice, it provides the information needed to adjust teaching and learning while they are happening. In this sense, the formative test informs both teachers and students about student understanding at a point when timely adjustments can be made. These adjustments help to ensure that students achieve targeted standards-based learning goals within a set time frame.
23
2 Summative test
A summative test is a test that is usually administered at the end of the course. Rebecca M. Valette states “the summative test, on the other hand, is usually given at the end of a marking period and measures the “sum” total of the material covered. On this type of test, students are usually ranked and graded”. Moreover, a summative test is given periodically to determine at a particular point in time what students know and do not know. A summative test at the district/classroom level is an accountability measure that is generally used as part of the grading process. Arthur Hughes states that “the content of summative test should be based directly on a detailed course syllabus or on the books and other material used”.
24
22
Rebecca M. Valette, Modern Language..., p.6
23
http://www.nmsa.org/Publications/WebExclusive/Assessment/tabid/1120/Default.aspx
24
Arthur Hughes, Testing for Language…, p. 11
Finally, the writer can conclude that a summative test is a test that is usually administered at the end of a course of study.
d. Proficiency Test
The proficiency test is also used to measure what students have learned, but the aim of the proficiency test is to determine whether this language ability corresponds to specific language requirements.
25
According to J.B. Heaton, “the proficiency test is concerned simply with measuring a student’s control of the language in the light of what he or she will be expected to do with it in the future performance of a particular task”.
26
James Dean Brown also states that “A proficiency test assesses the general knowledge or skills commonly required or prerequisite to entry into or exemption from a group of similar institutions”.
27
Proficiency decisions should therefore never be undertaken lightly. Instead, they must be based on the best obtainable proficiency test scores as well as other information about the student. The content of a proficiency test, therefore, is not based on the content or objectives of language courses that people taking the test may have followed. Rather, it is based on a specification of what candidates have to be able to do in the language in order to be considered proficient.
28
25
Rebecca M. Valette, Modern Language..., p.6
26
J.B. Heaton, Writing English..., p. 172
27
James Dean Brown, Testing In Language..., p.10
28
Arthur Hughes, Testing For Language..., p. 9
2. Way of Scoring.
Based on the manner of scoring, test items are divided into two general types: objective and subjective. J.B. Heaton states that “Subjective and objective test are terms used to refer to the scoring of tests”.
29
a. Objective test
An objective test item is any test item for which there is only a single correct answer. In this test, the students must select one option from several alternatives. According to Valette, “An objective test item is any item for which there is a single predictable correct answer”.
30
Hence, this item type is referred to as the objective test item, because it can be scored objectively. That is, equally competent scorers can score the items independently and obtain the same result. Therefore, whether an item is scored by one teacher or another, today or last week, it will yield the same score. The advantage of objective test items is thus objective scoring, which is quick, easy, and consistent.
The objective test items commonly used in classroom testing are true-false, multiple-choice, matching, and short answers. “These test items include all of the selection-type items: multiple choice, true false, and matching.”
31
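The point that objective items yield the same score regardless of who marks them can be shown with a small sketch in Python. The answer key, the pupil’s answers, and the function name below are all hypothetical and serve only as an illustration; the sketch assumes every item has exactly one keyed option.

```python
def score_objective(answers, key):
    """Count the responses that match the single keyed option for each item."""
    return sum(1 for given, correct in zip(answers, key) if given == correct)


# Hypothetical five-item multiple-choice test.
answer_key = ["d", "a", "c", "b", "a"]
pupil_answers = ["d", "a", "b", "b", "c"]

# Two scorers applying the same key independently reach the same score,
# because no judgment is involved in comparing a response with the key.
score_by_teacher_a = score_objective(pupil_answers, answer_key)
score_by_teacher_b = score_objective(pupil_answers, answer_key)
```

Because scoring is reduced to a comparison against the key, the result depends only on the pupil’s answers and not on the scorer.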
1 True-False
A true-false item is simply a declarative statement which the students must judge as true or false. As J. Stanley Ahmann explained, “true-false item is referred to alternative response item; the
29
J.B. Heaton, Writing English..., p. 25
30
Rebecca M. Valette, Modern Language..., p.6
31
Norman E. Gronlund, Constructing Achievement Tests, New Jersey: Prentice-Hall., Inc., 1968, p. 25
item asks the students to answer with the “true” if it conforms to the truth or “false” if it is essentially incorrect”.
32
Thus, the item provides the students with a choice of two alternatives, so the students have the possibility of guessing the answer, and sometimes the guess will be right. In other words, students indicate whether a statement is true or false.
Example:
T  F  True-false items are classified as supply-type items.
2 Multiple-choice item
The multiple-choice item consists of a stem, which presents a problem situation, and several alternatives, which provide
possible solutions to the problem. The stem may be a question or an incomplete statement. The alternatives include the correct
answer and several plausible wrong answers, called distracters. Their function is to distract those students who are uncertain of the
answer. “A multiple-choice item consists of one or more introductory sentences followed by a list of two or more suggested
responses from which the examinee chooses one as the correct answer”.
33
Example: In objective testing, the term objective refers to the method of …
a. identifying the learning outcomes
b. selecting the test content
c. presenting the problem
d. scoring the answers
3 Matching
The matching test item consists of two parallel columns, with each word, number, or symbol in one column being matched to a word, sentence, or phrase in the other column. This type
32
J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth..., p. 17
33
Anthony J. Nitko. Educational Test..., p. 190
of item is employed widely in situations where relationships of more or less similar ideas, facts, and principles are to be examined or judged. In this type, students indicate the relationship between a set of premises and a set of responses.
Example:
1. The …. drives a car.          a. doctor
2. The …. checks the patient.    b. driver
This kind of test is an effective way to assess students’ recognition of the relationships between words, definitions, events, dates, categories, examples, and so on.
b. Subjective Test item
A subjective test is a test whose scoring requires judgment and evaluation on the part of the scorer. Valette states that a “Subjective item is one that does not have a single right answer”.
34
It means that the scoring may be inconsistent and that the answer to the question is in the form of a composition in which the students are given a chance to express their ideas or arguments in their own words. In other words, the answer is commonly in the form of a composition or statement. “Subjective tests, like translation and essay, have the advantage of measuring language skill naturally, almost the way English used in a real life”.
35
The subjective tests that are commonly used in the classroom are completion, short-answer, and essay items.
1 Completion
The completion item is a written statement that requires the examinee to supply the correct word or short phrase in response to
an incomplete sentence, a question or a word association.
34
Rebecca M. Valette. Modern Language..., .p. 10
35
Harold S. Madsen, Technique In Testing, Oxford; Oxford University Press, 1983 p.8
Completion test can be used effectively to measure the recall of terms, dates, and names.
36
The completion item and the short-answer item are both supply-type test items, but in the short-answer type the blank is nearly always at the end, whereas in the completion type the blank may occur anywhere in the statement.
37
2 Short-answer Item
The short answer item consists of a question, which can be answered with a word or short phrase.
38
A student provides a short response to a direct question or direction.
Generally, teachers prefer to use the short-answer type of question, probably because they think it has some advantages. It is relatively easy to construct, it gives the teacher some opportunity to see how well students can express their thoughts, and it is less difficult to score or mark than the essay question.
39
However, it is difficult to phrase a short-answer question so that only one answer is correct. This type of question is useful mainly in testing knowledge of facts and quite specific information.
3 Essay test.
The most notable characteristic of the essay test is the freedom of response it provides. The student is asked a question which
requires him to produce his own answer. He is relatively free to decide how to approach the problem, what factual information to
use, how to organize his reply, and what degree of emphasis to give each aspect of the answer. Thus, the essay question places a
36
Wilmar Tinambunan, Evaluation of Students..., p. 61
37
Victor H. Noll, Introduction to Educational..., p. 140
38
Victor H. Noll, Introduction to Educational..., p. 138
39
Victor H. Noll, Introduction to Educational..., p. 138
premium on the ability to produce, integrate, and express ideas. As Norman E. Gronlund states:
“Essay tests are inefficient for measuring knowledge outcomes . . . but they provide a freedom of response which is needed for
measuring certain complex outcomes . . . . These include the ability to create . . . . to organize . . . . to integrate . . . . to express . . . and
similar behaviors that call for the production and synthesis of ideas”.
40
Finally, from the explanation above of both objective and subjective tests, particularly the essay test, the writer concludes that for the measurement of most knowledge outcomes we would use objective test items to take advantage of their more extensive sampling and greater reliability. For the measurement of such complex learning outcomes as the ability to create, organize, and evaluate ideas, however, the teacher would use essay questions despite their limitations.
Of the types of test item above, the writer will be concerned only with the multiple-choice items in the English summative test for the second-year students of MTs. Darul Ma’arif, administered at the end of the second semester of the 2008/2009 academic year.
D. Criteria of a Good Test
Many considerations enter into the evaluation of a test. A good test provides the information needed for a sound evaluation of students’ comprehension of the instructional objectives. The writer considers these under three main headings: validity, reliability, and practicality. Validity refers to the extent to which a test measures what we actually wish to measure. According to Brown “Validity is the degree to which the test actually measures what is intended to measure . . . Reliability is
40
Norman E. Gronlund, Constructing Achievement..., p. 65
consistent and dependable . . . And practicality means financial limitations, time constraints, ease of administration, and scoring and interpretation”.
41
1. Validity
The single most important characteristic of a good test is its ability to help the teacher make correct decisions about what it is intended to measure.
This characteristic is called validity. “Validity is concerned with whether the information being gathered is relevant to the decision that needs to be
made”.
42
A test has validity if it measures appropriately what it is supposed to measure. According to Heaton: “The validity of a test is the extent to which it measures what it is supposed to measure and nothing else”.
43
Finocchiaro and Sako also state: “A test is valid when it measures effectively what it is
intended to measure”.
44
Still in the same sense, Wilmar states that “The validity of a test is the extent to which the test measures what is intended
to measure”.
45
Also, Norman E. Gronlund states that “test scores are valid to the extent to which they serve the use for which they are intended”.
46
J. Stanley Ahmann and Marvin D. Glock point out, “In educational measurement, validity is often defined as the degree to which a measuring
actually serves the purposes for which it is intended”.
47
Based on these definitions, the writer concludes that the validity of a test is important for knowing whether the test is of good quality in measuring
someone’s ability.
41
H. Douglas Brown, Teaching by Principles: An Interactive Approach to Language Pedagogy, San Francisco: Longman, 2nd edition, p. 386-387
42
Peter W. Airasian, Classroom Assessment..., p. 16
43
J.B. Heaton, Writing English..., p. 159
44
Mary Finocchiaro and Sydney Sako, Foreign Language Testing: A Practical Approach, New York: Regents Publishing Company, 1983, p. 24
45
Wilmar Tinambunan, Evaluation of Student..., p. 11
46
Norman E. Gronlund, Constructing Achievement..., p. 105
47
J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth..., p. 285
Because validity is one of the most important characteristics of test scores, the test constructor should know the various aspects of
validity and the procedures by which they are determined. “The two most important characteristics of test scores are validity
and reliability…Anyone working with tests-whether constructing them or using published tests-should understand the meaning of
these concepts…and should know the various procedures by which they are determined”.
48
According to Heaton, the validity of a test can be examined from the aspects described below.
a. Face validity
A test has face validity if it has a good “face”, that is, if it looks right to those who see it. According to Heaton, “if a test item looks right to other
testers, teachers, moderators, and testees, it can be described as having at least face validity”.
49
Mary Finocchiaro and Sydney Sako define it as “A judgment about a test based on the way the test looks to
educators, students, and the general public. The test should not only ‘be right’ it also ‘look right’”.
50
b. Content Validity
A test has content validity if it covers the material that the students have been taught. To achieve this, the teacher should also refer to
the instructional objectives of the teaching-learning process. Finocchiaro and Sako state: “Content validity is assured by checking
all items in the test to make certain that they correspond to the instructional objective of the course”.
51
In the same sense, Victor H. Noll explains, “when a teacher gives a test which deals with the
48
Norman E. Gronlund, Constructing Achievement..., p. 105
49
J.B. Heaton, Writing English..., p. 159
50
Mary Finocchiaro and Sydney Sako, Foreign Language..., p. 28
51
Mary Finocchiaro and Sydney Sako, Foreign Language..., p. 25
material and with the objectives of instruction in particular class, his test is said to have curricular content validity”.
52
c. Construct Validity
A test is said to have construct validity if it can be demonstrated that it measures just the ability that it is supposed to measure.
According to Heaton, “if a test has construct validity, it is capable of measuring certain specific characteristics in accordance with a theory
of language behavior and learning”.
53
d. Empirical Validity
A fourth type of validity is usually referred to as statistical or empirical validity. This validity is obtained as a result of comparing
the result of the test with the result of some criterion measure.
54
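One common way to carry out such a comparison is to correlate the test scores with the scores on the criterion measure. The sketch below is only an illustration under that assumption: the choice of the Pearson coefficient, the function name `pearson_r`, and the sample scores are hypothetical and are not taken from Heaton's text.

```python
import math


def pearson_r(test_scores, criterion_scores):
    """Return the Pearson correlation between two equally long score lists."""
    if len(test_scores) != len(criterion_scores) or len(test_scores) < 2:
        raise ValueError("need two score lists of the same length (at least 2)")
    mean_x = sum(test_scores) / len(test_scores)
    mean_y = sum(criterion_scores) / len(criterion_scores)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(test_scores, criterion_scores))
    sxx = sum((x - mean_x) ** 2 for x in test_scores)
    syy = sum((y - mean_y) ** 2 for y in criterion_scores)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of five students on the test and on a criterion measure.
test_scores = [55, 62, 70, 78, 90]
criterion_scores = [50, 65, 68, 80, 88]
print(round(pearson_r(test_scores, criterion_scores), 2))
```

A coefficient close to 1 would suggest that the test ranks students in much the same way as the criterion measure does.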
2. Reliability
The second criterion of a good test is reliability. Reliability has to do with the accuracy and precision of a measurement procedure. Indices of
reliability give an indication of the extent to which a particular measurement is consistent and reproducible.
55
A test should be reliable as a measuring instrument.
According to Finocchiaro and Sako, “the reliability or stability of a language test is concerned with the degree to which it can be trusted to
produce the same result upon repeated administration to the same individual, or to give consistent information about the value of a learning
variable being measured”.
56
J. Stanley Ahmann and Marvin D. Glock state that “Reliability means consistency of results. This is
equivalent to saying that a highly reliable instrument can be used
52
Victor H. Noll, Introduction to Educational..., p. 79
53
J.B. Heaton, Writing English..., p. 161
54
J.B. Heaton, Writing English..., p. 161
55
Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation in Psychology and Education, London: John Wiley and Sons, Inc., 1961, p. 127
56
Mary Finocchiaro and Sydney Sako, Foreign Language..., p. 28
repeatedly in an unchanging situation and produce constant or near constant results.”
57
Based on the above statements, a test is reliable if it consistently yields the same or nearly the same ranks over repeated administrations.
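The idea of "the same or nearly the same ranks over repeated administrations" can be illustrated with a rank correlation between two administrations of the same test. This is a minimal sketch, assuming Spearman's rank correlation as the measure; the function names and the sample scores are hypothetical and do not come from the sources cited above.

```python
import math


def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next score is tied with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def spearman_rho(first, second):
    """Spearman rank correlation: the Pearson correlation of the two rank lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of the same length (at least 2)")
    ra, rb = average_ranks(first), average_ranks(second)
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(ra, rb))
    var_a = sum((a - mean_a) ** 2 for a in ra)
    var_b = sum((b - mean_b) ** 2 for b in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("rank correlation is undefined when all scores are tied")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical scores of the same four students on two administrations.
first_administration = [90, 80, 70, 60]
second_administration = [88, 79, 75, 50]
print(spearman_rho(first_administration, second_administration))
```

In this example the students keep exactly the same rank order on both occasions, so the coefficient is at its maximum; the more the ranks change between administrations, the lower it becomes.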
3. Practicality
Practicality concerns whether a test can feasibly be used in practice.
“Practicality is concerned with a wide range of factors economy, convenience, and interpretability that determine whether a test is
practical for widespread use”.
58
A test may be a highly reliable and valid instrument and still be beyond our means or facilities. The teacher or whoever constructs the test
should therefore keep in mind a number of very practical considerations, chiefly economy, scorability, and administrability.
Finocchiaro and Sako state that “the criteria for practicality normally will be based upon such factors as economy,
scorability, and administrability”.
59
Harrison states that “tests should be as economical as possible in time preparation, sitting, and
marking and in cost material and hidden costs of time spent”.
60
In short, the criteria of a good test are validity, reliability, and
practicality. Besides these three criteria, however, the quality of a test as a whole is also determined by the quality of each item that makes up the test. If the
quality of each item is good, the scores obtained from the test will be more sound and accurate. The quality of each individual item can be
examined through item analysis. According to Robert Lado, “item analysis is the study of validity, reliability, and difficulty of test item taken
57
J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil..., p. 311
58
Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation..., p. 127
59
Mary Finocchiaro and Sydney Sako, Foreign Language Testing..., p. 30
60
Andrew Harrison, A Language Testing..., p. 13
individually as if they were separate tests”.
61
Through this analysis, the evaluator can find out which items are good enough for future use.
D. Item Analysis
After a test has been administered and scored it is usually desirable to evaluate the effectiveness of the items. This is done by studying the students’
responses to each item. When formalized, the procedure is called item analysis. Anthony J. Nitko states, “item analysis refers to the process of
collecting, summarizing, and using information about pupils’ responses to items”.
62
Meanwhile Harold S. Madsen explained that:
“The selection of appropriate language items is not enough by itself to ensure a good test. Each question needs to function properly; otherwise, it
can weaken the exam. Fortunately, there are some rather simple statistical ways of checking individual item. This procedure is called ‘item
analysis’.”
63
Item analysis is also a systematic procedure that provides information about the quality of a test item with respect to the
following points:
1. The difficulty of the item
2. The discriminating power of the item
3. The effectiveness of each alternative or distracter.
61
Robert Lado, Ph.D., Language..., p. 342
62
Anthony J. Nitko, Educational Test..., p. 342.
63
Harold S. Madsen, Technique In..., p. 180
Thus, item analysis information can tell the evaluator or constructor whether
an item was too easy or too hard, how well it discriminated between high and low scorers on the test, and whether all of the alternatives functioned as
intended. According to Suharsimi Arikunto, “Item analysis aims, among other things, to help us identify poor
items, to obtain information that can be used to improve
the items for later use, and to obtain a quick picture of the state of the test we have constructed”.
64
Item analysis data also aid in detecting specific technical flaws and thus provide further information for improving test items. As J. Stanley
Ahmann and Marvin D. Glock state, “item analysis is re-examining each test to discover its strength and flaws”.
65
Item analysis has several benefits. First, it provides useful information for class discussion of the test. Second, it provides data for helping the students
improve their learning. Third, it provides insights and skills which lead to the preparation of better tests on future occasions.
66
Finally, the writer concludes that item analysis is very important because it provides information about the quality of each test item, that is, whether it is a good
item or a poor item.
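The third point of item analysis mentioned above, the effectiveness of the alternatives, is typically judged from a tally of how many students chose each option of an item. The following is a minimal sketch of such a tally; the function name `tally_options`, the four options A to D, and the sample responses are assumptions made for illustration only.

```python
from collections import Counter


def tally_options(responses, options="ABCD"):
    """Count how many students chose each option of one multiple-choice item.

    Responses that are not among the listed options (for example a blank
    answer) are ignored in this sketch.
    """
    counts = Counter(responses)
    return {option: counts.get(option, 0) for option in options}


# Hypothetical responses of eight students to a single item.
responses = ["A", "B", "A", "D", "A", "B", "A", "A"]
print(tally_options(responses))
# {'A': 5, 'B': 2, 'C': 0, 'D': 1}
```

In this made-up tally nobody chose option C, which would suggest that C does not function as a distracter for this group.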
1. Difficulty Level of the Item
The difficulty level of an item is the percentage of pupils who answer the item correctly. “The item difficulty is fraction of the
persons taking an item who answer it correctly”.
67
Heaton states that “The index of difficulty (or facility value) of an item simply shows how easy or
difficult the particular item proved in the test. The index of difficulty (facility value) is generally expressed as the fraction (or percentage) of the
students who answered the item correctly”.
68
A good test item should have an appropriate degree of difficulty. It should be neither too easy nor too difficult, because a test that is too easy or too
difficult will yield a score distribution that makes it hard to identify
64
Suharsimi Arikunto, Dasar-dasar Evaluasi Pendidikan, Jakarta: Bina Aksara, 1987, p. 205
65
J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth..., p. 184
66
Norman E. Gronlund, Constructing Achievement..., p. 85-86.
67
Anthony J. Nitko, Educational Test..., p. 288
68
J.B. Heaton, Writing English..., p. 178
reliable differences in achievement between the pupils who have done well and those who have done poorly. Suharsimi Arikunto says:
“A good item is one that is neither too easy nor too difficult. An item that is too easy does not stimulate students to
increase their effort to solve it. An item that is too difficult will cause students
to become discouraged and lose the will to try again, because it is beyond their reach”.
69
By analyzing the students’ responses to the items, the level of difficulty of each item can be determined, and this information helps
the teacher identify concepts that need to be re-taught. In addition, by analyzing the facility value, the teacher will know whether the item
is easy, moderate, or difficult. M. Chobib Thoha states: “A good item is one whose difficulty level is known to be
neither too difficult nor too easy, because the difficulty level is correlated with discriminating power. When an item has
maximal difficulty, its discriminating power will be low; likewise, an item that is too easy will have no discriminating power”.
70
To measure the difficulty level of each item, the writer uses Heaton’s formula:
71
$$FV = \frac{CU + CL}{2n}$$
Explanation:
FV : Facility value, or index of difficulty, of the item
CU : Number of students in the upper group who answered the item correctly
CL : Number of students in the lower group who answered the item correctly
2n : Total number of students in the upper and lower groups.
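The calculation of the facility value can be sketched in a few lines of Python. This is only an illustration: the function name `facility_value` and the counts in the example are hypothetical, and the split of 36 students into an upper and a lower group of 18 each is an assumption, since the text does not state how the groups are formed.

```python
def facility_value(correct_upper, correct_lower, group_size):
    """Facility value of one item: (CU + CL) / 2n.

    correct_upper -- CU, students in the upper group who answered correctly
    correct_lower -- CL, students in the lower group who answered correctly
    group_size    -- n, the number of students in each group
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not (0 <= correct_upper <= group_size and 0 <= correct_lower <= group_size):
        raise ValueError("correct counts must lie between 0 and group_size")
    return (correct_upper + correct_lower) / (2 * group_size)


# Hypothetical item: 15 of 18 upper-group students and 9 of 18 lower-group
# students answered correctly (assumed equal split of 36 students).
fv = facility_value(correct_upper=15, correct_lower=9, group_size=18)
print(round(fv, 2))  # 0.67
```

Here 24 of the 36 students answered the item correctly, so about two thirds of the group found the item manageable.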
69
Suharsimi Arikunto, Dasar-dasar..., p. 207
70
M. Chobib Thoha, Teknik Evaluasi Pendidikan, Jakarta: PT. Raja Grafindo Persada, 2003, p. 145
71
J.B. Heaton, Writing English..., p. 178
After calculating the difficulty level of each item, the writer calculates the index of difficulty of all items with this formula:
$$P = \frac{\sum B}{N}$$
P : Difficulty level of all items
B : Difficulty level of each item
∑ : Sum over all items
N : Total number of test items.
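The overall index is simply the mean of the item-level values. A minimal sketch follows, with a hypothetical function name `mean_difficulty` and made-up facility values for four items:

```python
def mean_difficulty(item_values):
    """Overall difficulty P: the sum of the item difficulty levels divided by N."""
    if not item_values:
        raise ValueError("at least one item value is required")
    return sum(item_values) / len(item_values)


# Hypothetical facility values of a four-item test.
print(mean_difficulty([0.25, 0.50, 0.75, 1.00]))  # 0.625
```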
To interpret the difficulty level of each item and of all items, the writer uses the criteria given in Suharsimi Arikunto’s
book.
72
If the FV is:
Difficult : 0.00 – 0.30
Moderate : 0.31 – 0.70
Easy : 0.71 – 1.00
The facility value shows how easy or difficult the test items are for the particular group tested. It is therefore influenced by the
students’ competence, and the result will be different if the test is given to another group of learners.
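The criteria above can be applied mechanically once the facility values are known. The sketch below is illustrative only: the function name `classify_fv` is hypothetical, and treating values that fall between the printed bands (for example 0.305) as belonging to the next higher band is an assumption, since the criteria are stated to two decimal places.

```python
def classify_fv(fv):
    """Label a facility value using the bands 0.00-0.30, 0.31-0.70 and 0.71-1.00."""
    if not 0.0 <= fv <= 1.0:
        raise ValueError("a facility value must lie between 0 and 1")
    if fv <= 0.30:
        return "difficult"
    if fv <= 0.70:
        return "moderate"
    return "easy"


# Hypothetical facility values of three items.
for value in (0.25, 0.50, 0.85):
    print(value, classify_fv(value))
```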
E. The Importance of Item Analysis
Item analysis is very important because it helps teachers prepare better test items and supports them in the teaching-learning process. “Item analysis is an
important and necessary step in the preparation of good multiple-choice tests”.
73
72
Suharsimi Arikunto, Dasar-dasar..., p. 210
73
John W. Oller, Language Tests at School: A Pragmatic Approach, London: Longman, 1979, p. 245
“For teacher made test, the following are among the important uses of item analysis: determining whether an item functions as teacher intends,
feedback to the teacher about pupil difficulties, area for curriculum improvement, revising the item and improving item writing skills”.
74
1. Determining whether an item functions as the teacher intends.
An item functions properly if it is able to distinguish those who have mastered the learning objectives from those who have
not. To differentiate between them, the item should have an appropriate level of difficulty, discriminating power, and effective distracters.
Therefore, item analysis should be done.
2. Feedback on students’ performance and a basis for class discussion.
Once the students’ responses to the items are known, their performance can be assessed, their errors can be corrected, and
the items that most of them found difficult can be discussed in class.
3. Feedback to the teacher about pupils’ difficulties.
The results of item analysis help teachers identify the major types of difficulty that pupils have in learning, so that they know which material
needs to be reviewed in subsequent lessons.
4. Area for curriculum improvement.
Item analysis can show which kinds of items students find difficult or which errors occur often. This may indicate that the material is not
suitable to be taught in the school program, so the curriculum may need to be revised.
74
Anthony J. Nitko, Educational Test..., p. 284
CHAPTER III RESEARCH METHODOLOGY
1. The Objective of the Research
The research was conducted to find out the difficulty level of the English summative test items for the second-year students of MTs. Darul Ma’arif Jakarta in the
second term of the 2008/2009 academic year, using the calculation described in J.B. Heaton’s book, “Writing English Language Tests”.
2. The Method of Study
The method used in this study is descriptive analysis with a quantitative approach.
Quantitative analysis of the score data, using simple statistical tabulation, is used to determine whether each test item is good or not.
3. The Time and Place
The research was conducted during teaching practice from March to June 2009 at MTs. Darul Ma’arif, located at Jl. Rs. Fatmawati No. 45,
Cipete, South Jakarta.
4. The Respondents
The writer took the results of the English summative test of the second grade at MTs. Darul Ma’arif Cipete, South Jakarta, which consists of 50 English
multiple-choice items. The respondents of this research are the second-year students of MTs. Darul Ma’arif Jakarta, who number 36 students.
5. The Instrument of the Research
The research instrument is the English summative test paper for the second-year students of MTs. Darul Ma’arif Jakarta.
CHAPTER IV RESEARCH FINDINGS