OUTLINE

CHAPTER I INTRODUCTION

A. Background of Study
B. Significance of the Study
C. Limitation of the Problem
D. Formulation of the Problem
E. Research Methodology
F. Organization of Writing

CHAPTER II THEORETICAL FRAMEWORK

A. Evaluation
B. The Definition of Test
C. The Role of Testing
D. Types of Test
   a. Function
      1. The placement test
      2. The diagnostic test
      3. The achievement test
      4. The proficiency test
   b. Way of scoring
      1. Objective test
      2. Subjective test
E. The Characteristics of a Good Test
   1. Validity
   2. Reliability
   3. Practicality
F. Item Analysis
   1. Level of difficulty
   2. Discriminating power
   3. Distracter
G. The Importance of Item Analysis

CHAPTER III PROFILE OF SCHOOL

A. History of the School
B. Vision and Mission of the School
C. Facilities of the School
D. Organizational Structure of the School
E. Teachers, Staff, and Students

CHAPTER IV RESEARCH FINDINGS

A. Population and Sample
B. Time of Research
C. The Data Description
D. The Data Analysis
E. The Data Interpretation

CHAPTER V CONCLUSION AND SUGGESTION

A. Conclusion

B. Suggestion

CHAPTER I INTRODUCTION

A. Background of Study

Evaluation is an important part of every teaching and learning experience. It contributes a great deal to teaching, and it provides information about students’ progress that teachers can use to manage learning tasks and students. As Pauline Rea-Dickins and Kevin Germaine state, “Evaluation is important for the teacher because it provides a wealth of information to use for the future direction of classroom practice, for the planning of courses and for the management of learning tasks and students.”1 Evaluation can also be described as the process of making sound decisions about teaching and learning on the basis of information that has been collected, synthesized, and reflected on. Lyle F. Bachman states, “Evaluation can be defined as the systematic gathering of information for the purpose of making decision.”2

Depending on the decision being made and the information a teacher needs to inform that decision, testing often contributes to the process as one way of carrying out evaluation. Indeed, a test is one kind of evaluation instrument for collecting data. “A test is defined as a systematic procedure for observing and describing one or more characteristics of a person with the aid of either a numerical scale or category system.”3 In other words, a test measures a person’s ability or knowledge through a number of tasks or questions. According to Henning, “. . . tests in general is to pinpoint strengths and weakness in the learned abilities of students.”4

Teachers need to give tests because tests let them find out how well students have mastered the lessons taught and evaluate the effectiveness of the method and the teaching materials used. Rebecca M. Valette states, “…through tests the teacher can evaluate the effectiveness of a new teaching method, of a different approach to a difficult pattern, or of a new materials.”5

To measure students’ learning progress at school, a teacher commonly administers two kinds of test: formative tests and summative tests. The former is given earlier, during the semester, while the latter is given at the end of the semester. Through both, a teacher can measure students’ level of achievement and the degree to which they have accomplished the instructional objectives. As Gronlund explains, “Formative test is used to monitor learning progress during instruction. Its purpose to provide continuous feedback to both pupil and teacher concerning learning successes and failures ……….. And summative test typically comes at the end of a course of instruction. It is designed to determine the extent to which the instructional objectives have been achieved and is used primarily for assigning course grades or for certifying pupil mastery of the intended learning outcomes.”6

To obtain accurate measurements, a test must be of good quality, because a good test not only influences students’ learning but also encourages teachers to improve the teaching and learning process. J.B. Heaton supports this view: “Test may be constructed primary as device to reinforce learning and to motivate the students’ performance in language.”7 Lyle F. Bachman also states that “Test are often used for pedagogical purposes, either as a means of motivating students to study or as means of reviewing material taught.”8

Because the accuracy of test results affects students’ motivation to learn, the test administered must be a good test. A good test meets the criteria of validity, reliability, and practicality; in addition, its items must have appropriate discriminating power and difficulty level.9 A test is valid if it measures what it is supposed to measure. It is reliable if it gives consistent results, for example when it is administered again to students at the same level at another time. It is practical if it is easy to take and to administer.

What teachers often neglect is the follow-up after a test has been given, namely examining the test items themselves. In practice, they do not check whether every item meets the criteria above. An analysis of the test items, known as item analysis, is therefore required. Through item analysis, a teacher can distinguish good items from poor ones and identify the items that differentiate between students who have done well and those who have done poorly. J. Stanley Ahmann and Marvin D. Glock describe item analysis as follows: “Re-examining each test item to discover its strengths and flaws is known as item analysis. Item analysis usually concentrates on two vital features; level of difficulty and discriminating power. The former means the percentage of pupils who answer correctly each item; the latter the ability of the test item to differentiate between pupils who have done well and those who have done poorly.”10 In addition, Ngalim Purwanto states (translated from Indonesian): “The specific purpose of item analysis is to find out which test items are good and which are not, and why an item is said to be good or not good.”11

The latest English summative test at MTs. Darul Ma’arif was held on June 19, 2009. According to a pre-survey conducted during the writer’s teaching practice at MTs. Darul Ma’arif, the English teacher had never analyzed the items of the second-semester test, so it is difficult to say whether it is a good test or not. In addition, the test results show that the students’ scores are low. Considering these facts, the writer is interested in conducting an item analysis of the English summative test at MTs. Darul Ma’arif Jakarta for the second semester of the 2008/2009 academic year.

1 Pauline Rea-Dickins and Kevin Germaine, Evaluation, New York: Oxford University Press, 1992, p. 3.
2 Lyle F. Bachman, Fundamental Considerations in Language Testing, Oxford: Oxford University Press, 1990, p. 22.
3 Anthony J. Nitko, Educational Tests and Measurement: An Introduction, New York: Harcourt Brace Jovanovich, Inc., 1983, p. 6.
4 Grant Henning, A Guide to Language Testing, U.S.A.: Newbury House Publishers, 1987, p. 1.
5 Rebecca M. Valette, Modern Language Testing, U.S.A.: Harcourt Brace Jovanovich, 1977, p. 5.
6 Norman E. Gronlund, Measurement and Evaluation in Teaching, 4th edition, Macmillan Publishing Company, 1976, p. 18.
7 J.B. Heaton, Writing English Language Tests, New Delhi: Tata McGraw-Hill Publishing Company, 1998, p. 13.
8 Lyle F. Bachman, Fundamental Considerations in Language Testing, Toronto: Oxford University Press, 1990, p. 22.
9 J.B. Heaton, Writing English Language Tests, New Delhi: Tata McGraw-Hill Publishing Company, 1998, pp. 152-156.
10 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth: Principles of Tests and Measurements, Boston: Allyn and Bacon, Inc., 1967, p. 184.
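The two features named in the Ahmann and Glock definition can be computed directly from a table of scored answers. The Python sketch below is only an illustration, not the procedure used in this study. It assumes every answer has already been scored 1 (correct) or 0 (incorrect). The level of difficulty is taken as the proportion of all students answering an item correctly, and discriminating power as the proportion correct in an upper group minus the proportion correct in a lower group. The function name `item_analysis` and the default of using the top and bottom 27 percent of students as those groups are assumptions.

```python
from typing import List, Tuple


def item_analysis(responses: List[List[int]],
                  group_fraction: float = 0.27) -> List[Tuple[float, float]]:
    """Return (difficulty, discrimination) for every item.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    difficulty     = proportion of all students who answered the item correctly
    discrimination = proportion correct in the upper group minus the
                     proportion correct in the lower group
    The upper/lower groups are the top and bottom `group_fraction` of students
    ranked by total score (0.27 is an assumed convention, not from this study).
    """
    if not responses:
        raise ValueError("no responses to analyze")
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students from highest to lowest total score.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n_students * group_fraction))  # size of each group
    upper, lower = ranked[:k], ranked[-k:]

    results = []
    for i in range(n_items):
        difficulty = sum(row[i] for row in responses) / n_students
        p_upper = sum(row[i] for row in upper) / k
        p_lower = sum(row[i] for row in lower) / k
        results.append((difficulty, p_upper - p_lower))
    return results


# Hypothetical scored answers: 4 students x 3 items.
scored = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_analysis(scored, group_fraction=0.5))
# [(0.75, 0.5), (0.5, 1.0), (0.25, 0.5)]
```

With only four hypothetical students the groups are set to halves. The second item separates the two groups completely (discrimination 1.0), while the first and third discriminate less. Ties in total score are broken by the original row order, because Python's sort is stable.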

B. Limitation of the Problem

The writer limits this study to an item analysis of the English summative test administered to the second-year students of MTs. Darul Ma’arif Jakarta in the 2008/2009 academic year, focusing on one aspect only: the difficulty level, also called the facility value.

C. Formulation of Problem

Based on the background described above, the writer seeks to answer the following question: “Do the items of the English summative test for the second-year students of MTs. Darul Ma’arif Jakarta have good quality in terms of difficulty level?”

11 Drs. Ngalim Purwanto, Prinsip-prinsip dan Tehnik Evaluasi Pengajaran, Bandung: Remaja Rosda Karya, 1991, p. 118.
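Answering this question requires a rule for deciding which difficulty indices are acceptable. This section does not state the criteria the study adopts. The Python sketch below therefore assumes, for illustration only, a common three-band scheme: an index of 0.30 or below marks an item as difficult, above 0.30 up to 0.70 as moderate, and above 0.70 as easy. The cut-off constants and the function names are hypothetical and should be replaced with the criteria the study actually uses.

```python
from typing import Dict, List

# Assumed cut-offs for illustration only; replace with the criteria the study adopts.
DIFFICULT_MAX = 0.30
MODERATE_MAX = 0.70


def classify_difficulty(p: float) -> str:
    """Label an item by its difficulty index p (proportion answering correctly)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a difficulty index must lie between 0 and 1")
    if p <= DIFFICULT_MAX:
        return "difficult"
    if p <= MODERATE_MAX:
        return "moderate"
    return "easy"


def summarize(indices: List[float]) -> Dict[str, int]:
    """Count how many items fall into each difficulty band."""
    counts = {"difficult": 0, "moderate": 0, "easy": 0}
    for p in indices:
        counts[classify_difficulty(p)] += 1
    return counts


print(summarize([0.25, 0.5, 0.75, 0.9]))
# {'difficult': 1, 'moderate': 1, 'easy': 2}
```

A higher index means an easier item. The difficulty index is really a proportion of correct answers, which is why it is also called the facility value.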

D. Significance of the Study

First, this study gives the writer in particular, and the English teacher as well, feedback on how to analyze test items in terms of difficulty level. Second, it informs the English teacher about the quality of the test items in terms of difficulty level. Through this research, the English teacher can identify good items for future use and see how well students have mastered the material taught, which in turn helps the teacher evaluate his or her own competence in teaching.

E. Organization of Writing

In discussing the topic, the writer divides this study into five chapters, as follows. Chapter One is the introduction, covering the background of the study, the limitation of the problem, the formulation of the problem, the significance of the study, and the organization of writing. Chapter Two is the theoretical framework, which discusses evaluation, tests and their types, the criteria of a good test, and item analysis. Chapter Three presents the research methodology, which includes the objective of the research, the method of the study, the time and place, the population and sample, the instrument, and the procedure of the research. Chapter Four presents the research findings, consisting of the data description and the data analysis. Chapter Five is devoted to the conclusion drawn from the preceding chapters and the writer’s suggestions based on the research.

CHAPTER II THEORETICAL FRAMEWORK

A. The Definition of Evaluation

Evaluation is important in any process, because through evaluation we can find the weaknesses that should be revised and the strengths that should be developed further. The same holds for the teaching and learning process: evaluation plays an important role by providing information for judging what is good or desirable, in order to improve students’ learning and the teacher’s competence in teaching. This is in line with Peter W. Airasian’s definition: “Evaluation is the process of judging the quality or value of a performance or a course of action.”1 In the same sense, Lyle F. Bachman states, “Evaluation can be defined as the systematic gathering of information for the purpose of making decision.”2 Evaluation also includes “the making judgments about the value, for some purpose, of ideas works, solutions, methods, materials, etc.”3 Similarly, Benjamin S. Bloom et al. state that “Evaluation is a system of quality control in which it may be determined at each step in the teaching-learning process whether the process is effective or not, and if not what changes must be made to its effectiveness before it is too late.”4

Basically, the purpose of evaluation is to judge the worth of a program or procedure, usually in terms of how well it has achieved its objectives, and for this purpose all appropriate techniques of gathering evidence may be used.5 “Evaluation goes beyond the statement of how much to concern itself with the question what value. It seeks to answer the pupil’s and teacher question of what progress am I making?”6 Richard I. Arends states that “An important purpose of testing and evaluation is to provide students with feedback on how they are doing.”7

Considering all these opinions, the writer concludes that evaluation is a systematic process of providing information in order to make judgments and sound decisions: whether the objectives are in line with the curriculum used, how much the students have improved in the teaching and learning process, how competent the teacher is in teaching, and what the classroom climate is like.

1 Peter W. Airasian, Classroom Assessment: Concepts and Applications, 5th edition, New York: McGraw-Hill, 2005, p. 9.
2 Lyle F. Bachman, Fundamental Considerations..., p. 22.
3 Julian C. Stanley, Measurement in Today’s Schools, Englewood Cliffs: Prentice-Hall, Inc., 1964, p. 16.
4 Benjamin S. Bloom, Handbook on Formative and Summative Evaluation of Student Learning, London: Longman, 1971, p. 8.

B. The Definition of Test

When people hear the words assessment and evaluation, they often think right away of tests, because a test is one of the instruments of evaluation for collecting data. A test is a formal, systematic, usually paper-and-pencil procedure for gathering information about pupils’ performance.8 Paper-and-pencil tests are thus one important tool for gathering assessment information. A test is composed of a number of tasks or questions to which students respond; by analyzing the responses, the teacher can measure students’ achievement in the teaching and learning process. Lyle F. Bachman states that “A test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual.”9 Meanwhile, Wilmar Tinambunan states that “A test is a set of questions, each of which has a correct answer, that examinees usually answer orally or in writing.”10

From these views, it can be concluded that a test can be an instrument, technique, or procedure for eliciting students’ responses through tasks or performances, in the form of a set of questions to be answered, in order to find out whether the teaching and learning objectives have been achieved. In short, a test is a measurement instrument designed to assess a specific sample of an individual’s behavior. A test is also a way of obtaining information that is very useful for many practitioners of education: “A test is a formal systematic procedure for gathering information.”11 A test, as an educational device, is therefore necessary in the teaching process, since testing and teaching cannot be separated. Heaton states that “both testing and teaching are so closely interrelated that is virtually impossible to work in either field without being constantly concerned with the other.”12 The reason for this interrelation is that the material tested must be based on the material taught, so that the test shows how far the students have understood it.

5 Victor H. Noll, Introduction to Educational Measurement, 2nd edition, Boston: Houghton Mifflin Company, 1965, p. 14.
6 H.H. Remmers, N.L. Gage, and J. Francis Rummel, A Practical Introduction to Measurement and Evaluation, USA: Harper and Brothers Publishers, 1960, p. 7.
7 Richard I. Arends, Learning to Teach, New York: McGraw-Hill International Edition, 1989, p. 312.
8 Peter W. Airasian, Classroom Assessment..., p. 9.
9 Lyle F. Bachman, Fundamental Considerations..., p. 20.

C. Types of Tests

There are many kinds of tests that can be used in an evaluation process to measure students’ achievement. Tests can be classified in two ways: by function and by way of scoring.

1. Function
According to Andrew Harrison, tests can be categorized by function into four types: placement tests, diagnostic tests, achievement tests, and proficiency tests.

a. The Placement Test
A placement test is used to place a student at the appropriate level or section of a language curriculum or school. It is usually given at the beginning of a course. According to Wilmar Tinambunan, a placement test is designed to determine pupil performance at the beginning of instruction. Thus, it is designed to sort new students into teaching groups, so that they can start a course at approximately the same level as the other students in the class. It is concerned with the student’s present standing, and so relates to general ability rather than specific points of learning. As a rule, the results are needed quickly so that teaching may begin.13

b. The Diagnostic Test
A diagnostic test is designed to diagnose particular aspects of a language. “Diagnostic tests are also achievement test, but they are characterized by one distinctive feature, namely that they are designed to show specific weakness and strengths within the skills or elements measured.”14 It can also be used to check students’ progress in learning particular elements of the course, for example at the end of a unit in the course book or after a lesson designed to teach one particular point.15 “A diagnostic test is designed to determine the degree to which the specific instructional objectives of the course have been accomplished.”16 J.B. Heaton states that “Diagnostic test is widely used, few tests are constructed solely as diagnostic tests. Note that diagnostic testing is frequently carried out of groups of students rather for individuals.”17 Thus, a diagnostic test is comprehensive and detailed, because it searches for the underlying causes of learning difficulties and then formulates a plan for remedial action.

10 Wilmar Tinambunan, Evaluation of Student Achievement, Jakarta: Depdikbud, 1988, p. 310.
11 Julian C. Stanley, Measurement in Today’s..., p. 3.
12 J.B. Heaton, Writing English..., p. 1.
13 Wilmar Tinambunan, Evaluation of Student..., p. 7.
14 Robert Lado, Language Testing, Hong Kong: Wing Tai Cheung Printing Co. Ltd., 1961, p. 369.
15 Andrew Harrison, A Language Testing Handbook, London: Macmillan Press, 1983, p. 6.
16 James Dean Brown, Testing in Language Programs, New Jersey: Prentice Hall Regents, 1996, p. 15.
17 J.B. Heaton, Writing English..., p. 173.

c. Achievement Test
Achievement tests are used to find out what students have actually learned of the material that has actually been taught. “Achievement tests are designed to measure relative accomplishment in specified areas of work.”18 The purpose of an achievement test, as its name suggests, is to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives.19 From another point of view, Wilmar Tinambunan says that “the degree purpose of achievement test is designed to indicate degree of students’ success in some past learning activities.”20 Also, “Achievement tests relate to the past in that they measure, what language the students have learned as a result of teaching.”21 Based on these arguments, the writer concludes that achievement tests are intended to measure how effectively students have mastered the lesson and how far they have reached the instructional objectives. Thus, an achievement test must be designed with very specific reference to a particular course. This link with a specific program usually means that achievement tests will be directly based on the course objectives and will therefore be criterion-referenced.
Such tests are typically administered at the end of a course to determine how effectively students have mastered the instructional objectives. In practice, achievement tests take two forms that serve different purposes: formative tests and summative tests.

18 H.H. Remmers, N.L. Gage, and J. Francis Rummel, A Practical Introduction..., p. 19.
19 Arthur Hughes, Testing for Language Teachers, Cambridge: Cambridge University Press, 2003, p. 13.
20 Wilmar Tinambunan, Evaluation of Student..., p. 19.
21 Tim McNamara, Language Testing, Hong Kong: Oxford University Press, 2000, p. 7.

1) Formative Test
A formative test is administered by the teacher during the learning process, with the aim of using the results to improve instruction and to provide continuous feedback to both students and teacher. Rebecca M. Valette states, “The formative test is given during the course of instruction; its purpose is to show which aspects of the chapter the student has mastered and where remedial work is necessary.”22 Hence, a formative test is part of the instructional process. When incorporated into classroom practice, it provides the information needed to adjust teaching and learning while they are happening. In this sense, a formative test informs both teachers and students about student understanding at a point when timely adjustments can be made. These adjustments help to ensure that students achieve targeted, standards-based learning goals within a set time frame.23

2) Summative Test
A summative test is usually administered at the end of a course. Rebecca M. Valette states, “the summative test, on the other hand, is usually gives at the end of a marking period and measures the ‘sum’ total of the material covered. On this type of a test, students are usually ranked and graded.” Moreover, a summative test is given periodically to determine, at a particular point in time, what students know and do not know.
Summative tests at the district/classroom level are an accountability measure generally used as part of the grading process. Arthur Hughes states that “the content of summative test should be based directly on a detailed course syllabus or on the books and other material used.”24 Finally, the writer concludes that a summative test is a test usually administered at the end of a course of study.

d. Proficiency Test
Rebecca M. Valette notes that “The proficiency test is also used to measure what students have learned, but the aim of the proficiency test is to determine whether this language ability corresponds to specific language requirements.”25 According to J.B. Heaton, “the proficiency test is concerned simply with measuring a student’s control of the language in the light of what he or she will be expected to do with it in the future performance of a particular task.”26 James Dean Brown also states that “A proficiency-test assess the general knowledge or skill commonly required or prerequisite to entry into or exemption from a group of similar institution.”27 Because such decisions affect entry into or exemption from a program, they should never be taken lightly; they must be based on the best obtainable proficiency test scores as well as other information about the student. As Arthur Hughes explains, “The content of proficiency test therefore, is not based on the content of objective of language courses that people taking the test may have followed. Rather, it based on a specification of what candidates may have to be able to do in language, in order to be considered proficient.”28

22 Rebecca M. Valette, Modern Language..., p. 6.
23 http://www.nmsa.org/Publications/WebExclusive/Assessment/tabid/1120/Default.aspx
24 Arthur Hughes, Testing for Language..., p. 11.
25 Rebecca M. Valette, Modern Language..., p. 6.
26 J.B. Heaton, Writing English..., p. 172.
27 James Dean Brown, Testing in Language..., p. 10.
28 Arthur Hughes, Testing for Language..., p. 9.

2. Way of Scoring
Based on the manner of scoring, test items are divided into two general types: objective and subjective tests. J.B. Heaton states that “Subjective and objective test are terms used to refer to the scoring of tests.”29

a. Objective Test
An objective test item is any test item that has only a single correct answer; in this kind of test, students must select one option from several alternatives. According to Valette, “An objective test item is any item for which there is a single predictable correct answer.”30 This item type is called objective because it can be scored objectively: equally competent scorers can score the items independently and obtain the same result. Whether an item is scored by one teacher or another, today or last week, it will yield the same score. The advantage of objective test items, then, is objective scoring that is quick, easy, and consistent. The objective test items commonly used in classroom testing are true-false, multiple-choice, matching, and short-answer items. “These test item include all of the selection-type items-multiple choice, true false, and matching.”31

29 J.B. Heaton, Writing English..., p. 25.
30 Rebecca M. Valette, Modern Language..., p. 6.
31 Norman E. Gronlund, Constructing Achievement Tests, New Jersey: Prentice-Hall, Inc., 1968, p. 25.

1) True-False
A true-false item is simply a declarative statement that students must judge as true or false. As J. Stanley Ahmann and Marvin D. Glock explain, a “true-false item is referred to alternative response item; the item asks the students to answer with the ‘true’ if it conforms to the truth or ‘false’ if it essentially incorrect.”32 Because each item offers only two alternatives, students have a 50 percent chance of guessing the right answer.
Example:
T  F  True-false items are classified as supply-type items.

2) Multiple-Choice Item
The multiple-choice item consists of a stem, which presents a problem situation, and several alternatives, which provide possible solutions to the problem. The stem may be a question or an incomplete statement. The alternatives include the correct answer and several plausible wrong answers, called distracters, whose function is to distract students who are uncertain of the answer. “A multiple-choice item consists of one or more introductory sentences followed by a list of two or more suggested responses from which the examinee chooses one as the correct answer.”33
Example:
In objective testing, the term objective refers to the method of …
a. identifying the learning outcomes
b. selecting the test content
c. presenting the problem
d. scoring the answers

3) Matching
The matching item consists of two parallel columns, with each word, number, or symbol in one column to be matched to a word, sentence, or phrase in the other column. This type of item is widely used in situations where relationships among more or less similar ideas, facts, and principles are to be examined or judged. In this type, students indicate the relationship between a set of premises and a set of responses.
Example:
1. The …. drives a car          a. doctor
2. The …. checks the patient    b. driver
This kind of test is an effective way to measure students’ recognition of the relationships between words, definitions, events, dates, categories, examples, and so on.

b. Subjective Test Item
A subjective test is one whose scoring requires judgment and evaluation by the scorer. Valette states that “Subjective item is one that does not have a single right answer.”34 This means that scoring may be inconsistent, and the answer takes the form of a composition or statement in which students can express their ideas or arguments in their own words. “Subjective tests, like translation and essay, have the advantage of measuring language skill naturally, almost the way English used in a real life.”35 The subjective tests commonly used in the classroom are completion, short-answer, and essay items.

32 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth..., p. 17.
33 Anthony J. Nitko, Educational Tests..., p. 190.
34 Rebecca M. Valette, Modern Language..., p. 10.
35 Harold S. Madsen, Techniques in Testing, Oxford: Oxford University Press, 1983, p. 8.

1) Completion
A completion item is a written statement that requires the examinee to supply the correct word or short phrase in response to an incomplete sentence, a question, or a word association. Completion tests can be used effectively to measure the recall of terms, dates, and names.36 Completion items and short-answer items are both supply-type test items, but in the short-answer type the blank is nearly always at the end, whereas in the completion type the blank may occur anywhere in the statement.37

2) Short-Answer Item
The short-answer item consists of a question that can be answered with a word or short phrase.38 A student provides a short response to a direct question or direction. Teachers generally prefer the short-answer question, probably because they see some advantages in it: it is relatively easy to construct, it gives the teacher some opportunity to see how well students can express their thoughts, and it is easier to score than the essay question.39 However, it is difficult to phrase a short-answer question so that only one answer is correct, and this type of question is most useful for testing knowledge of facts and quite specific information.

3) Essay Test
The most notable characteristic of the essay test is the freedom of response it provides. The student is asked a question that requires him to produce his own answer. He is relatively free to decide how to approach the problem, what factual information to use, how to organize his reply, and what degree of emphasis to give each aspect of the answer. Thus, the essay question places a premium on the ability to produce, integrate, and express ideas. As Norman E. Gronlund states, “Essay tests are inefficient for measuring knowledge outcomes . . . but they provide a freedom of response which is needed for measuring certain complex outcomes . . . . These include the ability to create . . . . to organize . . . . to integrate . . . . to express . . . and similar behaviors that call for the production and synthesis of ideas.”40

Finally, from the explanation above of objective and subjective tests, particularly the essay test, the writer concludes that objective test items should be used to measure most knowledge outcomes, to take advantage of their more extensive sampling and greater reliability. For measuring complex learning outcomes such as the ability to create, organize, and evaluate ideas, however, the teacher would use essay questions despite their limitations. Of the types of test items above, this study is concerned only with the multiple-choice items in the English summative test for the second-year students of MTs. Darul Ma’arif, administered at the end of the second semester of the 2008/2009 academic year.

36 Wilmar Tinambunan, Evaluation of Student..., p. 61.
37 Victor H. Noll, Introduction to Educational..., p. 140.
38 Victor H. Noll, Introduction to Educational..., p. 138.
39 Victor H. Noll, Introduction to Educational..., p. 138.
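Since the analysis concerns multiple-choice items, a short sketch may show why such items count as objective. The Python code below uses hypothetical class and function names. It represents an item as a stem, lettered alternatives, and a single key, so every other alternative is a distracter. Scoring is a plain comparison with the key, so any scorer, or the same scorer on another day, obtains the same total. The first item reuses the multiple-choice example above; the second is invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MultipleChoiceItem:
    stem: str
    options: Dict[str, str]  # letter -> alternative (the key plus distracters)
    key: str                 # letter of the single correct answer

    def distracters(self) -> List[str]:
        """Letters of the plausible wrong answers."""
        return [letter for letter in self.options if letter != self.key]


def score(items: List[MultipleChoiceItem], answers: List[str]) -> int:
    """Count answers that match the key; zip stops at the shorter list,
    so items left without an answer earn no credit."""
    return sum(1 for item, answer in zip(items, answers) if answer == item.key)


test_items = [
    MultipleChoiceItem(
        stem="In objective testing, the term objective refers to the method of ...",
        options={"a": "identifying the learning outcomes",
                 "b": "selecting the test content",
                 "c": "presenting the problem",
                 "d": "scoring the answers"},
        key="d",
    ),
    MultipleChoiceItem(  # hypothetical item
        stem="She ... to school every day.",
        options={"a": "go", "b": "goes", "c": "going", "d": "gone"},
        key="b",
    ),
]

print(score(test_items, ["d", "b"]))  # 2
print(score(test_items, ["d", "a"]))  # 1
print(test_items[0].distracters())    # ['a', 'b', 'c']
```

Because the result depends only on the answers and the key, scoring the same answer sheet twice always gives the same total. This is the consistency described above for objective items.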

D. Criteria of a Good Test

Many considerations enter into judging whether a test can be called a good test, because a good test provides the information needed for a sound evaluation of how well students have achieved the instructional objectives. The writer considers them under three main headings: validity, reliability, and practicality. Validity refers to the extent to which a test measures what we actually wish to measure. According to H. Douglas Brown, “Validity is the degree to which the test actually measures what is intended to measure…..Reliability is consistent and dependable…….And practically is means of financial limitations, time constraints, ease of administration, and scoring and interpretation.”41

1. Validity
The single most important characteristic of a good test is its ability to help the teacher make correct decisions about what it is intended to measure. This characteristic is called validity. “Validity is concerned with whether the information being gathered is relevant to the decision that needs to be made.”42 A test is valid if it appropriately measures what it is supposed to measure. According to Heaton, “The validity of a test is the extent to which it measures what is to measure and nothing else.”43 Finocchiaro and Sako also state, “A test is valid when it measures effectively what it is intended to measure.”44 In the same sense, Wilmar Tinambunan states that “The validity of a test is the extent to which the test measures what is intended to measure.”45 Norman E. Gronlund likewise states that “test scores are valid to the extent to which they serve the use for which they are intended.”46 J. Stanley Ahmann and Marvin D. Glock point out that “In educational measurement, validity is often defined as the degree to which a measuring actually serves the purposes for which it is intended.”47 Based on these definitions, the writer concludes that knowing a test’s validity is important for judging whether the test is of good quality for measuring a person’s abilities.

40 Norman E. Gronlund, Constructing Achievement..., p. 65.
41 H. Douglas Brown, Teaching by Principles: An Interactive Approach to Language Pedagogy, 2nd edition, San Francisco: Longman, pp. 386-387.
42 Peter W. Airasian, Classroom Assessment..., p. 16.
43 J.B. Heaton, Writing English..., p. 159.
44 Mary Finocchiaro and Sydney Sako, Foreign Language Testing: A Practical Approach, New York: Regents Publishing Company, 1983, p. 24.
45 Wilmar Tinambunan, Evaluation of Student..., p. 11.
46 Norman E. Gronlund, Constructing Achievement..., p. 105.
47 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth..., p. 285.

Since validity is one of the most important characteristics of test scores, a test constructor should know the various aspects of validity and the procedures by which they are determined. “The two most important characteristics of test scores are validity and reliability…Anyone working with tests-whether constructing them or using published tests-should understand the meaning of these concepts…and should know the various procedures by which they are determined.”48 According to Heaton, the validity of a test can be seen from the aspects described below.

a. Face Validity
A test has face validity if it has a good “face”, that is, if it looks right. According to Heaton, “if a test items looks right to other testers, teachers, moderators, and testers, it can be described as having at least face validity.”49 Mary Finocchiaro and Sydney Sako define it as “A judgment about a test based on the way the test looks to educators, students, and the general public. The test should not only ‘be right’ it also ‘look right’.”50

b. Content Validity
A test has content validity if it contains material that the students have been taught.
To fulfill this, the teacher also should refer to the instructional objectives of the teaching learning process. Finocchiario and Sako state; “Content validity is assured by checking all items in the test to make certain that they correspond to the instructional objective of the course“. 51 Still in the same sense, Victor H. Noll explaines “when a teacher gives a test which deals with the 48 Norman E. Gronlund, Constructing Achievement..., p. 105 49 J.B Heaton, Writing English..., p. 159 50 Marry Finochiario and Sydney Sako, Foreign Language..., p. 28 51 Marry Finnochiaro and Sydney Sako, Foreign Language..., p. 25 20 material and with the objectives of instruction in particular class, his test is said to have curricular content validity”. 52 c. Construct Validity A test is said to have a construct validity if it can demonstrates that it measures just the ability, which it is supposed to measure .according to Heaton; “if a test has construct validity, it is capable of measuring certain specific characteristics in accordance with a theory of language behavior and learning”. 53 d. Empirical Validity A fourth type of validity is usually referred to as statistical or empirical validity. This validity is obtained as a result of comparing the result of the test with the result of some criterion measure. 54
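Empirical validity is usually reported as a correlation between the test scores and the scores on the criterion measure. The thesis does not name a formula or a criterion, so the sketch below assumes the Pearson product-moment correlation as the validity coefficient; the function name `pearson_r` and the score lists are hypothetical.

```python
import math


def pearson_r(test_scores, criterion_scores):
    """Pearson correlation between test scores and criterion scores.

    Used here as an assumed empirical validity coefficient: the closer the
    value is to 1, the more the test ranks students as the criterion does.
    """
    n = len(test_scores)
    if n != len(criterion_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    # Cross-products and squared deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(test_scores, criterion_scores))
    sxx = sum((x - mean_x) ** 2 for x in test_scores)
    syy = sum((y - mean_y) ** 2 for y in criterion_scores)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: test scores and an external criterion (e.g. report grades).
test = [62, 75, 80, 55, 90]
criterion = [65, 70, 85, 60, 88]
print(round(pearson_r(test, criterion), 2))
```

The choice of criterion matters more than the arithmetic: the coefficient only says how closely the test agrees with whatever measure is chosen as the criterion.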

2. Reliability

The second criterion of a good test is reliability. Reliability has to do with the accuracy and precision of a measurement procedure; indices of reliability indicate the extent to which a particular measurement is consistent and reproducible. 55 A test should be reliable as a measuring instrument. According to Finocchiaro and Sako, “the reliability or stability of a language test is concerned with the degree to which it can be trusted to produce the same result upon repeated administration to the same individual, or to give consistent information about the value of a learning variable being measured”. 56 J. Stanley Ahmann and Marvin D. Glock state that “Reliability means consistency of results. This is equivalent to saying that a highly reliable instrument can be used repeatedly in an unchanging situation and produce constant or near constant results.” 57 Based on these statements, a test is reliable if it consistently yields the same, or nearly the same, ranks over repeated administrations.

52 Victor H. Noll, Introduction to Educational..., p. 79
53 J.B. Heaton, Writing English..., p. 161
54 J.B. Heaton, Writing English..., p. 161
55 Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation in Psychology and Education, London: John Wiley and Sons, Inc., 1961, p. 127
56 Mary Finocchiaro and Sydney Sako, Foreign Language..., p. 28
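Because the conclusion above defines reliability as yielding the same, or nearly the same, ranks over repeated administrations, one way to quantify it (an assumption here; the thesis gives no formula) is the Spearman rank-order correlation between the same students’ scores on two sittings. The sketch gives tied scores their average rank; the names `average_ranks` and `spearman_rho` and the score lists are hypothetical.

```python
import math


def average_ranks(scores):
    """Rank scores from lowest (rank 1) to highest; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(first, second):
    """Spearman correlation: the Pearson correlation of the two rank lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    rx, ry = average_ranks(first), average_ranks(second)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("undefined when every student has the same score")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of the same five students on two administrations.
first_sitting = [60, 72, 85, 50, 90]
second_sitting = [58, 75, 82, 55, 93]
print(spearman_rho(first_sitting, second_sitting))  # prints 1.0 (same rank order)
```

Raw scores differ between the two sittings, yet the coefficient is 1.0 because every student keeps the same position, which is exactly the kind of consistency the definition describes.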

3. Practicality

Practicality concerns the factors that determine whether a test can actually be used. As Thorndike and Hagen put it, “Practicality is concerned with a wide range of factors of economy, convenience, and interpretability that determine whether a test is practical for widespread use”. 58 A test may be a highly reliable and valid instrument and still be beyond our means or facilities, so the teacher or whoever constructs the test should keep a number of very practical considerations in mind. The main factors of practicality are economy, scorability, and administrability; as Finocchiaro and Sako state, “the criteria for practicality normally will be based upon such factors as economy, scorability, and administrability”. 59 Harrison states that “tests should be as economical as possible in time (preparation, sitting, and marking) and in cost (materials and hidden costs of time spent)”. 60

In short, the criteria of a good test are validity, reliability, and practicality. Besides these three criteria, however, the quality of a test as a whole is also determined by the quality of each item that makes up the test. If each item is of good quality, the scores obtained from the test will be stronger and more accurate. The quality of each item can be examined individually through item analysis. According to Robert Lado, “item analysis is the study of validity, reliability, and difficulty of test items taken individually as if they were separate tests”. 61 Through this analysis, the evaluator can find out which items are good enough to be used again in the future.

57 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil..., p. 311
58 Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation..., p. 127
59 Mary Finocchiaro, Foreign Language Testing..., p. 30
60 Andrew Harrison, A Language Testing..., p. 13

E. Item Analysis

After a test has been administered and scored, it is usually desirable to evaluate the effectiveness of its items. This is done by studying the students’ responses to each item; when formalized, the procedure is called item analysis. Anthony J. Nitko states, “item analysis refers to the process of collecting, summarizing, and using information about pupils’ responses to items”. 62 Harold S. Madsen explains: “The selection of appropriate language items is not enough by itself to ensure a good test. Each question needs to function properly; otherwise, it can weaken the exam. Fortunately, there are some rather simple statistical ways of checking individual items. This procedure is called ‘item analysis’.” 63 Item analysis is also a systematic procedure that provides information about the quality of each test item with respect to the following points:

1. the difficulty of the item;
2. the discriminating power of the item;
3. the effectiveness of each alternative, or distracter.

Thus, item analysis can tell the evaluator or test constructor whether an item was too easy or too hard, how well it discriminated between high and low scorers on the test, and whether all of the alternatives functioned as intended. According to Suharsimi Arikunto, item analysis aims, among other things, “to help us identify poor items, to obtain information that will be used to improve the items for further use, and to obtain a quick overview of the state of the test we have constructed”. 64 Item analysis data also help detect specific technical flaws and thus provide information for improving test items; as J. Stanley Ahmann and Marvin D. Glock state, “item analysis is re-examining each test to discover its strength and flaws”. 65 Item analysis has several benefits. First, it provides useful information for class discussion of the test. Second, it provides data for helping students improve their learning. Third, it provides insights and skills that lead to the preparation of better tests on future occasions. 66 Finally, the writer concludes that item analysis is essential for obtaining information about the quality of each test item, that is, whether it is a good item or a poor one.

61 Robert Lado, Language..., p. 342
62 Anthony J. Nitko, Educational Test..., p. 342
63 Harold S. Madsen, Techniques in..., p. 180
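The three points above all start from the same tabulation: rank the answer sheets by total score, take an upper and a lower group, and count for each item how many students in each group answered correctly and which option each student chose. The sketch below assumes answer sheets stored as strings of option letters and an upper/lower fraction of 27%, a common convention that the thesis does not state; `split_upper_lower` and `analyse_item` are hypothetical names.

```python
from collections import Counter


def split_upper_lower(sheets, key, fraction=0.27):
    """Rank answer sheets by total score and return (upper, lower) groups.

    fraction=0.27 is an assumed convention; the groups never overlap.
    """
    if len(sheets) < 2:
        raise ValueError("need at least two answer sheets")

    def total(sheet):
        # One point for every answer that matches the key.
        return sum(answer == correct for answer, correct in zip(sheet, key))

    ranked = sorted(sheets, key=total, reverse=True)
    size = max(1, min(round(len(ranked) * fraction), len(ranked) // 2))
    return ranked[:size], ranked[-size:]


def analyse_item(sheets, key, item, fraction=0.27):
    """Return CU, CL, the group size n, and the options chosen in each group."""
    upper, lower = split_upper_lower(sheets, key, fraction)
    return {
        "CU": sum(sheet[item] == key[item] for sheet in upper),
        "CL": sum(sheet[item] == key[item] for sheet in lower),
        "n": len(upper),
        # A common rule of thumb: a distracter nobody chooses, or one chosen
        # more often by the upper group than the lower, may need revision.
        "upper_choices": Counter(sheet[item] for sheet in upper),
        "lower_choices": Counter(sheet[item] for sheet in lower),
    }


# Hypothetical two-item test taken by four students, split into halves.
key = "AB"
sheets = ["AB", "AB", "AC", "CD"]
report = analyse_item(sheets, key, item=1, fraction=0.5)
print(report["CU"], report["CL"], report["n"])  # 2 0 2
```

Comparing CU with CL shows how well the item separates high from low scorers, and the two option tallies show whether each distracter attracted any students.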

1. Difficulty Level of The Item

The difficulty level of an item is the proportion of pupils who answer that item correctly. “The item difficulty is the fraction of the persons taking an item who answer it correctly”. 67 Heaton states that “The index of difficulty (or facility value) of an item simply shows how easy or difficult the particular item proved in the test. The index of difficulty (FV) is generally expressed as the fraction (or percentage) of the students who answered the item correctly”. 68 A good test item should have a moderate degree of difficulty: it should be neither too easy nor too difficult, because a test that is too easy or too difficult yields a score distribution that makes it hard to identify reliable differences in achievement between the pupils who have done well and those who have done poorly. Suharsimi Arikunto says, “A good item is one that is neither too easy nor too difficult. An item that is too easy does not stimulate students to increase their effort to solve it. An item that is too difficult will make students give up and lose the spirit to try again, because it is beyond their reach”. 69 By analyzing the students’ responses to the items, the difficulty level of each item can be determined, and this information helps the teacher identify which concepts need to be re-taught. In addition, by analyzing the facility value, the teacher will know whether an item is easy, moderate, or difficult. M. Chobib Thoha states, “A good item is one whose level of difficulty is known to be neither too difficult nor too easy, because the level of difficulty is correlated with discriminating power. When an item has maximal difficulty, its discriminating power will be low; likewise, an item that is too easy will also have no discriminating power”. 70

To measure the difficulty level of each item, the writer uses Heaton’s formula: 71

$$FV = \frac{CU + CL}{2n}$$

Explanation:
FV : facility value (index of difficulty) of the item
CU : number of students in the upper group who answer the item correctly
CL : number of students in the lower group who answer the item correctly
2n : total number of students in the upper and lower groups

After calculating the difficulty level of each item, the writer calculates the difficulty index of all items with this formula:

$$P = \frac{\sum B}{N}$$

P : difficulty level of all items
B : difficulty level of each item
∑ : sum over all items
N : total number of test items

To determine the category of difficulty of each item and of all items, the writer uses the classification in Suharsimi Arikunto’s book: 72

Difficult : 0.00 – 0.30
Moderate : 0.31 – 0.70
Easy : 0.71 – 1.00

The facility value shows how easy or difficult the test items were for that particular group. It is therefore influenced by the students’ competence, and the result will differ if the test is given to another group of learners.

64 Suharsimi Arikunto, Dasar-dasar Evaluasi Pendidikan, Jakarta: Bina Aksara, 1987, p. 205
65 J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth..., p. 184
66 Norman E. Gronlund, Constructing Achievement..., pp. 85-86
67 Anthony J. Nitko, Educational Test..., p. 288
68 J.B. Heaton, Writing English..., p. 178
69 Suharsimi Arikunto, Dasar-dasar..., p. 207
70 M. Chobib Thoha, Teknik Evaluasi Pendidikan, Jakarta: PT. Raja Grafindo Persada, 2003, p. 145
71 J.B. Heaton, Writing English..., p. 178
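The two formulas and Arikunto’s bands can be checked with a short calculation. The sketch below is a minimal illustration, assuming upper and lower groups of equal size n; the function names and the counts for the three items are hypothetical, and reading the bands as “at most 0.30” and “at most 0.70” (so a value such as 0.305 counts as moderate) is an assumption about the gaps between them.

```python
def facility_value(correct_upper, correct_lower, n):
    """Heaton's FV = (CU + CL) / 2n, where n is the size of each group."""
    if n <= 0:
        raise ValueError("group size n must be positive")
    if not (0 <= correct_upper <= n and 0 <= correct_lower <= n):
        raise ValueError("CU and CL must lie between 0 and n")
    return (correct_upper + correct_lower) / (2 * n)


def overall_difficulty(item_fvs):
    """P = sum(B) / N: the mean facility value over all N items."""
    if not item_fvs:
        raise ValueError("need at least one item")
    return sum(item_fvs) / len(item_fvs)


def classify(fv):
    """Arikunto's bands: difficult up to 0.30, moderate up to 0.70, else easy."""
    if fv <= 0.30:
        return "Difficult"
    if fv <= 0.70:
        return "Moderate"
    return "Easy"


# Hypothetical (CU, CL) counts for three items, with n = 10 in each group.
counts = [(8, 4), (10, 9), (3, 1)]
fvs = [facility_value(cu, cl, 10) for cu, cl in counts]
print([classify(fv) for fv in fvs])  # ['Moderate', 'Easy', 'Difficult']
p = overall_difficulty(fvs)
print(round(p, 2), classify(p))  # 0.58 Moderate
```

Here the first item gives FV = (8 + 4) / 20 = 0.60, the second (10 + 9) / 20 = 0.95, and the third (3 + 1) / 20 = 0.20, so P = 1.75 / 3 ≈ 0.58, which falls in the moderate band.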

F. The Importance of Item Analysis

Item analysis is very important for teachers: it helps them prepare better test items and supports the teaching-learning process. “Item analysis is an important and necessary step in the preparation of good multiple-choice tests”. 73 “For teacher-made tests, the following are among the important uses of item analysis: determining whether an item functions as teacher intends, feedback to the teacher about pupil difficulties, areas for curriculum improvement, revising the item and improving item writing skills”. 74

1. Determining whether an item functions as the teacher intends.
An item functions properly if it distinguishes the students who have mastered the learning objectives from those who have not. To do so, the item needs an appropriate level of difficulty, adequate discriminating power, and effective distracters; item analysis is therefore necessary.

2. Feedback to students about their performance and a basis for class discussion.
Once the students’ responses to each item are known, their performance can be evaluated, their errors can be corrected, and the items that most of them found difficult can be discussed in class.

3. Feedback to the teacher about pupils’ difficulties.
The results of item analysis help teachers identify the major types of difficulty pupils have in learning, so they know which material needs to be reviewed in the next lessons.

4. Areas for curriculum improvement.
Item analysis reveals which items students find difficult and which errors occur often. Such items may point to material that is not suitable for the school program, in which case the curriculum may need to be revised.

72 Suharsimi Arikunto, Dasar-dasar..., p. 210
73 John W. Oller, Language Tests at School: A Pragmatic Approach, London: Longman, 1979, p. 245
74 Anthony J. Nitko, Educational Test..., p. 284

CHAPTER III RESEARCH METHODOLOGY

1. The Objective of the Research

The research aims to find out the difficulty level of the items in the English summative test for the second year students of MTs. Darul Ma’arif Jakarta in the second term of the 2008/2009 academic year, using the calculation described in J.B. Heaton’s book Writing English Language Tests.

2. The Method of Study

The method used in this study is descriptive analysis with a quantitative approach: the test scores are analyzed with simple statistical tabulation to determine whether each test item is good or not.

3. The Time and Place

The research was conducted during the teaching practice period from March to June 2009 at MTs. Darul Ma’arif, located at Jl. Rs. Fatmawati No. 45, Cipete, South Jakarta.

4. The Respondents

The writer took the results of the English summative test for the second grade at MTs. Darul Ma’arif Cipete, South Jakarta, which consists of 50 multiple-choice items. The respondents of this research are the 36 second year students of MTs. Darul Ma’arif Jakarta.

5. The Instrument of the Research

The research instrument is the English summative test paper for the second year students of MTs. Darul Ma’arif Jakarta.

CHAPTER IV RESEARCH FINDINGS
