

CHAPTER 2 REVIEW OF RELATED LITERATURE

This chapter reviews the theoretical writings and research related to the subject of the study. The review helps the researcher answer the two research questions. The chapter consists of two major parts, namely the theoretical description and the theoretical framework.

A. Theoretical Description

1. Language Testing

Testing refers to the activity of examining individuals or things in order to reveal certain information. Besides revealing information, testing is conducted to measure one’s capability, knowledge, or performance in a certain domain. It plays an essential role in social life, and its function is to measure a person’s ability, either in general or against specific objectives (McNamara, 2000). Furthermore, a test serves as a gateway at important transitional moments in education; for instance, students who want to continue their study at a university have to pass certain tests, such as an achievement test and a university selection test. A language test, in particular, concerns one’s skills in using or performing language. Brown (2004) explains that language performance can be elicited in the form of speaking, writing, reading, or listening (p.3). In its development, language testing has come to simulate real-life situations.

2. Language Test

Tests refer to examinations of one’s knowledge or ability, and they commonly consist of questions to be answered or activities to be performed. In other words, Brown (2004) defines tests as instruments for measuring ability and knowledge (p.3). In practice, language tests are differentiated according to the way they are composed or designed (method) and the purpose for which they are designed. In terms of method, language tests are differentiated into two types, namely traditional paper-and-pencil language tests and performance tests (McNamara, 2000). For a language test to be effective, its items should meet the principles of language assessment. Brown (2004) explains that there are five criteria for testing a test, namely practicality, reliability, validity, authenticity, and washback (p.19). Two of the principles are content validity and authenticity.

a. Paper-and-pencil language tests

Paper-and-pencil language tests are well known as written tests. According to McNamara (2000), paper-and-pencil language tests are employed to assess separate components of language knowledge, such as grammar and vocabulary, and the receptive skills (listening and reading comprehension). The test items are commonly in a fixed response format since such items are easy to administer and score. McNamara (2000) explains that in fixed response formats, such as multiple choice, students do not construct the answer; instead, they choose one of the given options. These tests are considered ineffective for assessing the productive skills of speaking and writing. An example of a paper-and-pencil language test is the National Examination, since its test items are in multiple-choice format. Besides, the aim of administering the National Examination is to assess the students’ receptive skills. Receptive skills refer to the ability to receive information and understand it by listening or reading.

b. Performance tests

Nowadays, performance tests are well known as oral tests. Performance tests differ from paper-and-pencil language tests since they assess the act of communication (McNamara, 2000). Therefore, they are specifically used to assess speaking and writing skills.

3. Test Purposes

In terms of test purposes, language tests are differentiated into five types, namely language aptitude tests, proficiency tests, placement tests, diagnostic tests, and achievement tests. Brown (2004) explains that by defining the purpose of testing, test designers can focus more on the specific objectives of the test (p.42). Besides, it helps the test designers compose the items.

a. Language Aptitude Tests

According to Brown (2004), a language aptitude test is “a test which is designed to measure capacity or general ability to learn a foreign language and ultimate success in that undertaking” (p.43). Language aptitude tests are not commonly employed, since they are used to predict one’s achievement in learning a language. Achievement on these tests is assessed through processes such as mimicry, memorization, and puzzle-solving (Brown, 2004). This basis for prediction is doubtful, since factors such as appropriate self-knowledge, active strategic involvement in learning, and strategies-based instruction also influence a learner’s success. Examples of language aptitude tests are the Modern Language Aptitude Test (MLAT) and the Pimsleur Language Aptitude Battery (PLAB).

b. Proficiency Tests

Proficiency tests are not limited to one course, curriculum, or language skill. Instead, they test all skills. Brown (2004) explains that proficiency tests consist of standardized multiple-choice items on grammar, vocabulary, reading comprehension, listening comprehension, writing skill, and sometimes oral production performance (p.44). In addition, McNamara (2000) states that proficiency tests are not tied to the process of teaching, since they take future ‘real-life’ language use as the criterion. Therefore, scores on this kind of test have a gate-keeping function, especially in education and employment. An example of a standardized proficiency test is the Test of English as a Foreign Language, well known as TOEFL.

c. Placement Tests

Brown (2004) states that placement tests are employed to place a student into the appropriate level or course. For this reason, the tests commonly comprise a sample of the material covered in the various levels of a certain curriculum (p.45). The tests vary; they can take the form of a written test or an oral test, depending on the nature of the program. An example of a standardized placement test is the English as a Second Language Placement Test (ESLPT) at San Francisco State University.

d. Diagnostic Tests

Brown (2004) explains that diagnostic tests are used to diagnose specific aspects of a language (p.46). In practice, test administrators use a checklist of features in order to pinpoint difficulties. The diagnostic test results help teachers decide which aspects they have to focus on. Besides, the results make students aware of their errors.

e. Achievement Tests

According to Brown (2004), an achievement test is limited to particular materials related to a curriculum within a particular time frame. It is used to determine whether the objectives of the course have been met by the end of a period of instruction (p.47). Therefore, an achievement test contributes to the teaching-learning process since it is related to classroom lessons, units, or curriculum. An example of an achievement test is the National Examination.

4. Validity

Bachman (1990) explains that for a test score to be a meaningful indicator of an individual’s ability, the test should measure only the ability that it is intended to test (p.238). Bachman (1990) adds that the validity of a test shows the quality of the test itself. When a test is valid, the test score effectively reflects the true condition of students’ competence (p.236) and can serve as a meaningful indicator. To be valid, however, the test should reflect the skills or behavior to be assessed. There are five types of validity evidence for determining whether or not a test is valid, namely face validity, content-related evidence (content validity), criterion-related evidence, consequential validity, and construct validity.

a. Face Validity

Gronlund (1998) states that a test is considered to have face validity if the students view the test as fair, relevant, and useful for improving learning (as cited in Brown, 2004, p.26). Face validity refers to the extent to which the test looks right and appears to measure the skills it is supposed to measure. Furthermore, according to Brown (2004), a test has face validity when it is well constructed, can be completed within the time allotment, contains clear and uncomplicated items, gives clear directions, includes tasks that meet content validity, and has a difficulty level that presents a reasonable challenge (p.27).

b. Content Validity

According to APA (1954), content validity refers to the degree to which the content of assessment items reflects the content domain of interest (as cited in Miller, 2003, p.2). Shepard (1993) adds that content validity is an indicator for drawing inferences from the results; it is “evidence-in waiting” (as cited in Miller, 2003, p.5). It means that whenever a test is valid in its content, the items of the test represent the skills or behavior to be measured, which is what the evaluation of achievement tests requires. Therefore, the scores of the test can be used effectively as meaningful indicators of students’ competence. For instance, a reading test is considered valid if it measures reading skill and nothing else. Such a test is not a valid test of speaking or vocabulary because it does not test speaking or vocabulary.

However, Seif (2004) claims that this does not mean that all educational objectives of a particular course are included in the test. For the sake of test practicality, the test designers should compose a number of questions that are representative of the set educational goals. Seif (2004) claims that content validity is one of the essential considerations in composing a test (as cited in Jandhagi and Shateria, 2008, p.2). If a test does not meet validity in its content, there are two possible outcomes. First, students are not able to demonstrate needed skills because those skills are not included in the test. Second, there may be some inappropriate questions which students are not able to answer. Therefore, the test tasks should match the test specifications in the blueprint. In line with this, Seif (2004) says that the content validity of a test can be evaluated by matching a sample of the test questions to the test instructions (as cited in Jandhagi and Shateria, 2008, p.2). Crocker and Algina (1986) add that this ‘matching method’ effectively ensures validity (as cited in Miller, 2003, p.12).

According to Bachman and Palmer (1996), a blueprint is a complete plan that provides the characteristics needed to develop the entire test (p.90). It contains task specifications for all types of tasks that are to be included in a particular test. The blueprint is an evaluation tool for checking whether or not the test items match the test specifications stated in it. Brown (2004) states that test specifications include the general outline of the test and the test tasks (p.50). The test specifications refer to a certain curriculum and consist of only the general outline of the materials and skills to be tested, since the test designers should consider test practicality.
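
The matching of test items against the blueprint specifications described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the specification names, the item labels, and the function match_items are hypothetical and are not taken from the National Examination blueprint or from any of the cited sources. The sketch assumes that a rater has already labelled each item with the specification it matches, if any.

```python
from collections import Counter

# Hypothetical blueprint: each specification mapped to the number of
# items the test designers planned for it (illustrative values only).
blueprint = {
    "identify main idea": 2,
    "find specific information": 2,
    "infer word meaning": 1,
}

# Hypothetical rater judgments: the specification each item matches,
# or None when the rater finds no matching specification.
items = [
    {"number": 1, "spec": "identify main idea"},
    {"number": 2, "spec": "identify main idea"},
    {"number": 3, "spec": "find specific information"},
    {"number": 4, "spec": None},
    {"number": 5, "spec": "choose correct tense"},  # not in the blueprint
]


def match_items(blueprint, items):
    """Summarize how well the items match the blueprint specifications."""
    # Count items per specification, ignoring labels outside the blueprint.
    counts = Counter(i["spec"] for i in items if i["spec"] in blueprint)
    # Items that match no specification in the blueprint.
    unmatched = [i["number"] for i in items if i["spec"] not in blueprint]
    # Specifications that no item represents.
    uncovered = [spec for spec in blueprint if counts[spec] == 0]
    ratio = sum(counts.values()) / len(items) if items else 0.0
    return {
        "matched_ratio": ratio,
        "unmatched_items": unmatched,
        "uncovered_specs": uncovered,
    }


report = match_items(blueprint, items)
print(report)
```

In this sketch, the unmatched items correspond to the inappropriate questions mentioned above, and the uncovered specifications correspond to needed skills that the test does not include. The proportion of matched items is only a rough summary and does not replace the rater’s judgment.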

c. Criterion-Related Evidence

Brown (2004) explains that criterion-related evidence refers to the extent to which the criterion the test is expected to meet has been achieved. Criterion-related evidence is commonly categorized into two types, namely concurrent validity and predictive validity (p.24).

d. Construct Validity

Brown (2004) states that construct validity plays a major role in test design. Furthermore, it is a main concern in validating large-scale standardized tests of aptitude (p.25). It means that in making a test or testing a person, the test designers or examiners should adhere to practical procedures and principles. For example, in determining the scoring criteria of a speaking test, the examiner should consider factors such as pronunciation, accuracy, vocabulary use, and sociolinguistic appropriateness.

e. Consequential Validity

Consequential validity refers to the consequences of a test. According to Brown (2004), a test has various consequences, including such considerations as its accuracy in measuring intended criteria, its impact on the preparation of test-takers, its effect on the learners, and the intended or unintended social consequences of a test’s interpretation and use (p.26). Besides, the effects of test preparation courses and manuals fall under consequential validity.

5. Authenticity

There are many different views of authenticity, since experts define it in various ways. Scarcella and Oxford (1992) define authenticity as referring to unedited and unabridged text (as cited in Day, 2003, p.4). Widdowson (1976), in contrast, emphasizes that authenticity is not a quality of the text itself; rather, it is achieved when the readers understand the writer’s intention (p.264). Williams (1984) explains that authentic texts are written to convey a message (as cited in Day, 2003, p.4). This means that the main purpose of authentic texts is communication.

According to Richards (2001), authentic materials are important in language teaching and learning. There are several benefits:
1. Since authentic materials are used for communication and exist in the real world, they are considered more interesting and motivating than created materials.
2. Since they are authentic, the materials are considered to contain a great deal of appropriate information about the target language.
3. Authentic materials are not composed to illustrate particular grammatical rules or discourse types; they resemble real language (pp. 252-253).

However, Richards (2001) adds that there are some criticisms of the use of authentic materials:
1. Authentic materials contain difficult language and vocabulary, which potentially distracts learners and teachers.
2. Using authentic materials burdens teachers, since they have to find suitable materials for teaching. The teachers cannot easily simplify the materials because the simplified versions would be considered inauthentic (p.253).

In judging whether test items are authentic, Bachman and Palmer (1996) claim that people cannot determine whether test items are authentic just by looking at them (pp.28-29). In the case of the National Examination test items, authenticity is examined in terms of two kinds of characteristics: task characteristics and text characteristics. Task characteristics represent the authenticity of the test instructions; therefore, the focus is on the test tasks (a set of test instructions and the provided options). In addition, text characteristics play an essential role in making test items authentic. They represent the appropriateness of the passages used as the test materials.

There are several indicators for measuring the authenticity of test items, since authenticity cannot be determined just by looking at them. According to Brown (2004), for a test to meet the criteria of authenticity, the test items should show five features: the test language is as natural as possible, the items are contextualized, the topics are relevant to the learners, there is some thematic organization to the items, and the tasks represent real-world tasks (p.28). These features are used to analyze the authenticity of the National Examination test items with regard to the test tasks.

Natural language use refers to “the language of ordinary speaking and writing” (Merriam-Webster Dictionaries, 2013), and it cannot be separated from linguistic facets such as typographical mistakes in reading materials, lexis, morphemes, word order and grammar (syntactic matters), and diction and meaning (semantic matters). The test tasks and test texts should resemble the way language is naturally used in reality (Brown, 2004, p.28).

The second indicator is the contextualization of the test items, which means that the test items are organized in an orderly way around the same topic, for example, in a story line.

The third indicator is the relevance of the test topics to the learners, which means that the materials should be appropriate to the learners’ ability. In some cases, authentic passages have a difficult level of language, which may burden learners with a lower level of proficiency. This point is emphasized by Brown (2004), who explains that one of the authenticity criteria is that the topics used in the test should be relevant to the learners (p.28), and supported by Nuttall (1996), who says that texts with a high level of language are not suitable for improving or developing reading skills (p.177).

To make test items more authentic, the test designers should also provide some thematic organization to the items. This fourth indicator is related to the second indicator, since both encourage the test designers to organize the test items in an orderly way along story lines.

The last indicator is that the tasks should represent real-world tasks. It means that authentic materials are selected from real-world sources (Brown, 2004, p.28). In addition, the test tasks should be designed to obtain certain information related to the text rather than to ask for grammatical forms or lexical items. Williams (1984) explains that authentic texts are written to convey a message (as cited in Day, 2003, p.4). This statement indicates that the purpose of authentic texts is mainly communication, not the teaching of grammatical forms. Brown (2004) adds that in listening items, authenticity calls for dialogues or monologues spoken by native speakers which represent conversations that happen in real life (p.28). It indicates that natural language use is important for achieving authenticity; in a listening test, for example, there should be hesitations, white noise, and interruptions.

B. Theoretical Framework

A language test is a systematic method of measuring one’s capability, knowledge, or performance in a certain domain in relation to language use. To be useful, a language test should meet the criteria of a good test, for instance reliability, validity, practicality, and authenticity (Brown, 2004). Therefore, the language test should be of high quality since it is a measurement of students’ capability. One type of language test is the English test of the National Examination. Following McNamara’s (2000) classification by method, the National Examination is a kind of paper-and-pencil language test (written test). Paper-and-pencil language tests belong to receptive tests because they test receptive skills such as listening and reading. In terms of test purposes, the National Examination is categorized as an achievement test (McNamara, 2000). As an achievement test, the National Examination corresponds to classroom lessons, units, or curriculum (Brown, 2004). The bases of composing National Examination are the Competence Standard, Basic