point and counterpoint

International English language testing: a critical response

Graham Hall

Uysal’s article provides a research agenda for IELTS and lists numerous issues concerning the test’s reliability and validity. She asks useful questions, but her analysis ignores the uncertainties inherent in all language test development and the wider social and political context of international high-stakes language testing. In this response, I suggest there is ample evidence that, in the normal course of its test development and review processes, IELTS is aware of and addressing problematic issues in its testing as they arise. However, I also argue that to address some of the issues arising from Uysal’s discussion, we need to take a broader perspective and examine the social, economic, and political dimensions of international high-stakes English language testing.

Introduction

Language testing is an uncertain and approximate business at the best of times, even if, to the outsider, this may be camouflaged by its impressive, even daunting, technical (and technological) trappings, not to mention the authority of the institutions whose goals tests serve. Every test is vulnerable to good questions. (McNamara 2000: 85–6)

Tests are inevitably political since what they do . . . is to sort and select to meet society’s needs. Testers cannot expect that their work will not have a political dimension. The proper reaction to such concern is surely to act with professional skill and rectitude within the contexts in which they work. (Davies 2003: 361)

Assessing L2 writing presents test writers and testing organizations such as Cambridge ESOL, who co-manage the IELTS test, with challenges to which there are no easy answers (Hamp-Lyons 1990). And yet, given the popularity and importance of IELTS in people’s lives, ‘it is becoming increasingly important for Cambridge ESOL to be able to provide evidence of quality control in the form of assessment reliability and validity to the outside world’ (Shaw 2007: 14).

Hacer Hande Uysal’s review of the IELTS writing test draws attention to a series of concerns and suggests that IELTS should undertake much more research in its efforts to improve the reliability and validity of the test. As the title of her article points out, what she presents is a ‘critical review’, ‘critical’ here seeming to mean ‘given to judging; given to adverse or unfavourable criticism and fault-finding’ (Oxford English Dictionary online) rather than the more balanced ‘involving or exercising careful judgement or observation’ (Oxford English Dictionary online), or indeed ‘critical’ as tied to ideas of societal (and educational) transformation and the illumination of power relations.

Thus, my reaction to Uysal’s article was mixed. As an English language teacher and as a test user of IELTS scores at a receiving institution to assist with admissions on to a British university programme, I am a stakeholder in the test. The concerns Uysal raises surrounding test validity and reliability clearly matter, and her article also clearly acknowledges potential difficulties for test takers who are often little heard.

However, the paper was also frustrating. Whilst it is appropriate to call for IELTS to be aware of and attend to key issues, the evidence strongly suggests that it is and does (indeed, Uysal drew largely on IELTS’ own research throughout her article). Furthermore, as Uysal points out, ‘there is no perfect test that is valid for all purposes and uses; test users also have the responsibility to make their own research efforts to make sure that the test is appropriate for their own institutional or contextual needs’. Thus, the concerns raised are those faced by all similar international language tests (for example TOEFL, Cambridge certificate exams, TOEIC). The overly critical tone of the paper, focusing only on IELTS rather than the wider context of language tests and testing, does Uysal’s argument a disservice. The value and interest of many of her points are diminished by a feeling that she is almost deliberately failing to recognize the difficulties inherent in test writing in our complex social and political world.

My contention, then, is that IELTS is something of an easy target, perhaps even a scapegoat, for individual and institutional difficulties within English-medium Higher Education (HE) and related ELT/EAP provision, whilst more significant and difficult issues concerning the social and political character of high-stakes, international language testing, and the nature of power relations in the contemporary, globalized academic world are left unexamined. This broader context needs to be reintroduced if the challenges facing all stakeholders in English-medium HE are to be fully understood, not only by English language test designers but also by students, English language teachers, test users (i.e. receiving institutions), and policy makers.

It is absolutely legitimate for Uysal to raise key questions in order that all stakeholders in IELTS think deeply about test design and what test scores actually mean; however, it is not my intention, nor is there enough space in this article, to deal individually with every issue she raises (although I will deal with several in the course of the discussion). Rather, I aim to highlight the way the case is presented, which seems unduly critical of IELTS whilst failing to examine the broader context of language testing and test development generally and power relations within internationalized English-medium HE.

Examining reliability: the example of IELTS marking processes

Uysal’s criticisms of the IELTS writing test can be broadly summarized as follows:

- reliability issues, for example marking processes and rater consistency
- validity issues, including the definition and understanding of Englishes in the world and contrastive rhetorical conventions.

Within her discussion of reliability, a typical point made by Uysal concerns the absence of multiple markers on the IELTS test, quoting Hamp-Lyons (1990: 79) in support of her case—‘all reputable writing assessment programmes use more than one reader to judge essays’. What Uysal omits from her argument, however, is Hamp-Lyons’ subsequent discussion.

Quoting Alderson, Hamp-Lyons notes that IELTS has only one reader due to the difficulty of finding qualified readers in British Council locations around the world and the demand for immediate reporting of results. However, more interestingly, she goes on to question the rationale generally given for multiple scoring—that ‘multiple judgements lead to a final score which is closer to a ‘true’ score than any single judgement’ (Hamp-Lyons 1990: 79). Hamp-Lyons suggests that when two readers reach very different judgements, the two scores are averaged, with the final score bearing little resemblance to the actual scores assigned. Or, where a third reader is brought in, one of the original readers’ scores will be discounted, which may seem unproblematic until one realizes that the discounted score comes from a trained reader whose scores for other scripts are being treated as valid. McNamara (2000) further complicates this picture by noting that ‘rating remains intractably subjective’ (p. 37) and observing that ‘a score is not a score is not a score’ (p. 55), i.e. even when two raters give the same score, this might not mean the same thing to each. This is not to suggest that multiple scoring is necessarily inadequate or that single scoring is unproblematic and should not be questioned. Rather, it is to point out that, when it comes to language testing, the issues are more complex than Uysal portrays in her argument.

Despite this complexity, and to avoid charges of complacency, it is worth examining how IELTS tries to ensure its single-rater system is as effective as possible. Blackhurst (2004: 18) summarizes the process as follows:

Reliability of rating is assured through the face-to-face training and certification of examiners, and all examiners must undergo a re-training and re-certification process every two years. Continuous monitoring of the reliability of IELTS Writing and Speaking assessment is achieved through a sample monitoring process. Selected centres worldwide are required to provide a representative sample of examiners’ marked tapes and scripts such that all examiners working at a centre over a given period are represented. Tapes and scripts are then second-marked by a team of IELTS Senior Examiners. Senior Examiners monitor for quality of both test conduct and rating, and feedback is returned to each centre. Analysis of the paired Senior Examiner–Examiner ratings from the sample monitoring data in 2003 produced an average correlation of .91 for the Writing module.

Similarly, Falvey and Shaw (2006) note that the issue of the training and standardizing of writing raters is vitally important and outline the training and retraining of all examiners. Again, this is not to suggest that Uysal is wrong in her concerns (one might, for example, quibble with Blackhurst’s use of the word ‘assured’ in the above quotation), but it does highlight that IELTS appears to be as rigorous as possible given the global context within which the test is designed, delivered, and marked. Further examples of Uysal’s overly critical approach to test design can be found elsewhere in her article. For example, she writes:

  Although there have been several publications evaluating [international high-stakes language] tests in general, these publications often do not offer detailed information about specifically the writing components of these tests. Scholars, on the other hand, acknowledge that writing is a very complex and difficult skill both to be learned and to be assessed, and it is central to academic success especially at university level. (My emphasis)

Narrowing this claim to the context of the IELTS writing test, which was the specific focus of Uysal’s article, two issues can be identified. Firstly, that there is little detailed information about the writing test. However, Falvey and Shaw (2006), Banerjee, Franceschina, and Smith (2007), and Shaw (2007), to list but a few, all deal explicitly with the written element of the IELTS test—how the test was developed, what language is expected for differing indicators, and how raters score tests. Secondly, Uysal seems to be suggesting that international language tests (including IELTS) do not recognize the complexity of writing (but scholars do) and is implicitly separating the development of the tests from scholars. However, surveying the academic literature about testing in general and IELTS in particular, it seems that the two are inseparable and that Uysal is creating a somewhat artificial division between IELTS and research in order to suggest that the IELTS test is somehow under-researched. The annual IELTS Research Reports and the online quarterly Research Notes produced by IELTS, authored by researchers into test writing who are also test designers, suggest this is not the case.

Throughout her article, then, Uysal usefully draws our attention to the problems and difficulties of test design. However, where I differ with her is the implicit suggestion that these are issues of particular relevance to IELTS rather than all testing bodies, and that IELTS fails to recognize these difficulties and act upon them. This seems overly critical and fails to recognize the complexities and uncertainties inherent in language test design.

Validity issues: broadening the debate

Uysal’s discussion of the validity of the IELTS writing test opens a door on the wider context of international high-stakes English language testing and test design and its place in globalized, market-driven HE. While it is possible to identify with several of her points, the analysis falls short of recognizing the social, political, and economic forces which underpin her argument.

The nature of language test design

Uysal questions several elements affecting the validity of the IELTS writing test. She suggests that the test’s task types do not match target domain tasks and notes that ‘coherence and cohesion’ have replaced ‘communicative quality’ within assessment criteria. These are very much the ‘good questions’ to which McNamara (2000) suggests all tests are vulnerable. However, Uysal’s account of these validity issues is problematic, in that she treats the IELTS test as a fixed entity, where difficulties and flaws are in-built and will remain ignored and unresolved by test designers. In fact, test design is a dynamic and ongoing process in which testing organizations, including IELTS, continually engage to revise and refine a test’s ‘fitness for purpose’. Indeed, in IELTS’ own research concerning test design and revision, Falvey and Shaw (2006: 8) note:

The project can never be described as a one-shot effort. The revision of a high-stakes examination should never be approached by means of a monolithic exercise without the opportunity to go back, to seek further insights and to be willing to adapt during the process by revising, amending or rejecting previous decisions in the light of further data analysis.

An international test of English . . . a test of international English?

A key element of Uysal’s discussion is the development of IELTS as an international English test. I strongly sympathize with her argument which acknowledges and values ‘English as an International language (EIL) varieties’ (Jenkins 2006) and identifies the difficulties speakers of such Englishes face in terms of language testing and prescriptive language standards. There is also value in the arguments surrounding ‘world rhetorics’. Overall, however, Uysal’s claims do not acknowledge the wider context which IELTS both operates within and contributes to. Thus, we need to ask how far the IELTS test is ‘an international test of English’, how far it is ‘a test of international English’, and the extent to which IELTS claims each role.

The first of these questions is uncontested. Taken in 120 countries and with approaching 500,000 test takers each year, IELTS provides a test with obvious international reach and influence. But does IELTS test international English, and how far does it claim to do so? Here, the picture is less clear. The current IELTS logo suggests the organization engages with ‘English for International Opportunity’, which is not quite the same thing as ‘international English’. Critical discourse analysts might also detect interesting ideological constructs behind the IELTS website bylines ‘The world speaks IELTS’ and ‘The test that sets the standard’.

Whilst Taylor (2002) recognizes the need for IELTS to account for language variation, in her discussion of ‘guiding principles’ to determine the content and linguistic features to be tested by IELTS, she proposes the notion of the ‘dominant host language’ and notes:

In the case of IELTS, which is an international test of English used to assess the level of language needed for study or training in English speaking environments . . . test material is written by a trained group of writers in the UK, Australia and New Zealand . . . [reflecting] the fact that different varieties of English are spoken in the contexts around the world in which IELTS candidates are likely to find themselves. (pp. 19–20)

Uysal’s claim that IELTS aims to assess international English (but does not do so) is therefore questionable, as IELTS has positioned itself slightly differently with regard to English language and English-medium education. IELTS does not hide its role as an English language gatekeeper ‘for people who intend to study or work where English is the language of communication’ (IELTS Homepage). The test functions, in part, to assist a range of stakeholders, such as prospective students, university lecturers, English-medium HE institutions, etc.

Furthermore, the British Council, a major stakeholder in the test as a co-manager of IELTS, additionally works ‘to strengthen the UK’s position within the international education community’ (British Council website 2009). The relevance of Davies’ perspective, noted at the start of this article, becomes increasingly clear—‘tests are inevitably political since what they do . . . is to sort and select to meet society’s needs’ (Davies 2003: 361).

The IELTS test is thus embedded within global relations in the contemporary academic world and, consequently, serves both to deliver and reinforce discourses which support native-speaker language norms. Uysal is therefore correct to suggest that IELTS might promote conformity and homogeneity in both linguistic forms and rhetorical conventions in academia, although we need to acknowledge that this is, in part, a consequence of its purpose set out by other institutions, academic discourses, and societal and political contexts. However, IELTS is not alone in acting as a gatekeeper for English language; most, if not all, international English language tests fulfil something of a similar role.

The IELTS context: a summary

The above discussions indicate that the issues IELTS faces and the contexts within which the test has been developed and administered are typical of those faced by all international tests and testing organizations. I have argued that, given its social context and stated purpose, the test has been and is being developed with due skill and rectitude. Chalhoub-Deville and Turner (2000: 537) note that:

Developers of large-scale tests such as those reviewed in the present article have the responsibility to: construct instruments that meet professional standards; continue to investigate the properties of their instruments and the ensuing scores; and make test manuals, user guides and research documents available to the public.

Their paper generally indicates that IELTS follows these practices and notes that ‘IELTS’ commitment to research and its responsiveness to research findings is well documented in the literature . . . as such, IELTS has shown commitment to test practices informed by research findings’ (Chalhoub-Deville and Turner 2000: 533).

The broader context: beyond IELTS

Thus far, I have noted that IELTS grades test takers as effectively as possible and also that the test operates within a context of global, market-driven HE. The implications of this latter statement require further investigation.

Clearly, IELTS provides an indication of test takers’ language proficiency, but the final ‘sorting and selecting’ of test takers and applicants to English-medium HE institutions is actually undertaken by the receiving institutions themselves, and it is these test users who should consider a range of factors in addition to IELTS test scores when considering who to admit on to university programmes. This point is made explicitly by IELTS and elsewhere:

Receiving organisations should also consider a candidate’s IELTS results in the context of . . . motivation, educational and cultural background, first language and language learning history. (IELTS website 2007)

The ultimate responsibility for appropriate test use and interpretation lies predominantly with the test-user . . . (Chalhoub-Deville and Turner 2000: 113)

Whether all receiving organizations of the IELTS test duly consider these points is open to question. In the drive to recruit higher fee-paying overseas students to British and North American universities, ‘there is evidence to suggest that in some institutions, proficiency test scores are used in somewhat cruder fashion as . . . ‘‘pass-fail indicators’’’ (Brindley and Ross 2001: 150).

It seems possible, then, that some receiving organizations need to take more responsibility for understanding and interpreting IELTS test scores and take much more responsibility for ensuring that arriving international students can meet the requirements of the programmes they enrol for, something even the most valid and reliable language test cannot ensure. At present, IELTS test scores offer an easy short cut when decisions are made concerning admissions to English-medium HE institutions, and consequently, it is convenient to find fault with IELTS when students encounter difficulties on their chosen programmes of study.

However, there is a broader point to be made about IELTS test user responsibility. The default position of most Western universities is that, for a British degree, standard inner circle English varieties are the only acceptable forms of language. Indeed,

There will be little incentive to modify . . . to suit the needs of the foreign student minority; and there will be (often justified) appeals to ‘maintaining academic standards’ whenever such suggestions are raised. Inevitably it will be the foreign students, not the system, that must make the necessary adjustments. (Adapted from Ballard 1996: 149)

However, how sustainable is this position? Given that foreign students are essential to Western HE’s continued development and financial strength, to the extent that British universities now talk about ‘the internationalised curriculum and campuses’, receiving organizations need to engage much more deeply in a critical debate over language standards and consider the case for EIL varieties. At present, then, it is English-medium HE institutions, as much as IELTS, who sustain the position of inner circle English varieties in international high-stakes testing.

Conclusions

My response to Uysal’s review of the IELTS writing test has essentially developed two parallel strands of argument. Firstly, I have noted that, whilst questioning test providers’ practices such as test reliability and validity is a valid and valuable exercise, the published literature suggests that IELTS is aware of and attempting to address problematic issues as effectively as possible in the normal course of test development and review. However, Uysal’s paper touched on but did not pursue a range of more complex issues which those engaged in ELT and English-medium HE need to consider—that is, the social, economic, and political dimensions of international high-stakes English language testing. Test-receiving institutions need to develop clearer understandings not only of the uncertainties inherent in language testing but also develop appropriate responses to the emergence and development of varieties of English. These are critical (and ethical) concerns for all stakeholders in IELTS when acknowledging:

the complex responsibilities of the language tester and test users as the agents of political, commercial and bureaucratic forces. (Adapted from McNamara 2000: 24)

References
Ballard, B. 1996. ‘Through language to learning: preparing overseas students for study in Western universities’ in H. Coleman (ed.). Society and the Language Classroom. Cambridge: Cambridge University Press.
Banerjee, J., F. Franceschina, and A. M. Smith. 2007. ‘Documenting features of written language production typical at different IELTS band score levels’. IELTS Research Reports 7. London: British Council.
Blackhurst, A. 2004. ‘IELTS test performance data 2003’. University of Cambridge ESOL Examinations Research Notes 18: 18–20. Available at: accessed 15 April 2009.
Brindley, G. and S. Ross. 2001. ‘EAP assessment: issues, models and outcomes’ in J. Flowerdew and M. Peacock (eds.). Research Perspectives on English for Academic Purposes. Cambridge: Cambridge University Press.
British Council. Available at: accessed 16 April 2009.
Chalhoub-Deville, M. and C. E. Turner. 2000. ‘What to look for in ESL admission tests: Cambridge certificate exams, IELTS, and TOEFL’. System 28/4: 523–39.
Davies, A. 2003. ‘Three heresies of language testing research’. Language Testing 20/4: 355–68.
Falvey, P. and S. D. Shaw. 2006. ‘IELTS writing: revising assessment criteria and scales (phase 5)’. University of Cambridge ESOL Examinations Research Notes 23: 7–12. Available at: accessed 15 April 2009.
Hamp-Lyons, L. 1990. ‘Second language writing: assessment issues’ in B. Kroll (ed.). Second Language Writing: Research Insights for the Classroom. Cambridge: Cambridge University Press.
IELTS Homepage. Available at: accessed 16 April 2009.
Jenkins, J. 2006. ‘The spread of EIL: a testing time for testers’. ELT Journal 60/1: 42–50.
McNamara, T. 2000. Language Testing (Oxford Introductions to Language Study Series). Oxford: Oxford University Press.
Oxford English Dictionary Online. Available at: accessed 16 April 2009.
Shaw, S. 2007. ‘Modelling facets of the assessment of writing within an ESM environment’. University of Cambridge ESOL Examinations Research Notes 27: 14–9. Available at: accessed 15 April 2009.
Taylor, L. 2002. ‘Assessing learner’s English: but whose/which English(es)?’. University of Cambridge ESOL Examinations Research Notes 10: 18–20.

  Final revised version received May 2009

The author
Graham Hall has taught English in Europe, the Middle East, and the UK and is now a Senior Lecturer in the Division of English and Creative Writing at Northumbria University where he coordinates and teaches on Northumbria’s MA Applied Linguistics for TESOL programme. He also teaches English Language Studies to undergraduates.
Email: g.hall@northumbria.ac.uk

  

