A CRITICAL REVIEW OF THE IELTS WRITING TEST

  Suggested Citation: Uysal, H. H. (2010). A critical review of the IELTS writing test. ELT Journal, 64(3), 314-320.


  

Abstract: Administered at local centres in 120 countries throughout the world, IELTS is one of the most widely used large-scale ESL tests, and one that offers a direct writing test component. Because of its popularity and its use in making critical decisions about test takers, the present article finds it crucial to draw attention to some issues regarding the assessment procedures of IELTS. The present paper therefore aims to provide a descriptive and critical review of the IELTS writing test, focusing in particular on reliability issues such as the single marking of papers, the readability of prompts, and the comparability of writing topics, and on validity issues such as the definition of an “international writing construct” that does not consider variations among rhetorical conventions and genres around the world. Consequential validity and impact issues are also discussed, and suggestions are offered for the use of IELTS around the world and for future research to improve the test.

  

Keywords: IELTS, large-scale ESL testing, standardized writing tests, validity, reliability, testing English as an international language.

Introduction

  Large-scale ESL (English as a second language) tests such as the Cambridge certificate exams, the International English Language Testing System (IELTS), and the Test of English as a Foreign Language (TOEFL) are widely used around the world, and they play a critical role in many people’s lives because they are often used to make high-stakes decisions about test takers, such as admission to universities. It is therefore necessary to examine the assessment procedures of such large-scale tests on a regular basis to make sure that they meet professional standards and to contribute to their further development. Although several publications have evaluated these tests in general, they often do not offer detailed information specifically about the writing component. Scholars, on the other hand, acknowledge that writing is a complex skill that is difficult both to learn and to assess, and that it is central to academic success, especially at university level. For that reason, the present article focuses solely on the assessment of writing, and particularly on the IELTS test: first, because IELTS is one of the most popular ESL tests throughout the world, and second, because IELTS is unique among such tests in claiming to assess “English as an international language,” indicating a recognition of the expanding status of English. After a brief summary of the IELTS test in terms of its purpose, content, and scoring procedures, the article discusses several reliability and validity issues concerning the IELTS writing test that should be considered by both language testing researchers and test users around the world.

  General Information about the IELTS Writing Test

  The IELTS writing test is a direct test of writing in which tasks are communicative and contextualized, with a specified audience, purpose, and genre, reflecting recent developments in writing research. There is no choice of topics; however, IELTS states that it continuously pre-tests topics to ensure comparability and equality. IELTS has both academic and general training modules, each consisting of two tasks. In the academic writing module, for Task 1, candidates write a report of around 150 words based on a table or diagram, and for Task 2, candidates write a short essay or general report of around 250 words in response to an argument or a problem. In the general training writing module, for Task 1, candidates write a letter responding to a given problem, and for Task 2, they write an essay in response to a given argument or problem. Both the academic and general training writing components take 60 minutes. The academic writing component serves the purpose of making decisions about university admission for international students, whereas the general training component serves the purposes of completing secondary education, undertaking work experience or training, or meeting immigration requirements in an English-speaking country.

  Trained and certified IELTS examiners assess each writing task independently, giving more weight to Task 2 than to Task 1 in marking. Writing scores, along with the scores from the other modules of the test, are then averaged and rounded to produce an overall band score. Detailed performance descriptors have been developed which describe written performance at the nine IELTS bands, and results are reported as whole and half bands; however, how these descriptors are turned into band scores is kept confidential. There are no pass/fail cut scores in IELTS. IELTS provides a guidance table for users on acceptable levels of language performance for different programmes to support academic or training decisions; however, IELTS advises test users to decide on their own acceptable band scores in the light of their experience and local needs.
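  For illustration only, the following Python sketch shows one way the averaging and rounding described above could work, assuming the commonly documented convention that the mean of the four module scores is rounded to the nearest half band, with exact quarter values rounded upwards. The actual IELTS conversion procedure is confidential, so the rounding rule and the example scores here are assumptions, not a description of the operational algorithm.

    import math

    # A minimal sketch of overall band score calculation. ASSUMPTION: the four
    # module scores are averaged and the mean is rounded to the nearest half
    # band, with exact quarters rounded up (e.g. 6.25 -> 6.5, 6.75 -> 7.0).
    # The real conversion procedure is not published.
    def overall_band(listening: float, reading: float, writing: float, speaking: float) -> float:
        mean = (listening + reading + writing + speaking) / 4
        return math.floor(mean * 2 + 0.5) / 2  # round to nearest 0.5, ties upwards

    # Hypothetical example: module scores of 6.5, 6.5, 5.0, and 7.0 give a mean
    # of 6.25, which would be reported as 6.5 under this assumed convention.
    print(overall_band(6.5, 6.5, 5.0, 7.0))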

  Reliability Issues

  Hamp-Lyons (1990) defines the sources of error that reduce reliability in a writing assessment as the writer, the task, and the raters, as well as the scoring procedure. IELTS has put forth some research efforts to minimize such errors and to show that acceptable reliability rates are achieved.

  In terms of raters, IELTS states that reliability is assured through the training and certification of raters. Writing is single marked locally, and rater reliability is estimated by subjecting a selected sample of returned scripts to a second marking by a team of IELTS senior examiners. Shaw (2004, p. 5) reported an inter-rater correlation of approximately 0.77 for the revised scale and g-coefficients of 0.84-0.93 for the operational single-rater condition. Blackhurst (2004) also found that the paired examiner-senior examiner ratings from the sample IELTS writing test data produced an average correlation of 0.91. However, despite the reported high reliability measures, single marking is not adequate for such a high-stakes international test. It is widely accepted in writing assessment that multiple judgments lead to a final score that is closer to a true score than any single judgment (Hamp-Lyons, 1990). Therefore, multiple raters should rate IELTS writing scripts independently, and inter- and intra-rater reliability estimates should be calculated regularly to judge the reliability and consistency of the writing scores.
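  To make the nature of the figures cited above concrete, the sketch below shows how an inter-rater reliability estimate of this kind (a Pearson correlation between the operational examiner’s marks and a senior examiner’s second marks on the same scripts) can be computed. The band scores in the example are invented for illustration and are not IELTS data.

    import statistics

    # Pearson correlation between two raters' scores on the same set of scripts.
    def pearson(x: list[float], y: list[float]) -> float:
        mx, my = statistics.mean(x), statistics.mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    # Hypothetical paired marks for eight scripts (not real IELTS data).
    examiner = [5.0, 6.0, 6.5, 7.0, 5.5, 8.0, 6.0, 7.5]  # first (operational) marks
    senior   = [5.5, 6.0, 6.0, 7.0, 5.5, 7.5, 6.5, 7.5]  # senior examiner's second marks
    print(round(pearson(examiner, senior), 2))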

  IELTS also claims that the use of analytic scales contributes to higher reliability, as impressionistic rating and norm referencing are discouraged and greater discrimination across bands is achieved. However, Mickan (2003) addressed the problem of inconsistency in ratings in IELTS exams and found that it was very difficult to identify specific lexicogrammatical features that distinguish different levels of performance. He also discovered that, despite the use of analytic scales, raters tended to respond to texts as a whole rather than to individual components. Falvey and Shaw (2006), on the other hand, found that raters tend to work through the assessment scale step by step, beginning with task achievement and then moving to the next criterion. Given these conflicting findings about rater behaviour while using the scales, more detailed information about the scale, and about how raters arrive at scores from the analytic categories, should be documented to substantiate IELTS’ claims about the analytic scales.

  IELTS pre-tests the tasks to ensure they conform to the test requirements in terms of content and level of difficulty. O’Loughlin and Wigglesworth (2003) investigated task difficulty in Task 1 of IELTS academic writing and found differences among tasks in terms of the language used: the simpler tasks with less information elicited higher performance and more complex language from responders across all proficiency groups. Mickan et al. (2000), on the other hand, examined the readability of test prompts in terms of discourse and pragmatic features, as well as the test-taking behaviours of candidates in the writing test, and found that the purpose and lexico-grammatical structures of the prompts influenced task comprehension and writing performance.

  IELTS also states that topics or contexts of language use which might introduce a bias against any group of candidates of a particular background are avoided. However, many scholars point out that controlling the topic variable is not easy, since it is highly challenging to determine a common knowledge base that can be accessed by all students, who come from culturally diverse backgrounds and may have varied reading experiences of the topic or content area (Kroll and Reid, 1994). Given the importance of the topic variable for writing performance and the difficulty of controlling it in such an international context, continuous research on topic comparability and appropriateness should be carried out by IELTS.

  The research conducted by IELTS has been helpful in understanding some of the variables that might affect the reliability, and accordingly the validity, of the scores. As this research indicates, different factors interfere with the consistency of the writing test to varying degrees. Therefore, more research is necessary, especially in the areas of raters, scales, tasks, test-taker behaviour, and topic comparability, to diagnose and minimize sources of error in testing writing. Shaw (2007) suggests the use of electronic script management (ESM) data in further research to understand the various facets, and the interactions among facets, which may have a systematic influence on scores.
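  As a rough illustration of the kind of facet analysis Shaw (2007) points towards, the sketch below estimates a generalizability (g-) coefficient for a fully crossed candidates-by-raters design. The score matrix is hypothetical and the design is an assumption for illustration; the actual ESM-based analyses would involve more facets and unbalanced designs.

    import numpy as np

    # Hypothetical band scores: rows = candidates, columns = raters.
    scores = np.array([
        [5.0, 5.5, 5.0],
        [6.5, 6.0, 6.5],
        [7.0, 7.0, 6.5],
        [4.5, 5.0, 5.0],
        [8.0, 7.5, 8.0],
    ])
    n_p, n_r = scores.shape
    grand = scores.mean()
    p_means = scores.mean(axis=1)
    r_means = scores.mean(axis=0)

    # Mean squares for candidates, raters, and the residual (candidate x rater) term.
    ms_p = n_r * ((p_means - grand) ** 2).sum() / (n_p - 1)
    ms_r = n_p * ((r_means - grand) ** 2).sum() / (n_r - 1)
    resid = scores - p_means[:, None] - r_means[None, :] + grand
    ms_pr = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

    # Estimated variance components and the g-coefficient for single-rater marking.
    var_pr = ms_pr
    var_p = max((ms_p - ms_pr) / n_r, 0.0)
    g_single = var_p / (var_p + var_pr / 1)  # k = 1 rater, as in operational marking
    print(round(g_single, 2))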

  Validity Issues

  IELTS makes use of both expert judgments by academic staff from the target domain and empirical approaches to match the test tasks with the target language use (TLU) domain tasks and to achieve high construct representativeness and relevance. Moore and Morton (1999), for example, compared IELTS Task 2 items with university assignment tasks at Australian universities. They found that IELTS Task 1 was representative of the TLU content, while IELTS Task 2, which requires candidates to agree or disagree with a proposition, did not match exactly with any of the academic genres in the TLU domain, since the university writing corpus drew on external sources whereas IELTS Task 2 draws on prior knowledge as the source of information. IELTS Task 2 was more similar to non-academic public forms of discourse such as the letter to the editor; however, it could also be considered close to the genre “essay”, which was the most common of the university tasks (60%). In terms of rhetorical functions, the most common function in the university corpus was “evaluation”, parallel to IELTS Task 2. In conclusion, it was suggested that an integrated reading-writing task should be included in the test to increase authenticity. Nevertheless, IELTS’ claims are based on the investigation of TLU tasks from only a limited context, namely UK and Australian universities; thus, the representativeness and relevance of the construct, and the meaningfulness of interpretations in other domains, are seriously questionable.

  In terms of the constructs and criteria for writing ability, the general language construct in IELTS is defined both in terms of language ability, drawing on various applied linguistics and language testing models, and in terms of how these constructs are operationalized within a task-based approach. Task 1 scripts in both the general training and academic writing modules are assessed according to the criteria of task fulfilment, coherence and cohesion, lexical resource, and grammatical range and accuracy. Task 2 scripts are assessed on task response (making arguments), coherence and cohesion, lexical resource, and grammatical range and accuracy. However, according to Shaw (2004), the use of the same criteria for both the general training and academic writing modules is problematic, and this practice has not been adequately supported by scientific evidence. In addition, in the revised criteria in use since 2005, the previous broad category “communicative quality” has been replaced by “coherence and cohesion,” causing rigidity and too much emphasis on paragraphing (Falvey and Shaw, 2006). It therefore seems that traditional rules of form, rather than meaning and intelligibility, have recently gained weight in the construct definitions of IELTS.

  IELTS also claims to be an international English test. At present, this claim is grounded in the following (Taylor, 2002):

  1. Reflecting social and regional language variations in test input in terms of content and linguistic features, such as including various accents;

  2. Incorporating an international team (from the UK, Australia, and New Zealand) that is familiar with the features of different varieties into the test development process;

  3. Including NNS (non-native speaker) as well as NS (native speaker) raters as examiners of the oral and written tests.

  However, the English varieties that are considered in IELTS are merely the varieties of the inner circle. Except for the inclusion of NNS raters in the scoring procedure, the attempts of IELTS to qualify as an international test of English are very limited and narrow in scope. As an international English language test, IELTS acknowledges the need to account for language variation within its model of linguistic or communicative competence (Taylor, 2002); however, its construct definition is not any different from that of other language tests. If IELTS claims to assess international English, it should include international language features in its construct definition and provide evidence that it can actually measure English as an international language.

  In addition, Taylor (2002) suggests that besides micro-level linguistic variations, macro-level discourse variations may occur across cultures. Therefore, besides addressing the linguistic varieties of English around the world (World Englishes), the IELTS writing test should also consider the variations among rhetorical conventions and genres around the world (World Rhetorics) when defining the writing construct, especially in relation to the criteria on coherence, cohesion, and logical argument. The published literature presents evidence that genre is not universal but culture specific, and that people in different parts of the world differ in their argument styles and logical reasoning, use of indirectness devices, organizational patterns, the degree of responsibility given to readers, and rhetorical norms and perceptions of good writing. Because the ability to write an argumentative essay in particular, which is what the IELTS writing test requires, has been found to reflect distinct national rhetorical styles across cultures, the IELTS corpus database should be used to identify common features of argumentative writing shared by all international test takers in order to describe an international argumentative writing construct (Taylor, 2004). This is especially important as UCLES plans to develop a common scale for L2 writing ability in the near future.

  It is also important for IELTS to consider these cultural differences in rater training and scoring. Purves and Hawisher (1990), based on their study of an expert rater group, suggest that culture-specific text models also exist in readers’ heads; these models form the basis for judgments of the acceptability and appropriateness of written texts and affect the rating of student writing. For example, differences between NS and NNS raters have been found in their evaluations of topics, cultural rhetorical patterns, and sentence-level errors (Kobayashi and Rinnert, 1996). Therefore, it is also crucial to investigate both NS and NNS raters’ rating behaviours in relation to the test-taker profile.

  In terms of consequences, the impact of IELTS on the content and nature of classroom activity in IELTS classes, on materials, and on the attitudes of test users and test takers has been investigated. However, this is not enough. IELTS should also consider the impact of its writing test, in terms of the chosen standards or criteria, on international communities in a broader context. Considering IELTS’ claim to be an international test, judging the written texts of students from various cultural backgrounds against a single writing standard based on Western writing norms may not be fair. Taylor (2002) states that people who are responsible for language assessment should consider how language variation affects the validity, reliability, and impact of tests, and should provide a clear rationale for why they include or exclude more than one linguistic variety and where they get their norms from.

  As for the washback effects of IELTS, it is at present widely believed in the academic world that international students and scholars must learn Western academic writing in order to function in the Anglo-American context. This view in a way imposes Western academic conventions on the whole international community, showing no acceptance of other varieties. According to Kachru (1997), however, this may cause the rich, creative national styles of the world to be replaced, one by one, by the Western way of writing. This view is reflected in most tests of English as well. However, because IELTS claims to be an international test of English, it should promote rhetorical pluralism and raise awareness of cultural differences in rhetorical conventions rather than promoting a single Western norm of writing, as pointed out by Kachru (1997). Therefore, considering the high washback power of IELTS, the communicative aspects of writing, rather than strict rhetorical conventions, should be emphasized in the IELTS writing test.

Conclusion

  To sum up, IELTS is committed to, and has been carrying out, continuous research to test its reliability and validity and to improve the test further. However, some issues, such as the fairness of applying a single prescriptive criterion to international test takers coming from various rhetorical and argumentative traditions, and the necessity of defining the writing construct in line with IELTS’ claim to be an international test of English, have not been adequately included in these research efforts. In addition, some of the research on the reliability of test scores points to serious issues that need further consideration. Therefore, the future research agenda for IELTS should include the following: in terms of reliability, the comparability and appropriateness of prompts and tasks for all test takers should be continuously investigated; multiple raters should be included in the rating process, and inter- and intra-rater reliability measures should be calculated regularly; and more research should be conducted on the scales, on how scores are combined and rounded into a final score, and on rater behaviour while using the scales. IELTS has rich data sources such as ESM at hand; however, this source has so far not been fully tapped to understand the interactions among the above-mentioned factors in relation to test-taker and rater profiles.

  In terms of improving validation efforts with regard to the IELTS writing module, future research should explore whether the characteristics of the IELTS test tasks and the TLU tasks match, not only in the UK and Australian domains but also in other domains. Cultural differences in writing should be considered both in the construct definitions and in rater training. Research aimed at determining the construct of international English ability, and of international English writing ability, should also be conducted using the already existing IELTS corpus, and the consequences of the assessment practices and criteria in terms of power relationships in the world context should also be taken into consideration. However, given that there is no perfect test that is valid for all purposes and uses, test users also have the responsibility to carry out their own research to make sure that the test is appropriate for their own institutional or contextual needs.

  References

Blackhurst, A. 2004. ‘IELTS test performance data 2003.’ Research Notes, 18: 18-20.

Falvey, P. and S. D. Shaw. 2006. ‘IELTS writing: revising assessment criteria and scales (phase 5).’ Research Notes, 23: 7-12.

Hamp-Lyons, L. 1990. ‘Second language writing assessment issues.’ In B. Kroll (ed.). Second Language Writing: Research Insights for the Classroom. New York: Cambridge University Press.

Kachru, Y. 1997. ‘Culture and argumentative writing in World Englishes.’ In L. E. Smith and M. L. Forman (eds.). Literary Studies East and West: World Englishes 2000, Selected Essays (pp. 48-67). Honolulu: University of Hawai’i Press.

Kobayashi, H. and C. Rinnert. 1996. ‘Factors affecting composition evaluation in an EFL context: cultural rhetorical pattern and readers’ background.’ Language Learning, 46/3: 397-437.

Kroll, B. and J. Reid. 1994. ‘Guidelines for designing writing prompts: clarifications, caveats, and cautions.’ Journal of Second Language Writing, 3/3: 231-255.

Mickan, P., S. Slater, and C. Gibson. 2000. ‘A study of response validity of the IELTS writing module.’ IELTS Research Reports, vol. 3, paper 2. Canberra: IDP: IELTS Australia.

Mickan, P. 2003. ‘What is your score? An investigation into language descriptors from rating written performance.’ IELTS Research Reports, vol. 5, paper 3. Canberra: IDP: IELTS Australia.

Moore, T. and J. Morton. 1999. ‘Authenticity in the IELTS Academic Module Writing Test: a comparative study of Task 2 items and university assignments.’ IELTS Research Reports, vol. 2, paper 4. Canberra: IDP: IELTS Australia.

O’Loughlin, K. and G. Wigglesworth. 2003. ‘Task design in IELTS academic writing Task 1: the effect of quantity and manner of presentation of information on candidate writing.’ IELTS Research Reports, vol. 4, paper 3. Canberra: IDP: IELTS Australia.

Purves, A. and G. Hawisher. 1990. ‘Writers, judges, and text models.’ In R. Beach and S. Hynds (eds.). Developing Discourse Practices in Adolescence and Adulthood. Advances in Discourse Processes, vol. 39 (pp. 183-199). NJ: Ablex Publishing.

Shaw, S. D. 2004. ‘IELTS writing: revising assessment criteria and scales (phase 3).’ Research Notes, 16: 3-7.

Shaw, S. D. 2007. ‘Modelling facets of the assessment of writing within an ESM environment.’ Research Notes, 27: 14-19.

Taylor, L. 2002. ‘Assessing learners’ English: but whose/which English(es)?’ Research Notes, 10: 18-20.

Taylor, L. 2004. ‘Second language writing assessment: Cambridge ESOL’s ongoing research agenda.’ Research Notes, 16: 2-3.