Suggested Citation: Uysal, H. H. (2010). A critical review of the IELTS writing test. ELT
Journal, 64(3), 314-320.

A CRITICAL REVIEW OF THE IELTS WRITING TEST


Abstract: Administered at local centres in 120 countries throughout the world, IELTS is one of the most widely used large-scale ESL tests that also offer a direct writing component. Because of its popularity and its use in making critical decisions about test takers, the present article considers it crucial to draw attention to some issues regarding the assessment procedures of IELTS. The paper therefore provides a descriptive and critical review of the IELTS writing test, focusing particularly on reliability issues, such as the single marking of papers, the readability of prompts, and the comparability of writing topics, and on validity issues, such as the definition of an “international writing construct” that does not take into account variations among rhetorical conventions and genres around the world. Consequential validity (impact) issues are also discussed, and suggestions are offered for the use of IELTS around the world and for future research to improve the test.

Keywords: IELTS, large-scale ESL testing, standardized writing tests, validity, reliability, testing English as an international language.

Introduction

Large-scale ESL (English as a second language) tests such as the Cambridge certificate exams, the International English Language Testing System (IELTS), and the Test of English as a Foreign Language (TOEFL) are widely used around the world, and they play a critical role in many people’s lives because they are often used to make high-stakes decisions about test takers, such as admission to universities. It is therefore necessary to address the assessment procedures of such large-scale tests on a regular basis to make sure that they meet professional standards and to contribute to their further development. However, although there have been several publications evaluating these tests in general, they often do not offer detailed information specifically about the writing component. Scholars, on the other hand, acknowledge that writing is a very complex skill that is difficult both to learn and to assess, and that it is central to academic success, especially at university level. For that reason, the present article focuses solely on the assessment of writing, and particularly on the IELTS test: first, because IELTS is one of the most popular ESL tests throughout the world, and second, because IELTS is unique among such tests in its claim to assess “English as an international language,” indicating a recognition of the expanding status of English. After a brief summary of the IELTS test in terms of its purpose, content, and scoring procedures, the article discusses several reliability and validity issues concerning the IELTS writing test that should be considered by both language testing researchers and test users around the world.

General Information about the IELTS Writing Test

The IELTS writing test is a direct test of writing in which tasks are communicative and
contextualized with a specified audience, purpose and genre, reflecting the recent
developments in writing research. There is no choice of topics; however, IELTS states
that it continuously pre-tests the topics to ensure comparability and equality. IELTS
has both academic and general training modules consisting of two tasks per module.
In the academic writing module, candidates write, for Task 1, a report of around 150 words based on a table or diagram and, for Task 2, a short essay or general report of around 250 words in response to an argument or a problem. In
general training writing, in Task 1, candidates write a letter responding to a given
problem, and in Task 2, they write an essay in response to a given argument or
problem. Both academic and general training writing tasks take 60 minutes. The
academic writing component informs decisions about the university admission of international students, whereas the general training module serves purposes such as completing secondary education, undertaking work experience or training, or meeting immigration requirements in an English-speaking country.

Trained and certified IELTS examiners assess each writing task independently, giving more weight to Task 2 than to Task 1 in marking. At the end, the writing scores, along with the scores from the other modules of the test, are averaged and rounded to produce an overall band score. There are no pass/fail cut scores in IELTS. Detailed performance descriptors have been developed that describe written performance at the nine IELTS bands, and results are reported as whole and half bands; however, how these descriptors are turned into band scores is kept confidential. IELTS provides a guidance table for users on acceptable levels of language performance for different programs to inform academic or training decisions; however, IELTS advises test users to decide on their own acceptable band scores in the light of their experience and local needs.
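
To illustrate how module scores might be combined into an overall band, the following minimal sketch (in Python) assumes the commonly cited convention that the mean of the four module scores is rounded to the nearest half band, with averages ending in .25 or .75 rounded upwards; this is an illustrative assumption rather than IELTS' official algorithm, and the conversion of performance descriptors into band scores remains confidential.

    import math

    def overall_band(listening, reading, writing, speaking):
        """Illustrative overall band calculation (assumed convention, not official).

        The four module band scores are averaged and the mean is rounded to the
        nearest half band, with averages ending in .25 or .75 rounded upwards
        (e.g. a mean of 6.25 is reported as 6.5 and a mean of 6.75 as 7.0).
        """
        mean = (listening + reading + writing + speaking) / 4.0
        return math.floor(mean * 2 + 0.5) / 2

    # Example: module scores of 6.5, 6.5, 5.0, and 7.0 give a mean of 6.25,
    # which would be reported as an overall band of 6.5 under this assumed rule.
    print(overall_band(6.5, 6.5, 5.0, 7.0))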

Reliability Issues
Hamp-Lyons (1990) defines the sources of error that reduce reliability in a writing assessment as the writer, the task, the raters, and the scoring procedure. IELTS has undertaken research to minimize such errors and to demonstrate that acceptable reliability rates are achieved.

In terms of raters, IELTS states that reliability is assured through training and
certification of raters. Writing is single marked locally and rater reliability is
estimated by subjecting a selected sample of returned scripts to a second marking by a
team of IELTS senior examiners. Shaw (2004, p. 5) reported an inter-rater correlation of approximately 0.77 for the revised scale and g-coefficients of 0.84-0.93 for the operational single-rater condition. Blackhurst (2004) also found that the paired examiner-senior examiner ratings from sample IELTS writing test data produced an average correlation of 0.91. However, despite these reported high reliability figures, single marking is not adequate for such a high-stakes international test. It is widely accepted in writing assessment that multiple judgments lead to a final score that is closer to a true score than any single judgment (Hamp-Lyons, 1990). Therefore, multiple raters should rate the IELTS writing tests independently, and inter- and intra-rater reliability estimates should be calculated regularly to monitor the reliability and consistency of the writing scores.
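
As a rough illustration of how such an estimate is obtained, the hypothetical sketch below (in Python, using invented scores rather than IELTS data) correlates two raters' marks on the same set of scripts. The operational figures reported by Shaw (2004) and Blackhurst (2004) are, of course, based on much larger samples and, in Shaw's case, on generalisability coefficients rather than a simple correlation.

    import numpy as np

    # Hypothetical band scores awarded to the same ten scripts by two raters
    # (invented for illustration; these are not IELTS figures).
    rater_a = np.array([5.0, 6.0, 6.5, 7.0, 5.5, 8.0, 6.0, 7.5, 5.0, 6.5])
    rater_b = np.array([5.5, 6.0, 6.0, 7.0, 5.0, 7.5, 6.5, 7.0, 5.5, 6.5])

    # A simple inter-rater reliability estimate: the Pearson correlation
    # between the two sets of ratings.
    inter_rater_r = np.corrcoef(rater_a, rater_b)[0, 1]
    print(f"Inter-rater correlation: {inter_rater_r:.2f}")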

IELTS also claims that the use of analytic scales contributes to higher reliability as
impressionistic rating and norm referencing are discouraged, and greater
discrimination across bands is achieved. However, Mickan (2003) addressed the
problem of inconsistency in ratings in IELTS exams and found that it was very
difficult to identify specific lexicogrammatical features that distinguish different
levels of performance. He also discovered that despite the use of analytic scales,
raters tended to respond to texts as a whole rather than to individual components.
Falvey and Shaw (2006), on the other hand, found that raters tend to adhere to the assessment scale step by step, beginning with task achievement and then moving on to each subsequent criterion. Given these conflicting findings about rater behaviour while using the scales, more detailed information about the scale, and about how raters reach scores from the analytical categories, should be documented to confirm IELTS' claims about the analytic scales.

IELTS pre-tests the tasks to ensure they conform to the test requirements in terms of
content and level of difficulty. O'Loughlin and Wigglesworth (2003) investigated task difficulty in Task 1 of IELTS academic writing and found differences among tasks in terms of the language used: the simpler tasks with less information elicited higher performance and more complex language from respondents across all proficiency groups. Mickan et al. (2000), on the other hand, examined the readability of test prompts in terms of discourse and pragmatic features, as well as candidates' test-taking behaviours in the writing test, and found that the purpose and lexico-grammatical structures of the prompts influenced task comprehension and writing performance.

IELTS also states that topics or contexts of language use, which might introduce a
bias against any group of candidates of a particular background, are avoided.
However, many scholars highlight that controlling the topic variable is not easy, as it is highly challenging to determine a common knowledge base that can be accessed by all students from culturally diverse backgrounds, who may have varied reading experience of the topic or content area (Kroll and Reid, 1994). Given the effect of the topic variable on writing performance and the difficulty of controlling it in such an international context, continuous research on topic comparability and appropriateness should be carried out by IELTS.

The research conducted by IELTS has been helpful in understanding some of the variables that might affect the reliability, and accordingly the validity, of the scores. As this research indicates, different factors interfere with the consistency of the writing test to varying degrees. Therefore, more research is necessary, especially on raters, the scale, the tasks, test-taker behaviour, and topic comparability, to diagnose and minimize sources of error in testing writing. Shaw (2007) suggests the use of Electronic Script Management (ESM) data in further research to understand the various facets, and interactions among facets, which may have a systematic influence on scores.

Validity Issues
IELTS makes use of both expert judgments by academic staff from the target domain
and empirical approaches to match the test tasks with the target domain tasks, and to
achieve high construct representativeness and relevance. Moore and Morton (1999),
for example, compared IELTS writing task items with 155 assignments given in two Australian universities. They found that IELTS Task 1 was representative of the target language use (TLU) content, while IELTS Task 2, which requires students to agree or disagree with a proposition, did not match exactly any of the academic genres in the TLU domain, as the university writing corpus was based on external sources whereas IELTS Task 2 relies on prior knowledge as the source of information. IELTS Task 2 was more similar to non-academic public forms of discourse, such as a letter to the editor; however, it could also be considered close to the genre “essay”, which was the most common of the university tasks (60%). In terms of rhetorical functions, the most common function in the university corpus was “evaluation”, paralleling IELTS Task 2. It was therefore suggested that an integrated reading-writing task be included in the test to increase authenticity. Nevertheless, IELTS' claims are based on the investigation of TLU tasks from only a limited context (UK and Australian universities); thus, the representativeness and relevance of the construct, and the meaningfulness of interpretations in other domains, are seriously questionable.

In terms of the constructs and criteria for writing ability, the general language construct in IELTS is defined both in terms of language ability, based on various applied linguistics and language testing models, and in terms of how these constructs are operationalized within a task-based approach. Task 1 scripts in both general and academic writing are
assessed according to task fulfilment, coherence, cohesion, lexical resource, and
grammatical range and accuracy criteria. Task 2 scripts are assessed on task response
(making arguments), lexical resource, and grammatical range and accuracy criteria.
However, according to Shaw (2004), the use of the same criteria for both the general and academic writing modules is problematic, and this practice has not been adequately supported by empirical evidence. In addition, in the revised criteria that have been in use since 2005, the previous broad category “communicative quality” has been replaced by “coherence and cohesion,” causing rigidity and too much emphasis on paragraphing (Falvey and Shaw, 2006). It therefore seems that traditional rules of form, rather than meaning and intelligibility, have recently gained weight in IELTS' construct definitions.

IELTS also claims to be an international English test. At present, this claim is grounded in the following practices (Taylor, 2002):
1. Reflecting social and regional language variation in the test input in terms of content and linguistic features, for example by including various accents;
2. Incorporating an international team (from the UK, Australia, and New Zealand) that is familiar with the features of different varieties into the test development process;
3. Including non-native speaker (NNS) as well as native speaker (NS) raters as examiners of the oral and written tests.

However, the English varieties considered in IELTS are merely those of the inner circle. Except for the inclusion of NNS raters in the scoring procedure, the attempts of IELTS to qualify as an international test of English are very limited and narrow in scope. As an international English language test, IELTS acknowledges the need to account for language variation within its model of linguistic or communicative competence (Taylor, 2002); however, its construct definition is no different from that of other language tests. If IELTS claims to assess international English, it should include international language features in its
construct definition and provide evidence to support that IELTS can actually measure
English as an international language.

In addition, Taylor (2002) suggests that besides micro-level linguistic variations, macro-level discourse variations may occur across cultures. Therefore, besides addressing the linguistic varieties of English around the world (World Englishes), the IELTS writing test should also consider the variations among rhetorical conventions and genres around the world (World Rhetorics) when defining the writing construct, especially in relation to the criteria on coherence, cohesion, and logical argument.
The published literature presents evidence that genre is not universal but culture-specific, and that people in different parts of the world differ in their argument styles and logical reasoning, their use of indirectness devices, their organizational patterns, the degree of responsibility given to readers, and their rhetorical norms and perceptions of good writing. Because argumentative essay writing in particular, which is used in the IELTS writing test, has been found to display distinct national rhetorical styles across cultures, the IELTS corpus database should be used to identify common features of argumentative writing shared by international test takers in order to describe an international argumentative writing construct (Taylor, 2004). This is especially important as UCLES plans to develop a common scale for L2 writing ability in the near future.

It is also important for IELTS to consider these cultural differences in rater training and scoring. Purves and Hawisher (1990), based on their study of an expert rater group, suggest that culture-specific text models also exist in readers' heads; these models form the basis for judgments of the acceptability and appropriateness of written texts and affect the rating of student writing. For example, differences between NS and NNS raters have been found in their evaluations of topics, cultural rhetorical patterns, and sentence-level errors (Kobayashi and Rinnert, 1996). Therefore, it is also crucial to investigate both NS and NNS raters' rating behaviours in relation to test-taker profile.

In terms of consequences, the impact of IELTS on the content and nature of classroom activity in IELTS classes, on materials, and on the attitudes of test users and test takers has been investigated. However, this is not enough. IELTS should also consider the impact of the IELTS writing test, in terms of the standards or criteria chosen, on international communities in a broader context. Considering IELTS' claim to be an international test, judging the written texts of students from various cultural backgrounds against a single writing standard based on Western writing norms may
not be fair. Taylor (2002) states that people who are responsible for language
assessment should consider how language variation affects the validity, reliability, and
impact of the tests, and should provide a clear rationale for why they include or
exclude more than one linguistic variety and where they get their norms from.

As for the washback effects of IELTS, at present, it is believed in the academic world
that international students and scholars must learn Western academic writing so that
they can function in the Anglo-American context. This view, in a way, imposes Western academic conventions on the entire international community, showing no acceptance of other varieties. According to Kachru (1997), however, this may cause the rich, creative national styles of the world to be replaced, one by one, by the Western way of writing. This view is reflected in most tests of English as well. However,
because IELTS claims to be an international test of English, it should promote rhetorical pluralism and raise awareness of cultural differences in rhetorical conventions rather than promote a single Western norm of writing, as Kachru (1997) points out. Therefore, considering the strong washback power of IELTS, communicative aspects of writing, rather than strict rhetorical conventions, should be emphasized in the IELTS writing test.

Conclusion

To sum up, IELTS is committed to, and has been carrying out, continuous research to evaluate its reliability and validity and to improve the test further. However, some issues, such as the fairness of applying a single prescriptive criterion to international test takers coming from various rhetorical and argumentative traditions, and the necessity of defining the writing construct in line with IELTS' claim to be an international test of English, have not been adequately included in these research efforts. In addition, some areas of research on the reliability of test scores point to serious issues that need further consideration. Therefore, the future research agenda for IELTS should include the following issues:

In terms of reliability, the comparability and appropriateness of prompts and tasks for all test takers should be continuously investigated; multiple raters should be included in the rating process, and inter- and intra-rater reliability measures should be calculated regularly; and more research should be conducted on the scales, on how scores are rounded to a final score, and on rater behaviour while using the scales. IELTS has rich data sources such as ESM in hand; however, so far this source has not been fully tapped to understand the interactions among the above-mentioned factors in relation to test-taker and rater profiles.

In terms of improving the validation efforts with regard to the IELTS writing module, future research should explore whether the characteristics of the IELTS test tasks and the TLU tasks match, not only in the UK and Australian domains but also in other domains. Cultural differences in writing should be considered both in the construct definitions and in rater training efforts. Research into determining the construct of international English ability, and international English writing ability in particular, should also be conducted using the existing IELTS corpus, and the consequences of the assessment practices and criteria for power relationships in the world context should also be taken into consideration. However, given that there is no perfect test that is valid for all purposes and uses, test users also have a responsibility to carry out their own research to make sure that the test is appropriate for their own institutional or contextual needs.

References
Blackhurst, A. 2004. ‘IELTS test performance data 2003.’ Research Notes, 18, 18-20.
Falvey, P. and S. D. Shaw. 2006. ‘IELTS writing: Revising assessment criteria and scales (phase 5).’ Research Notes, 23, 7-12.
Hamp-Lyons, L. 1990. ‘Second language writing assessment issues.’ In B. Kroll (Ed.), Second language writing: Research insights for the classroom. NY: Cambridge University Press.
Kachru, Y. 1997. ‘Culture and argumentative writing in World Englishes.’ In L. E. Smith and M. L. Forman (Eds.), Literary Studies East and West: World Englishes 2000 selected essays (pp. 48-67). Honolulu, HI: University of Hawai’i Press.
Kobayashi, H. and C. Rinnert. 1996. ‘Factors affecting composition evaluation in an EFL context: Cultural rhetorical pattern and readers’ background.’ Language Learning, 46/3: 397-437.
Kroll, B. and J. Reid. 1994. ‘Guidelines for designing writing prompts: Clarifications, caveats, and cautions.’ Journal of Second Language Writing, 3/3: 231-255.
Mickan, P., S. Slater, and C. Gibson. 2000. ‘A study of response validity of the IELTS Writing Module.’ IELTS Research Reports, vol. 3, paper 2. Canberra: IDP IELTS Australia.
Mickan, P. 2003. ‘What is your score? An investigation into language descriptors for rating written performance.’ IELTS Research Reports, vol. 5, paper 3. Canberra: IDP IELTS Australia.
Moore, T. and J. Morton. 1999. ‘Authenticity in the IELTS Academic Module Writing Test: A comparative study of Task 2 items and university assignments.’ IELTS Research Reports, vol. 2, paper 4. Canberra: IDP IELTS Australia.
O’Loughlin, K. and G. Wigglesworth. 2003. ‘Task design in IELTS academic writing Task 1: The effect of quantity and manner of presentation of information on candidate writing.’ IELTS Research Reports, vol. 4, paper 3. Canberra: IDP IELTS Australia.
Purves, A. and G. Hawisher. 1990. ‘Writers, judges, and text models.’ In R. Beach and S. Hynds (Eds.), Developing Discourse Practices in Adolescence and Adulthood. Advances in Discourse Processes, Vol. 39 (pp. 183-199). NJ: Ablex Publishing.
Shaw, S. D. 2004. ‘IELTS writing: Revising assessment criteria and scales (phase 3).’ Research Notes, 16, 3-7.
Shaw, S. D. 2007. ‘Modelling facets of the assessment of writing within an ESM environment.’ Research Notes, 27, 14-19.
Taylor, L. 2002. ‘Assessing learners’ English: But whose/which English(es)?’ Research Notes, 10, 18-20.
Taylor, L. 2004. ‘Second language writing assessment: Cambridge ESOL’s ongoing research agenda.’ Research Notes, 16, 2-3.
