Assessment 2 Rationale
Introduction
Language testing poses a moral dilemma, in that its intended purpose is to discriminate between test takers. Nevertheless, tests can be designed to discriminate without unfair discrimination (Hamp-Lyons 1989, cited in Lynch 1997, p. 315) by considering a test's validity,
reliability, washback, authenticity and practicality. This paper discusses two progress tests,
and a writing portfolio. The tests are designed for the Academic Reading and Writing (ARW)
unit, which is part of the English for Tertiary Studies (ETS) programme at UOW College. It will
be argued that this combination of objective and subjective measurements builds the ideal
relationship and thus discriminates discriminately. Firstly, the context and student cohort will
be discussed. Secondly, the ARW curriculum will be outlined. Thirdly, language constructs and
test specifications will be described. Fourthly, both tests will be evaluated in relation to the
aforementioned criteria. The assessment plan, the tests, and the scoring and grading scales can be found in the appendices.
Context
UOW College, which is an integral part of University of Wollongong (UOW), provides direct
entry programmes for domestic and international students. ETS is a ten-week non-credit
carrying EGAP programme for international students planning to undertake tertiary courses
in business, medicine, law, humanities, creative arts, engineering, science, IT, or social sciences.

Bianca van de Water 5037712 EDGT983 Assessing & Evaluating Rationale Page 1 of 52

The unit thus serves a diverse cohort of students who not only intend to specialise in disparate disciplines upon conclusion of dissimilar pathways,
pathways, but also speak in a babel of native tongues. Students from more than thirty L1s
have participated in the programme (UOW College 2015). Future students require English language proficiency, evidenced by appropriate results from an accepted test: either a minimum IELTS overall band score of 5.5, with modest user competence in reading and writing, or a score of 525 in the paper-based TOEFL.
ARW Curriculum
ETS is an integrated skills-based programme, comprising three core units: Critical Literacy;
Academic Listening and Speaking; and, Academic Reading and Writing. The ARW component
carries the majority weighting of sixty percent. Learning objectives include computer literacy;
time management; and, essay writing. The unit focusses on writing an argumentative essay,
whereby teaching and learning includes several microskills, such as rhetoric and process writing.
The present paper is based on the 2003 ETS draft syllabus and proposes two alternative
formative assessments for the ARW unit. As the argumentative essay appears to be the
principal learning objective, an assessment plan that promotes beneficial washback for
writing in this genre is proposed (Appendix A, p. 18). It entails an academic vocabulary test
and a process portfolio that includes compositions, essay plans, and writing journals. Their
purpose is to identify what students have learned and still need to learn (Brown 1998, p. 15)
with regard to mastering control of the target task, i.e. an academic argumentative essay. The
assessments fulfil a triadic function, namely to offer feedback; provide scaffolded learning;
and, assess the curriculum's appropriateness (Ibid.). Both tests are criterion-referenced,
whereby criteria are based on theoretical constructs and ARW learning objectives. All
objectives are addressed, except for note-taking, time management, and writing abstracts
(Appendix A, p. 18). Before test specifications can be described, the underlying constructs need to be defined.
Language Constructs
The language model to be tested involves two distinct constructs, namely, argumentative
essay conventions, and the skills required to write in this genre. Álvarez (2001, cited in Bejarano and Chapetón 2013, p. 129) defines it as an interactive text through which the author aims to convince the reader of a position. Wingate (2012, p. 146) emphasizes the genre's cohesion, in that it consists of a connected series of
statements intended to establish a position. According to Toulmin, Reike and Janik (1984,
cited in Wingate 2012, p. 146) these statements contain claims and reasons. Furthermore, a
carefully constructed argumentation includes qualifiers, i.e. a form of hedging, and rebuttals
of counterarguments (Toulmin 1958, cited in Liu & Stapleton 2014, p. 118). In sum, the
academic argumentative essay construct is defined as a text type that aims to convince or
persuade, which is achieved through interrelated moves, including claims, reasons, qualifiers
and rebuttals. This construct formed the premise for test development and marking rubric
design.
Additionally, the design process was informed by a pragmatic amalgam of writing constructs.
Firstly, the test is informed by genre theory. Practicality considerations preclude a dedicated
genre approach, in that the context comprises a student cohort who plan to specialise in disparate disciplines (Wingate 2012, p. 147). However, Swales's move structure concept has been incorporated. This is
evidenced in the writing journals, which stipulate that students elaborate on moves and language choices. Secondly, the tests incorporate process writing concepts, which emphasize intrapersonal cognitive
skills. For instance, the first test includes an editing task (Appendix B, pp. 34 - 43), whereas
the second test requires students to develop essay plans (Appendix C, pp. 45 - 47). Thirdly,
writing from sources, which is a fundamental aspect of real-life scholarly writing (Gebril 2009,
p. 508), has been included as a criterion in the writing assignments, which is demonstrated in
the marking rubric (Appendix D, p. 49). Finally, the tests are informed by a product-oriented
approach, which emphasizes textual conventions such as paragraph structure, syntax and
lexis. This construct is evidenced in the first test, which measures academic vocabulary
(Appendix B, pp. 19 - 43), and elaborated in the second test, whereby compositions are scored
(Appendix D, pp. 48 - 51). Thus, the academic writing construct is defined as an amalgamation of these approaches, reflecting the writing demands of the TL domain.
Test Specifications
The academic vocabulary test comprises three parts, whereby each component measures distinct vocabulary knowledge and abilities (Appendix B,
pp. 19 - 44). The tasks were sequenced in order of increasing complexity,
progressing from recognition to production skills. The test measures high-frequency academic
vocabulary, which was selected since many EFL students perceive academic lexis as
particularly challenging (Li & Pemberton 1994, cited in Hyland & Tse 2007, p. 326). Test items
originate from Coxhead's (2000) Academic Word List (AWL), which consists of 570 word
families, including 3,112 individual items (Hyland & Tse 2007, p. 327). The selection
predominantly consists of the most frequently occurring members of AWL word families,
although occasionally less frequently occurring words had to be selected for practical reasons.
Furthermore, the test's third task, which comprises authentic texts, required occasional adaptations.
The first part of the vocabulary test comprises a multiple-choice (MC) instrument, which
measures the ability to recognise denotations of Latinate content words (Appendix B, pp. 20 - 29). All test items include four options to reduce the effect of guessing (Hughes 2003, p. 77),
which are furthermore homogenous in length and content, and contain plausible distractors.
For example, Item 8 tests recognition, or recall, of the word 'interpretation', whereby the distractors include 'translation', 'inducement', and 'solution'. The item was designed as follows:
8. This paper offers an alternative interpretation of Manchesters football history, arguing that it was a
a. to offer an explanation
b. to offer a translation
c. to offer an inducement
d. to offer a solution
For each item, the prompt is framed as a question, preceded by an authentic reading passage containing the
token in question. This format encourages students to recall principles, rules or facts in a real-
life context thus emphasizing higher-level thinking skills (Ibid.). All reading passages were
selected from authentic articles, sourced from Google Scholar and UOW library databases. A
few sentences were adapted from Swales and Feak's (2012) Academic Writing for Graduate
Students.
The second part comprises a single-word gap-filling test, which focusses on accuracy and collocational knowledge. Research has found that even advanced NNSs have great difficulty with native-like collocations
(Ellis, Simpson-Vlach & Maynard 2008, p. 378); however, fluent and accurate usage of
formulaic language signals competence to a given discourse community (Hyland 2008, p. 42).
The test contains discrete sentences, whereby students need to fill in the correct preposition
following a verb or noun provided. All prompts were selected from authentic texts according to the aforementioned selection criteria.
The third part entails a MC editing test, which focusses on pragmatic meanings and measures
both receptive and productive abilities (Appendix B, pp. 34 - 43). This task type was selected
in order to promote proofreading skills (Brown & Abeywickrama 2010, p. 247) in preparation for the skills required for the second assessment, i.e. the portfolio. The task contains two
separate texts, which require students to proofread each sentence; identify inappropriate
vocabulary; and, provide an accurate and appropriate alternative. It measures knowledge of
lexical phrases, discourse markers and content words. The two texts were adapted from
authentic academic articles on general knowledge topics, i.e. the 'resource curse' and modern slavery. As such, they do not require specialist knowledge, thus cultivating fairness.
The vocabulary test will be scored according to an absolute grading scale, whereby one point
is awarded for each question answered correctly. The MC test is scored dichotomously and
the gap-filling task follows an exact word-scoring procedure, whereby a point is awarded for
the correct preposition only. The MC editing task is scored according to appropriate word-
scoring and partial credit-scoring procedures: credit is awarded for grammatically correct and contextually appropriate alternatives, which need not originate from the AWL. Half a point is awarded for selecting the key and another for providing a suitable alternative. The entire test contains one hundred items, and aggregate scores map directly onto the grading scale provided in the ETS syllabus (Appendix E, p. 52).
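The aggregation described above can be sketched in a short script. The response data and grade bands below are hypothetical illustrations, as the actual cut-offs are defined in the ETS syllabus scale (Appendix E); only the scoring procedures themselves follow the description above.

```python
def score_mc(responses, keys):
    """Dichotomous scoring: one point per correctly answered MC item."""
    return sum(1 for r, k in zip(responses, keys) if r == k)

def score_gap_fill(responses, keys):
    """Exact-word scoring: a point for the correct preposition only."""
    return sum(1 for r, k in zip(responses, keys) if r.strip().lower() == k)

def score_editing(items):
    """Partial-credit scoring: 0.5 for identifying the inappropriate word
    (the key) and 0.5 for supplying an acceptable alternative."""
    return sum(0.5 * identified + 0.5 * alternative
               for identified, alternative in items)

def to_grade(aggregate, bands=((85, "HD"), (75, "D"), (65, "C"), (50, "P"))):
    """Map a 0-100 aggregate score onto a grading scale.

    The bands here are hypothetical stand-ins for the syllabus scale."""
    for cutoff, grade in bands:
        if aggregate >= cutoff:
            return grade
    return "F"
```

For instance, a candidate with 38 MC points, 18 gap-fill points and 14 editing points would receive an aggregate of 70 and, under the hypothetical bands above, a "C".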
The second assessment is a process portfolio, which includes compositions, essay plans and writing journals (Appendix C, pp. 45 - 47). According to Brown and Abeywickrama (2010, p. 131), portfolio development has various pedagogical benefits.
The portfolio is based on three separate themes, or topics, whereby each requires a
composition, an accompanying writing journal, and an essay plan. Students do not have a
choice of topics and need to undertake all tasks. Each composition focusses on one aspect of
the essay macrostructure, for instance, the first composition task requires students to write
an introduction; the second requires body paragraphs; and, the third requires a conclusion.
Additionally, three writing journals are required (Appendix C, pp. 45 - 47), whereby content is
restricted to thinking and writing processes related to each specific composition, with the emphasis on moves and language choices. The essay plans are not discussed in this paper, as they are intended to serve as planning tools.
The test contains three separate prompts, which were designed according to Kroll and Reid's
(1994) guidelines and include contemporary social issues, namely genetically-modified food;
constant public surveillance; and, human cloning. For example, Task Three (Appendix C, pp. 45 - 47) reads as follows:
Topic: Twenty years ago, the first mammal, Dolly the Sheep, was cloned successfully.
To date, no human clone has been born. Nonetheless, the topic is a rich source for
media and fiction narratives alike. What opportunities, or threats, do you think
human cloning presents? Relate the response to your intended field of study.
The framed prompt above requires neither specific cultural schemata nor discipline-specific expertise; instead, it draws on a body of knowledge that is equally accessible to all students (Horowitz 1991, cited in Kroll & Reid 1994).
The compositions will be scored according to an analytical scale (Appendix D, pp. 48 - 51), based on five criteria, namely argumentation; textual coherence and cohesion; language choices; source writing; and,
presentation. The performance levels include four gradations, ranging from 'excellent', 'good' and 'pass' to 'unsatisfactory'. The rubric is scaled, with points assigned for each level of
performance, whereby the maximum score amounts to one hundred points, so that
aggregate scores translate directly into a grade as per the ETS syllabus scale (Appendix E, p.
52). The scale's purpose is to provide individualised feedback and pinpoint the microskills not mastered yet (Bloom et al. 1971, cited in Perkins 1983, p. 656). The writing journal will not be scored; nonetheless, it supports individualised feedback in that it provides insights into students' language awareness and
thinking processes. Process portfolios and writing journals will be discussed during one-on-
one conferences. When compositions have been revised, they become part of the final
presentation portfolio.
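As a minimal sketch, the analytic scale's aggregation might be modelled as follows. The per-criterion weightings and level multipliers are hypothetical stand-ins, as the actual point allocations are assigned in the rubric (Appendix D); only the overall mechanism of criteria, levels and a 100-point maximum follows the description above.

```python
# Hypothetical weighting of the five rubric criteria, summing to 100 points.
WEIGHTS = {
    "argumentation": 30,
    "coherence_and_cohesion": 25,
    "language_choices": 20,
    "source_writing": 15,
    "presentation": 10,
}

# Hypothetical multipliers for the four performance levels.
LEVELS = {"excellent": 1.0, "good": 0.75, "pass": 0.5, "unsatisfactory": 0.25}

def composition_score(ratings):
    """Aggregate analytic ratings (criterion -> level) into a 0-100 score,
    which then translates directly into a grade on the ETS syllabus scale."""
    return sum(WEIGHTS[criterion] * LEVELS[level]
               for criterion, level in ratings.items())
```

A composition rated "excellent" on every criterion would thus receive the maximum of 100 points.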
Reliability
The vocabulary test represents a relatively reliable instrument. Firstly, it includes a substantial
number of items, based on the premise that the more items there are on a test, the more
reliable it will be (Hughes 2003, p. 44). In addition, the first sixty items each provide a fresh
start thus further improving test reliability. Secondly, it entails a comparatively reliable
scoring procedure. The first two tasks represent objective tests, in that there is only one
correct answer for each question. The MC editing task is more subjective, in that multiple
alternatives might be possible, and scoring requires a judgement call on behalf of the raters.
Rater reliability could be improved by introducing a second rater for this specific task (Hughes
2003, p. 50).
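Hughes's premise that longer tests are more reliable can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result that predicts the reliability of a test lengthened by a factor k; the reliability figures below are hypothetical.

```python
def spearman_brown(reliability, k):
    """Predicted reliability of a test lengthened by factor k
    (Spearman-Brown prophecy formula): r' = k*r / (1 + (k - 1)*r)."""
    return k * reliability / (1 + (k - 1) * reliability)
```

For example, doubling a test with a hypothetical reliability of 0.60 would raise the predicted reliability to 0.75, and tripling it would raise it further still, assuming the added items are comparable to the existing ones.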
Several safeguards were implemented to increase the reliability of the writing portfolio.
Firstly, it comprises three independent, untimed compositions thus preventing the snap-shot
approach, which creates unreliable and unrepresentative impressions (Hamp-Lyons & Kroll
1996, p. 53). Secondly, students do not have a choice of topics, in that too much freedom in
topics is likely to have a depressing effect on test reliability (Hughes 2003, p. 45). Thirdly,
scoring should be conducted according to a detailed marking rubric, which may increase intra-
rater reliability, prevent accidental construct under-representation (Hughes 2003, p. 102), and
improve trustworthiness and auditability (Brown & Hudson 1998, p. 655). In sum, several safeguards promote the reliability of both instruments.

Nonetheless, both tests should be trialled to assess other plausible threats to test reliability.
Firstly, the vocabulary test is relatively lengthy. Although it is not timed, it needs to be finished
within one period. Consequently, test takers could become fatigued, which may cause
inaccurate test results (Brown & Abeywickrama 2010, p. 29). Perhaps it needs to be
administered across several periods to address this possible threat. Secondly, MC items need
to be tested to determine distractor efficiency, item facility and item discrimination. Thirdly, the writing prompts need to be trialled to ensure they are unbiased and appropriate for this specific context (Kroll & Reid 1994, p. 241).
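The item statistics mentioned above could be computed from trial data along these lines; the response data in the examples are hypothetical.

```python
from collections import Counter

def item_facility(correct_flags):
    """Item facility: the proportion of test takers answering correctly."""
    return sum(correct_flags) / len(correct_flags)

def item_discrimination(upper_flags, lower_flags):
    """Item discrimination: the difference in facility between the
    upper-scoring and lower-scoring groups of test takers."""
    return item_facility(upper_flags) - item_facility(lower_flags)

def distractor_counts(choices):
    """Distractor efficiency: how often each option was chosen. Distractors
    selected by almost nobody do no work and should be revised."""
    return Counter(choices)
```

Items with very high or very low facility, or with low discrimination, would be candidates for revision before the test is administered operationally.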
Validity
The portfolio demonstrates several types of validity. Firstly, it demonstrates face validity, in
that it entails a direct test of writing skills. Secondly, it demonstrates content validity, in that
it addresses the majority of ARW learning objectives and requires performance of the target task.
Conversely, the vocabulary test may have construct validity issues, in that the existence of a
common core of academic vocabulary is contested. Hyland and Tse (2007, p. 248) argue that
the AWL has variable usefulness for various disciplines since many items are
underrepresented in some fields. The AWL has most utility for IT students and least for those
studying biology (Ibid.). This suggests that caution may need to be exercised in
implementation. Notwithstanding, the AWL could be used as a threshold concept (Clarke & Hernandez 2011), which students extend through discipline-specific readings and reflective activities. Furthermore, aside from EAP courses, students have limited
opportunities for common core vocabulary development, in that sub-technical lexis is unlikely
to be taught by content teachers (Flowerdew 1993, cited in Hyland & Tse 2007, p. 236). Thus,
a vocabulary test may have a significant role to play by motivating students to study and use common core academic lexis.
Authenticity
The process portfolio could be considered an authentic assessment, in that there is a strong
correspondence between the test and the skills required in real-life academic contexts.
Arguing a case is one of the most frequent and important types of assessment tasks set at
university (Lee 2008, p. 240), whereby logical argumentation is considered a key writing skill
(Lea & Street 1998, cited in Wingate 2012, p. 145). Academic essays must both 'carry appropriate authority and engage readers in ways they are likely to find credible and persuasive' (Hyland 2002, p. 215). However, many students demonstrate language lacks in
discussing, arguing and evaluating competently, logically and persuasively (Lee 2008, p. 240;
Wingate 2012, p. 145). The process portfolio provides step-by-step support in constructing
an argumentative essay, from planning and revising, to explaining and rebutting. Thus, this
assessment addresses both potential language lacks and authentic language demands.
However, the authenticity of the vocabulary test is more tenuous. Brown and Hudson (1998,
p. 659) state that this test type lacks authenticity, in that real-life language is not multiple-
choice. Bothell (2001, p. 4) emphasizes that MC questions should be avoided when other
item types are more appropriate. Nonetheless, selected-response items are the only test items that can address avoidance strategies, whereby NNSs evade using challenging lexis (Brown 2000, p. 149). Although other selected-response tests could likewise address the
undesirable strategy, the MC test was selected in that it is more reliable than a true-false test
(Brown & Hudson 1998, p. 659) and may have more construct validity than a matching test,
in that the latter can become more of a puzzle-solving process than a genuine test of vocabulary knowledge.
Washback
Both tests could potentially promote beneficial washback effects. Although the vocabulary
assessment may seem a 'bolt-on' test to discriminate between those who have memorised
sufficient lexis and those who have not, its inclusion is based on sound theoretical principles.
Mitchell, Myles and Marsden (2013, p. 143) reiterate research that found that explicit
vocabulary learning may lead to proceduralisation of this knowledge. Thus, it is hoped that
vocabulary learning required for the first test leads to fluent use in future writing tasks.
Beneficial washback is promoted by turning the traditional test into a feedback opportunity,
which aims to identify students' strengths and weaknesses and provide suggestions for further study.
The portfolio assessment could promote beneficial washback by integrating similar tasks into
daily teaching and learning. Ideally, test content will be taught according to a genre-based approach, whereby students analyse model texts in class - perhaps with the use of the AWL - and sequentially apply and transform new
understandings during classroom writing activities. As students are required to relate their compositions to their own disciplines (Appendix C, pp. 45 - 47), it would be most effective if they source their own discipline-specific readings. Additionally, students could work on the portfolio in
class, whereby learning to write is scaffolded by peer reviews and teacher feedback.
Practicality
The two tests balance each other in regard to practicality. The vocabulary test could be re-used and is economical to mark, in that most of the test is scored objectively. Practicality could be further increased by
computerising the test, whereby current items form the basis for an item bank. By contrast, the portfolio is potentially costly to administer, in that colleagues may require training, rating sessions may
need to be conducted (Brown & Hudson 1998, p. 662), and, sufficient time needs to be
allocated for student conferences (Brown & Abeywickrama 2010, p. 134). Consequently, the portfolio demands considerably more time and resources than the vocabulary test.
Conclusion
The two tests were carefully constructed by considering reliability, validity, authenticity,
washback and practicality. Reliability was improved by adding objectively-scored tasks and developing a detailed analytical scoring rubric. Validity was addressed by designing the tests according to clearly defined constructs and the ARW learning objectives. Authenticity criteria were met by aligning the tasks with real-life language demands in university settings. Washback was addressed with the suggestion of integrating assessment tasks into daily learning activities. Finally, the time and effort required for portfolio assessment were balanced against the practicality of the vocabulary test. In sum, favourable conditions were created to foster the ideal relationship between tests and test takers, whereby the assessments discriminate discriminately.
References
Bachman, LF 1990, 'Measurement', in Fundamental Considerations in Language Testing, Oxford University Press, Oxford, UK.
Bachman, LF & Palmer, A 1996, 'Describing, identifying, and defining: test purposes, tasks in the TLU domain', in Designing and Developing Useful Language Tests, Oxford University Press, Oxford, UK, pp. 95 - 132.
Bejarano, PAC & Chapetón, CM 2013, 'The Role of Genre-Based Activities in the Writing of Argumentative Essays'.
Bothell, TW 2001, '14 Rules for Writing Multiple-Choice Questions', 2001 Annual University Conference, Brigham Young University.
Brown, HD 2000, Principles of Language Learning and Teaching, 5th edn, Pearson Longman, White Plains, NY.
Brown, HD & Abeywickrama, P 2010, Language Assessment: Principles and Classroom Practices, 2nd edn, Pearson Education, White Plains, NY.
Brown, J 1998, 'Language testing: purposes, effects, options, and constraints', TESOLANZ Journal, vol. 6, pp. 13 - 30.
Brown, JD & Hudson, T 1998, 'The alternatives in language assessment', TESOL Quarterly, vol. 32, no. 4, pp. 653 - 675.
Clarke, IL & Hernandez, A 2011, 'Genre Awareness, Academic Argument, and Transferability', WAC Journal, vol. 22.
Coxhead, A 2000, 'A New Academic Word List', TESOL Quarterly, vol. 34, no. 2, pp. 213 - 238.
Ellis, NC, Simpson-Vlach, R & Maynard, C 2008, 'Formulaic Language in Native and Second Language Speakers: Psycholinguistics, Corpus Linguistics, and TESOL', TESOL Quarterly, vol. 42, no. 3, pp. 375 - 396.
Gebril, A 2009, 'Score generalizability of academic writing tasks: Does one test method fit it all?', Language Testing, vol. 26, no. 4, pp. 507 - 531.
Hamp-Lyons, L & Kroll, B 1996, 'Issues in ESL Writing Assessment: An Overview', College ESL, vol. 6, no. 1, pp. 52 - 72.
Hughes, A 2003, Testing for Language Teachers, 2nd edn, Cambridge University Press, Cambridge.
Hyland, K 2002, 'Directives: Argument and Engagement in Academic Writing', Applied Linguistics, vol. 23, no. 2, pp. 215 - 239.
Hyland, K 2008, 'Academic clusters: text patterning in published and postgraduate writing', International Journal of Applied Linguistics, vol. 18, no. 1, pp. 41 - 62.
Hyland, K & Tse, P 2007, 'Is There an Academic Vocabulary?', TESOL Quarterly, vol. 41, no. 2, pp. 235 - 253.
Kroll, B & Reid, J 1994, 'Guidelines for Designing Writing Prompts: Clarifications, Caveats, and Cautions', Journal of Second Language Writing, vol. 3, no. 3, pp. 231 - 255.
Lee, SH 2008, 'An integrative framework for the analyses of argumentative/persuasive essays from an interpersonal perspective', Text and Talk, vol. 28, no. 2, pp. 239 - 270.
Liu, F & Stapleton, P 2014, 'Counterargumentation and cultivation of critical thinking in argumentative writing: Investigating wash-back from a high-stakes test', System, vol. 45, pp. 117 - 128.
Lynch, B 1997, 'In search of the ethical test', Language Testing, vol. 14, no. 3, pp. 315 - 327.
Mitchell, R, Myles, F & Marsden, E 2013, Second Language Learning Theories, 3rd edn, Routledge, London.
Perkins, K 1983, 'On the Use of Composition Scoring Techniques, Objective Measures, and Objective Tests to Evaluate ESL Writing Ability', TESOL Quarterly, vol. 17, no. 4, pp. 651 - 671.
Swales, JM & Feak, CB 2012, Academic Writing for Graduate Students, 3rd edn, University of Michigan Press, Ann Arbor, MI.
UOW College 2015, about/diversity-equity/index.html.
Wingate, U 2012, ''Argument!' helping students understand what essay writing is about', Journal of English for Academic Purposes, vol. 11, no. 2, pp. 145 - 154.