Ethics in Language Testing
Prepared by:
Shyne D. Canimo
MAEd-Language Teaching
INTRODUCTION
Academic achievement is gauged by measuring students’ performance and
competencies in a given subject towards the end of a course or a program.
The term ‘assessment’ is generally used to cover all methods of testing and
assessment, although some educators reserve ‘testing’ for formal or
standardized tests (Byram, 2004). Its purpose is to give a general impression
of how the subjects learned affect students’ understanding of the lessons and
to identify the skills in which they are strong. However, scores do not
necessarily reflect students’ actual knowledge and skills. In some cases,
proficiency tests, which are used to determine students’ level in certain
skills, have become an important means of measuring ability.
What is ethics in language testing?
Alderson, Clapham and Wall (1995) assert that ethics in
language testing is merely an extended form of validity comprising
validity and washback, a viewpoint that can be traced back to
Messick’s (1989) consequential validity, or the social
consequences of test use. Nevertheless, Davies (2008)
cautions that this view of ethics may entail more social
consequences than language testers can account for.
Hence, he suggests setting reasonable limits for ethics in
language testing.
Lynch (1997) proposes a more elaborate view of ethics as involving
consent, deception, confidentiality and fairness. In other words, the extent
of ethicality is determined via the examination of aspects such as whether
test takers are clearly advised of test content, objectives, results and uses
and whether face validity, concern for public humiliation, bias in test
content, and consequences of language testing decisions are taken into
consideration.
To sum up, despite considerable differences in how ethics is viewed,
there is general consensus that ethics in language testing is a set of agreed
rules (Davies et al., 1999) that govern the conduct of professional language
testers, guiding them to make decisions that are both right in and of
themselves and lead to the best possible results at every stage of the testing
procedure.
Why should ethics receive priority consideration?
Thrasher (2004) reaffirms the need to address ethics
owing to the unfair treatment of test takers, and
envisions a resulting improvement in testing services. Two
other reasons cited by Davies (2004) are the remarkable
impact of tests on people’s lives and the vulnerability of
language tests to ethical issues due to the dual nature of
language as both knowledge and skill.
All the aforementioned sources assert that ethics
deserves a core position in all language testing contexts,
alongside psychometric and linguistic factors. The next part
elaborates on how this can be translated into
practice according to the relevant literature.
How can ethics be implemented in language testing?
Regarding the practice of ethics in language testing,
McNamara and Roever (2006) mention three traditional methods:
psychometric techniques to detect bias, the systematic
inspection of the suitability of test content via a process called fairness
review, and the prescription of appropriate behaviour via codes of
ethics. Besides these, a more recent approach, critical
language testing (Shohamy, 2001), will also be reviewed.
Psychometric detection of bias
Test bias occurs when test items work differentially for particular
sub-groups of test takers, and it is a frequent concern in testing
(McNamara, 1998). Various sophisticated psychometric techniques
have been developed to identify and lessen the effect of this confound
on test scores. These are divided into four groups in McNamara and
Roever (2006), including analyses based on item difficulty,
nonparametric approaches, item response theory based approaches,
and others such as logistic regression and multifaceted measurement.
However, Elder (1997) has pointed out that bias detection via
psychometric techniques is not as ethically neutral as is claimed,
because it ultimately relies on human decisions about the test construct
and inferences. This problem was documented in the same paper
on LOTE (Languages Other Than English) school examinations,
which also revealed that adjustments made on the basis of bias analyses
might even generate further unfairness. In other words,
although the psychometric approach still plays a pivotal role in
ethical language testing, it alone is far from sufficient, and other
solutions must be sought to complement it.
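
To make the nonparametric family of techniques mentioned above more concrete, the following is a minimal sketch of the Mantel-Haenszel procedure, one commonly used method for flagging items that function differently across groups. It is an illustration only, not the analysis used in any of the cited studies. The function names, the group labels "reference" and "focal", and the toy data are all assumptions made for this example. Note that the choice of groups and of the matching score is itself a human decision, which is exactly the point Elder (1997) raises.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Pool 2x2 tables across ability strata for a single test item.

    records: iterable of (group, matching_score, correct), where group is
    "reference" or "focal" and correct is True/False. These labels are
    illustrative assumptions, not a standard API.
    """
    # One 2x2 table per matching score: counts[group] = [right, wrong]
    strata = defaultdict(lambda: {"reference": [0, 0], "focal": [0, 0]})
    for group, score, correct in records:
        strata[score][group][0 if correct else 1] += 1

    numerator = denominator = 0.0
    for table in strata.values():
        ref_right, ref_wrong = table["reference"]
        foc_right, foc_wrong = table["focal"]
        total = ref_right + ref_wrong + foc_right + foc_wrong
        numerator += ref_right * foc_wrong / total
        denominator += ref_wrong * foc_right / total
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these data")
    return numerator / denominator


def mh_d_dif(odds_ratio):
    """Re-express the pooled odds ratio on the ETS delta scale."""
    return -2.35 * math.log(odds_ratio)


def make_records(group, score, n_right, n_wrong):
    """Build toy response records for one group at one matching score."""
    return [(group, score, True)] * n_right + [(group, score, False)] * n_wrong


# Hypothetical data: test takers matched on total score (1 or 2).
records = (
    make_records("reference", 1, 3, 1) + make_records("focal", 1, 1, 3)
    + make_records("reference", 2, 4, 2) + make_records("focal", 2, 2, 4)
)
alpha = mantel_haenszel_odds_ratio(records)
d_dif = mh_d_dif(alpha)
print(f"MH odds ratio = {alpha:.3f}, MH D-DIF = {d_dif:.3f}")
```

An odds ratio near 1 (D-DIF near 0) suggests that matched test takers from both groups find the item about equally difficult. Under the usual convention, a clearly negative D-DIF suggests the item is harder for the focal group, which would mark it for human review rather than automatic removal.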
Fairness review
Besides the psychometric approach, major language testing organizations
also adopt fairness review as part of their commitment to fair testing
(McNamara and Roever, 2006). At the University of Cambridge ESOL
examinations, the test writing guidelines prescribe the avoidance of potentially
offensive, overly specialized or technical items. Once written, the items are examined
for appropriateness of content and other problems during the pre-editing stage.
A similar and even stricter process is in place at the Educational Testing
Service (ETS), which publishes its own Fairness Review Guidelines. Under this
framework, all testing staff are required to undergo fairness review training, and all
test items are screened by trained reviewers whose job is to remove those
that may be biased or offensive with respect to age, disability, ethnicity, gender,
national origin, race, religion, and sexual orientation.
However, there are two major problems with such fairness review
(McNamara & Roever, 2006). First, there are remarkable differences among
cultures regarding the potential offensiveness of each topic or item,
resulting in the need for complicated country-specific fairness guidelines in
the case of international tests like IELTS and TOEFL. Second, some
explicitly avoided materials may actually be construct-relevant, so their
omission may lead to an underrepresentation of test constructs.
Overall, the practice of fairness review is highly laudable in spite of
these problems, as it shows that international test developers take ethical
testing seriously. It may also serve as a model for fairness
review of national language tests, where disadvantaged groups of candidates
still exist but greater cultural homogeneity may make offensive topics and
items easier to identify.
Code of Ethics
According to Davies (2008), language testing associations lack the sanctions against
members for unethical conduct that are often seen in strong professions like law and medicine.
To compensate, they can offer an ‘ethical milieu’ in the form of codes of ethics and/or practice.
Such professional codes have recently been developed by major language testing associations
such as the International Language Testing Association (ILTA), the Association of Language
Testers in Europe (ALTE), and the European Association for Language Testing and
Assessment (EALTA). Due to the scope of this essay, only the Code of Ethics of ILTA, the
global forum for language testers, will be discussed.
The ILTA Code of Ethics consists of nine principles further specified by annotations,
stipulating the appropriate conduct of ILTA members as well as the potential challenges and
limitations in implementing the principles. McNamara and Roever (2006) highlight the
role of this code as a moral framework for testers’ work, yet critique its vagueness and
paucity of mechanisms against violations. Davies (2011), as the leading author of the code, is
well aware of this problem and concedes that a code has no legal bearing and hence cannot
protect a profession from misuse.
Principle 1
Language testers shall have respect for the humanity
and dignity of each of their test takers. They shall
provide them with the best possible professional
consideration and shall respect all persons’ needs,
values and cultures in the provision of their
language testing service.
Principle 2
Language testers shall hold all information obtained
in their professional capacity about their test
takers in confidence and they shall use
professional judgement in sharing such
information.
Principle 3
Language testers should adhere to all relevant ethical
principles embodied in national and international
guidelines when undertaking any trial,
experiment, treatment or other research activity.
Principle 4
Language testers shall not allow the misuse of their
professional knowledge or skills, in so far as they
are able.
Principle 5
Language testers shall continue to develop their
professional knowledge, sharing this knowledge
with colleagues and other language professionals.
Principle 6
Language testers shall share the responsibility of
upholding the integrity of the language testing
profession.
Principle 7
Language testers in their societal roles shall strive to
improve the quality of language testing,
assessment and teaching services, promote the
just allocation of those services and contribute to
the education of society regarding language
learning and language proficiency.
Principle 8
Language testers shall be mindful of their obligations
to the society within which they work, while
recognizing that those obligations may on
occasion conflict with their responsibilities to
their test takers and to other stakeholders.
Principle 9
Language testers shall regularly consider the
potential effects, both short and long term on all
stakeholders of their projects, reserving the right
to withhold their professional services on the
grounds of conscience.
Critical Language Testing
Concerns about the enormous power of tests and about unethical language
testing practices led Shohamy (2001) to coin the term critical language testing,
in which tests are seen as embedded in social and political contexts, placing the field
within the domain of critical pedagogy. Its foremost aim is to “minimize, limit,
control the powerful uses of tests” (Shohamy, 2001, p.131), encouraging
stakeholders to question test uses, test materials, and the values and beliefs
represented by tests.
Critical testing can be regarded as an innovative development in language
testing in recent times, as it engages the wider sphere of social and political
dialogue, adding a social dimension to the traditional bases of psychometrics
and linguistics. It should be noted that the critical language testing movement
is aimed not at the abolition of tests but at the democratisation of the testing
procedure, by involving all stakeholders (especially test takers) and using
multiple assessment methods.
Nonetheless, few studies in language testing have adopted this critical
approach, probably because it delegates tremendous responsibility to
practitioners (Karami, 2013). Shohamy (2007) urges stakeholders, especially
teachers, to critique the uses and consequences of tests in order to alleviate their
enormous and harmful influence, yet the feasibility of doing so depends to a large
extent on the power granted to them. Unfortunately, it is in the countries where
teachers are the most powerless that language tests are most often misused, so a
critical stance is highly needed but rarely taken (Fulcher, 2009).
Overall, the foremost contribution of the critical language testing approach
is to shed light on the potentially catastrophic consequences of test uses and
to heighten the need to adopt a variety of approaches to mitigate them. It remains
to be seen whether this approach will ever gain the prominence of its traditional
counterpart, psychometrics.
The ethical concerns discussed in language testing are essential, because
decisions made about a person on the basis of a test score can have serious and
far-reaching consequences. Spolsky
(1977) supported an approach to language testing that requires full justification
of all statements based on tests. He pointed out that language testers must be as
concerned with the prevention of bad testing as with developing new tests, and
that they must be sensitive to the possible educational, social and political
consequences of testing. As tests have an impact on the lives of test takers, any
decision should be made professionally. This information would have influenced
their approach to language learning and reduced the negative impact on their
lives.
On the other hand, item writers tried to soften the possible
consequences and worked consciously in test design and item
writing to maximize the possibility of positive washback. This
was reflected in the following year, when the students’
achievements were better and the grouped score distribution was more
consistent. The sensitive approach of the administrators, test
designers and item writers confirmed the well-known statement
that practices must be just and tests must be, above all, just and fair
for all.
THANK YOU FOR LISTENING!