Language Testing
Background
Language testing and assessment build on theories and definitions provided by linguistics, applied linguistics, language acquisition, and language teaching, as well as on the disciplines of testing, measurement, and evaluation. These disciplines are used to construct valid tools for assessing the quality of language. The field of testing has two major components:
• What? – what needs to be assessed (trait)
• How? – the specific procedures and strategies used for assessing the "what" (method)
Five periods in the development of the field:
• 1. Pre-scientific period: focus on translation and structural accuracy (grammar-translation method)

• A test uses tools, techniques, or a method to measure language proficiency regardless of any courses candidates may have followed.
• It is used at a specific point in time.
• It is a fundamental part of learning and teaching.
• It often involves collecting data in numerical form.
• Its aim is to discover how far students have achieved the objectives of a course of study.
• Its aim is to diagnose students' strengths and weaknesses.
• Its aim is to assist the placement of students by identifying the stage or part of the teaching program most appropriate to their ability.
2. Practicality
A practical test is one that can be developed and administered within the available time and with the available resources (there is enough time, and the resources are easy to obtain). Practicality refers to economy of time, effort, and money: a practical test is easy to design, administer, mark, and interpret.
3. Discrimination
All assessment is based on comparison, either between one student and others or between students as they are now and as they were earlier. An important feature of a good test is its capacity to discriminate among the performances of different students, or of the same student at different points in time. The extent of the need for discrimination varies according to the purpose of the test.
3. Reliability
…something or to give a judgment about something. Evaluation refers to the systematic gathering of information for the purpose of making decisions (Weiss, 1972).
In the educational context, the verb 'to evaluate' often collocates with terms such as the effectiveness of the educational system, a program, a course, instruction, or a curriculum.
2. Assessment
Assessment is the process of collecting information about students from diverse sources so that educators can form an idea of what they know and what they can do with this knowledge. It is concerned with the student's performance, and it draws on different methods or tools (e.g. tests, quizzes, portfolios). We can assess needs, academic readiness, progress, and skill acquisition.
The verb 'assess' often collocates with skills, abilities, performance, aptitude, and competence.
Types of assessment
• 1. Placement test: helps educators place a student into a particular level or section of a language curriculum or school. Various types of testing procedures can be used for this purpose, e.g. dictation, interviews, grammar tests.
• 2. Diagnostic test: helps teachers and learners identify strengths and weaknesses. These tests are mostly conducted for research purposes, to gain evidence for theories of language learning. They also have intrinsic pedagogical value.
• 3. Proficiency test: measures learners' level of knowledge. It is not tied to any specific textbook-based syllabus and is designed to elicit samples of target-language use at different ability levels. Proficiency can also be defined as a set of components, namely the four skills (listening, speaking, reading, and writing) and two elements (grammar and vocabulary), and tests are designed to assess knowledge of each sub-component.
• 4. Achievement test: intended to measure the skills and knowledge learned after some kind of instruction (e.g. an end-of-year test). Also referred to as attainment or summative tests.
• 5. Aptitude test: designed to assess what a person is capable of doing, or to predict what a person is able to learn or do given the right education and instruction. It is administered before the learner starts learning a foreign language, to find out whether he or she has the ability. It does not measure how well someone uses a specific language, but how well they acquire language skills in general. You might use this type of test when selecting candidates for a role that would require them to learn a new language. Typically aptitude tests include

…taken the test previously, it is a standardized test. Measuring learners' capabilities against a fixed norm in this way is often called norm-referenced testing. Examples of standardized tests are IELTS and TOEFL.
• Such a test gives the candidate no useful information about the quality of their performance and no feedback for improvement. Hence, this test type has little value for the classroom teacher and need not be used in the classroom context.
• When tests measure learner capabilities according to the achievement of course objectives, differing levels of performance are linked to specific grades or to descriptions of 'can-do' abilities. Such tests are called criterion-referenced tests.
• A criterion-referenced test is carefully designed according to descriptions of differing levels.
• It provides feedback to learners and promotes language learning. Hence, such tests may be used for classroom-based assessment.
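The difference between norm-referenced and criterion-referenced interpretation described above can be illustrated with a small Python sketch. This is an illustration under stated assumptions and is not part of the original material. The candidate names, raw scores, cut-off points, and band descriptors are all hypothetical. Percentile rank is computed here as the share of candidates scoring strictly below a given score, which is one of several common definitions.

```python
from bisect import bisect_right

# Hypothetical raw scores (0-100) for a small group of candidates.
scores = {"Anna": 82, "Bence": 67, "Csilla": 74, "Dani": 55, "Eszter": 91}


def percentile_rank(score, all_scores):
    """Norm-referenced view: percentage of the group scoring strictly below this score."""
    all_scores = list(all_scores)
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)


# Criterion-referenced view: fixed cut-offs linked to 'can-do' descriptors.
# These cut-offs and labels are illustrative assumptions, not a real scale.
CUTOFFS = [50, 65, 80]
BANDS = [
    "not yet at the target level",
    "can handle familiar, routine tasks",
    "can meet most course objectives",
    "can meet all course objectives with ease",
]


def criterion_band(score):
    """Map a raw score to a descriptor using fixed cut-offs, independent of other candidates."""
    return BANDS[bisect_right(CUTOFFS, score)]


for name, score in scores.items():
    rank = percentile_rank(score, scores.values())
    print(f"{name}: {score} -> percentile rank {rank:.0f}, band: {criterion_band(score)}")
```

In this sketch, a candidate's percentile rank changes if a different group takes the test. The band, by contrast, depends only on the fixed cut-offs, so it can be tied directly to 'can-do' descriptors.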
Proficiency
In language assessment, a performance does not always have to be assigned a numerical value so that it becomes a raw score. The performance can also be reported descriptively, to capture one's level of language proficiency, which is seen as a …

• case study analysis
• oral presentations
• artistic performances
• grades
B) Indirect measures
• course evaluation
• test blueprints (outlines of the concepts and skills covered on tests)
• percent of class time
• number of student hours on service learning
• number of hours spent on homework
• number of student hours spent on intellectual or cultural activities
• grades not based on explicit goals
What can testing affect?
• students' self-concept, motivation, anxiety, learning strategies, and their opportunities for advancement
• parents' view of their children's learning process
• teaching content, methods, and evaluation
• curriculum and teaching methods in schools
• allocation of opportunities, and the concept of knowledge and skills in society
Item types
There are several item types to choose from, and the choice should be made carefully because item types affect performance on the test. If candidates are not familiar with the item type, or it is not clear what is required of them, the elicited performance will not reflect their ability.

…the washback effect the test has on teaching and argue that a "test's validity should be measured by the degree to which it has had a beneficial effect on teaching".
Washback is often seen as negative because it makes teachers teach to the test and ignore skills and knowledge areas not relevant to the exam. Washback can also be regarded as positive if it brings about innovation in the curriculum.
The washback effect of language tests
Tests can have an impact on teachers, learners, the education system, textbook writers, and administrators, as well as on the processes of learning and teaching (Wall, 2000). The effect is called backwash in general education, but in the testing literature the term washback is preferred.
It seems that the first change likely to occur as a result of language tests is a change in the content of teaching. It is much more difficult for teachers to change their behavior and attitudes.
There are many factors that influence the washback effect:
• teacher ability
• teachers' understanding of the test
• teaching style
• teachers' background
• teachers' attitudes towards innovation
• classroom conditions
• resources
• school management practices
• status of the subject in the curriculum
• general social and political context
• the role of publishers in material design
• the time elapsed since the introduction of the exam

• can write in an appropriate manner for a particular purpose and for a particular audience
"Writing is the ability to develop thoughts in writing and to convey them accurately, effectively and appropriately with a particular audience and communicative purpose in mind."
Construct
Practical considerations concerning the
There are three influential approaches to describing the reading process:
1. The bottom-up approach: moving from small parts and building up a whole (cf. putting together a jigsaw puzzle when you do not have a copy of the finished puzzle). It is typically associated with behaviorism in the 1940s-1950s. In this approach, readers are "passive decoders of sequential graphic-phonemic-syntactic-semantic systems, in that order" (Alderson, 2000, p. 17). First, the graphic stimuli are recognized, and then they are decoded into sounds. Through the sounds, the reader can recognize words and decode meanings. These sub-processes build on one another unidirectionally, i.e. a later component does not feed back into a preceding sub-process.

People read for various purposes, and the purpose of reading determines the way we read. In real life we use a variety of skills and subskills, which we select according to the reading purpose and the text type. Grabe (1991) lists six skills and knowledge areas that readers activate:
• automatic recognition skills
• vocabulary and structural knowledge
• formal discourse structure knowledge
• content/world background knowledge
• synthesis and evaluation skills/strategies
• metacognitive knowledge and skills monitoring (i.e. skimming, scanning, previewing, formulating questions about information, monitoring cognition, recognizing problems)
Operations
• Short answer questions: candidates hear a sentence or a short passage and respond in no more than a few words to questions based on the text. If more extended answers are needed, it is advisable to use the candidates' L1 in both the questions and the answers.
• Multiple-choice items: candidates hear a sentence or a short passage and respond to multiple-choice items based on the information it contains. Longer texts must be broken down into shorter chunks to make sure that comprehension, and not memory, is tested.
• Note-taking: this technique simulates real-life situations in which students listen to lectures and take notes. When recording such a talk, the lecturer should be asked to talk from notes and not to read out a written script. A disadvantage of this task is that standardized marking of the examinees' notes can be difficult.
• Matching and sequencing: candidates match the words, phrases, or sentences they hear to pictures, match passages to headlines, or sequence the events of a story. This technique is especially suitable for testing sound discrimination, e.g. ship-sheep, pen-pan, tree-three.
• Information transfer: this technique is suitable for testing the understanding of transactional language. Based on the information in the text, the candidates fill in a chart or complete a table. Pauses in the text are necessary to give the candidates time to do the writing.
• Gap filling: this technique requires intensive listening. Care must be taken that the gaps (also called blanks) do not fall too close to each other in the text and that they concern content rather than structural elements of the language.

Testing oral skills
Construct
In order to design a valid test of speaking abilities, we must define what we expect the candidate to be able to do and in what kinds of real-life situations.
• The speaking test should then consist of tasks that form a representative sample of all the situations in which the candidate is expected to function successfully.
• The tasks should elicit linguistic behavior that truly represents the candidate's ability.
• It should be possible to score performance validly and reliably (Hughes, 2003).
Can-do statements
The self-assessment scale of the common reference levels in the CEF describes the speaking abilities of successful B2 candidates as follows:
• Spoken interaction: I can interact with a degree of fluency and spontaneity that makes regular interaction with native speakers quite possible. I can take an active part in discussion in familiar contexts, accounting for and sustaining my views.
• Spoken production: I can present clear, detailed descriptions on a wide range of subjects related to my field of interest. I can explain a viewpoint on a topical issue giving the advantages and disadvantages of various options.
Texts
There are various text types and operations that language learners can be expected to produce orally:
• Monologues: prepared or unprepared, rehearsed or spontaneous, e.g. introductions, presentations, narratives, descriptions, instructions.
• Conversations and dialogues: with a peer or an expert speaker (teacher, examiner).
• Discussions: bridging information or opinion gaps, problem solving, exploring a topic.
• Interviews: with an examiner or a peer.
• Role-plays and simulations.
…which the examiner follows consistently.
• Description task: this can be an individual or a paired task and can involve one or more pictures. In the paired format, the examinees can get one picture each, which they describe to one another to discover similarities and differences. Another possibility is to give one picture to one of the candidates and three or more very similar pictures, which differ only in minor details, to the second. On the basis of the description given by the first candidate, the second selects which of the similar pictures has been described.
• Compare and contrast: this can also be either a paired or an individual task. The candidate compares and contrasts two pictures related to the same topic, e.g. life in the city vs. life in the country, or people using different means of transport. This task is similar to the description task but also involves some analysis, which requires the use of more complex language.
• Instruction task: ideally this task is performed in pairs. One of the candidates …
Rules of thumb
• Give candidates as many fresh starts as possible. (In other words, the various tasks included in the speaking paper should be independent of each other. This will also lead to greater validity and reliability.)
• Set only tasks that would not be difficult for the test takers in their mother tongue.
• Write and follow rubrics. Don't improvise.
• Use a separate interlocutor and assessor (the tester should not be seen making notes).
• Standardize interlocutor behaviour and provide training in it (use videotaped exams).
• Remember that individual oral tests can be particularly stressful.
• Ensure reliable scoring: create a scale for scoring (holistic vs. analytic scales) and train raters.
Scoring
The scoring of a speaking test requires careful planning to make it both valid and reliable. It will be valid if it fits the construct and the aims of the test as defined in the test specifications, and reliable if it is applied consistently. Rating scales are meant to help the examiner tell good performances from poor ones. Both holistic and analytic scales are in use.
Rater training for speaking exams
Contrary to common belief, both holistic and analytic scales require that examiners receive thorough training in using them. No matter how detailed and clearly phrased a scale is, individual examiners will interpret it differently unless they take part in examiner training.

Exercises I.
Washback effects
Questions
1. What can tests have an impact on?
2. In what ways can language testing affect
a. students
b. teachers
c. course-books
d. schools
e. society?
3. What conditions does the effect of tests depend on?
Tasks
1. Think back to one of the language proficiency exams you took during your course of studies. How did it influence what and how you were learning while you were preparing for the exam? What do you think your teacher did that he/she would otherwise not have done while preparing you for the exam?
2. Choose one of the components of the new Hungarian school-leaving exam in English. Read the test specifications and the advice provided for teachers preparing their students for the exam (https://www.oktatas.hu/pub_bin/dload/kozoktatas/erettsegi/vizsgakovetelmenyek201

2 How can you ensure that a writing task tests only writing ability and nothing else?
3 How can feedback on students' writing lead to improvement?
4 What similarities and differences do you see between analytic and holistic scoring of written performance?
5 How can a teacher match testing to teaching if, in teaching, the focus should be on the process of writing, whereas testing always focuses on the 'product'?
Tasks
2. Study the following list of criteria for evaluating student writing. Rank-order them from very important to less important:
- suitable title
- appropriate length
- neat and tidy handwriting
- correct spelling
- correct punctuation
- clear communicative purpose
- original ideas
- coherent text
- well-structured sentences
- appropriate use of paragraphs
- good cohesion of text
- effective introduction
- effective conclusion
- correct grammar
- wide range of vocabulary
- task-appropriate style
Reading skills
Tasks
1 Create a graphic organizer to demonstrate your understanding of reading approaches.
2 Select a short article from the International Herald Tribune or any other easily accessible English-language daily paper. Using a single text, prepare three reading comprehension tasks: one for pre-intermediate, one for intermediate, and one for advanced students.
3 A student complains about failing the reading comprehension part of an exam, saying that it required not only knowledge of English but also some familiarity with history and geography. Discuss whether his or her complaint is justified.
Oral skills
Questions
Listening skills
Questions
1 Why do many language learners think that listening comprehension tasks are easier than reading comprehension tasks? Is this view justifiable?
2 Should test takers be given some time to read the tasks before they listen to the recorded text?
3 Can texts for listening comprehension be written from imagination?
Tasks
Testing Vocabulary and Grammar
With the advent of communicative language testing, vocabulary and grammar are usually assessed in integrated tests. Vocabulary and grammar are frequently evaluated as a component of writing and speaking abilities, but reading and listening tests can also include items measuring candidates' lexical and grammatical knowledge. A number of researchers assume that lexical and grammatical competence are inseparable, and this assumption is also reflected in the current practice of testing these aspects of language proficiency together in a combined test, such as the Use of English (UOE) components.
Dimensions in vocabulary testing
1. Discrete vs. embedded testing: discrete: vocabulary is tested as a separate component (e.g. multiple-choice testing); embedded: vocabulary is integrated into other areas (e.g. vocabulary in an essay)
2. Selective vs. comprehensive: selective: students are tested on a specific number of items; comprehensive: overall use of vocabulary is assessed (e.g. in a speaking task)
3. Context-independent vs. context-dependent: context-independent: items are tested out of context; context-dependent: the opposite, items are tested in context
• What is the role of context? Words can have multiple meanings and different grammatical roles. If we include context, we also involve other skills, such as reading in a cloze test. Today there are fewer separate tests.
Tasks used for assessing vocabulary
• multiple choice: best alternative (ORIGO/Cambridge Proficiency)
• finding synonyms (TOEFL)
• identifying words based on a definition
• providing definitions
• matching
• completion items/fill-in
• sentence writing/example sentences
Tasks