
2022/23/II.

Language Testing

Background

Language testing and assessment build on theories and definitions provided by linguistics, applied linguistics, language acquisition, and lg teaching, as well as on the disciplines of testing, measurement, and evaluation. These disciplines are used to construct valid tools for assessing the quality of lg. The field of testing has 2 major components:

• What? – the material that needs to be assessed (trait)
• How? – the specific procedures and strategies used for assessing the "what" (method)

Five periods in the development of the field:

● I. pre-scientific period: focus on translation and structural accuracy (grammar-translation method)
● II. discrete-point testing: lg as a unit of lexical and structural items (assessing isolated elements)
● III. integrative era: integrated and discoursal lg (audio-lingual method)
● IV. communicative era: aimed to replicate interactions (performing tasks taken from real-life contexts)
● V. the era of uncertainty: questions about the meaning of lg and the possibilities of measuring it in diverse, multilingual contexts

What is testing? What is its purpose?

• a test is an attempt to measure a person's knowledge in a systematic way
• a test uses tools, techniques or a method to measure lg proficiency regardless of any courses candidates may have followed
• it is used at a specific point in time
• it is a fundamental part of learning and teaching
• it often involves collecting data in a numerical form
• its aim is to discover how far students have achieved the objectives of a course of study
• its aim is to diagnose Ss' strengths and weaknesses
• its aim is to assist the placement of Ss by identifying the stage or part of the teaching program most appropriate to their ability

What makes a good language test?

• its results are standardized and reliable, which means that it is easy to compare them with other candidates' work
• it assesses with a high degree of independence and objectivity
• it can assess large numbers of candidates within a short period of time
• test validity – it is a valid test
• it has a positive effect on students
• it measures without "traps"
• it enables students to show their ability
• it provides feedback for the teacher

Teaching and testing

• in the past, the two notions were separated
• a test should reinforce learning / motivate students to learn
• the relationship between teaching and testing – the washback effect
• the influence of standardized tests and public examinations
• skill-based teaching – testing (productive – written/oral, receptive – written/oral + vocabulary/use of English)

Basic principles/Tips

• test what you teach and test regularly
• give feedback and make it part of the learning process (it is not only for grading)
• make it reflect the students' level
• avoid punishing with it and never use it as a tool of teaching

Keywords in testing

1. Reliable
2. Valid
3. Practical
4. Discriminate
5. Authentic
6. Analyzable

1. Validity

It refers to whether or not the test measures what it claims to measure. On a test with high validity, the items will be closely linked to the test's intended focus. Content validity means that the test assesses the course content and the outcomes using formats familiar to the students.

Factors affecting validity:

1. Factors in the test:
• unclear directions, ambiguous statements
• the difficulty with vocabulary and syntax
• too easy or difficult items
• inappropriate items
• inadequate time
• short length
• items not arranged in order of difficulty

2. Factors in test administration and scoring:
• unfair aid
• cheating
• unreliable scoring of essays
• insufficient time
• adverse physical and psychological conditions

3. Factors related to students:
• test anxiety
• physical and psychological state

2. Practicality

A practical test is a test that is developed and administered within the available time and with the available resources (the time is enough, and the resources are easy to reach). Practicality refers to the economy of time, effort, and money. A practical test is easy to design, administer, mark and interpret.

3. Reliability

It means consistency, dependability, and trust. The results of a reliable test should be dependable. They should remain stable and consistent, and not be different when the test is used on different days. A reliable test yields similar results with a similar group of students who took the same test under identical conditions.

3 types of reliability:

• stability: similar scores are obtained with repeated testing with the same group
• equivalence: refers to the amount of agreement between two or more tests that are administered at nearly the same point in time (the parallel forms of the test are given to the same group)
• internal consistency (homogeneity): concerns the extent to which the items on the test are measuring the same thing

Factors affecting reliability:

1. The length of the test: longer tests produce more reliable results than very brief quizzes. In general, the more items on a test, the more reliable it is considered to be.
2. The administration of the test: this includes the classroom setting (lighting, seating arrangements, acoustics, lack of intrusive noise etc.) and how the teacher manages the test administration.
3. The affective state of students: e.g. test anxiety.

4. Discrimination

All assessment is based on comparison, either between one student and another or between students as they are now and as they were earlier. An important feature of a good test is its capacity to discriminate among the performances of different students, or of the same student at different points in time. The extent of the need for discrimination varies according to the purpose of the test.

5. Authenticity

Authenticity means that the language response that students give in the test is appropriate to the language of communication. The test items should be related to the usage of the target language. "A language test is said to be authentic when it mirrors as exactly as possible the content and skills under test." Authentic tests are an attempt to duplicate as closely as possible the circumstances of real-life situations.

6. Analysis

Washback: the impact of a test on teaching and learning.

"Such impact is usually seen as being negative: tests are said to force teachers to do things they do not necessarily wish to do. However, some have argued that tests are potentially also levers for change in language education: the argument being that if a bad test has a negative impact, a good test should or could have positive washback" (Alderson & Banerjee, 2001, p. 214).

The definition of a test

A measurement instrument designed to elicit a specific sample of individual behaviour (gives evidence of the abilities which are of interest).

Types of assessment

1. formative assessment
• process-oriented, also referred to as "assessment for learning"
• an ongoing process to monitor learning
• its aim is to provide feedback to improve teachers' instruction methods and improve students' learning

2. summative assessment
• product-oriented, also referred to as "assessment of learning"
• used to measure student learning progress and achievement at the end of a specific instructional period

3. alternative assessment
• it is also referred to as authentic or performance assessment
• an alternative to traditional assessment that relies only on standardized tests and exams
• requires students to do tasks such as presentations, case studies, portfolios, simulations, reports, etc.
• instead of measuring what students know, alternative assessment focuses on what students can do with this knowledge

Testing, assessment and evaluation

These terms are often used interchangeably by teachers.

1. evaluation
The verb evaluate means to form an idea of something or to give a judgment about something. Evaluation refers to the systematic gathering of information for the purpose of making decisions (Weiss 1972). In the educational context, the verb 'to evaluate' often collocates with terms such as the effectiveness of the educational system, a program, a course, instruction, a curriculum.

2. assessment
Assessment is the process of collecting information about students from diverse sources so that educators can form an idea of what they know and can do with this knowledge. It is concerned with the student's performance. Assessment is the process of collecting information about learners using different methods or tools (e.g. tests, quizzes, portfolios, etc.). We can assess needs, academic readiness, progress, and skill acquisition. The verb 'assess' often collocates with skills, abilities, performance, aptitude, competence.

3. testing
Testing is a form of assessment.

Types of tests

• 1. placement test: Helps educators place a student into a particular level or section of a language curriculum or school. Various types of testing procedures can be used for such purposes, e.g. dictation, interviews, grammar tests.
• 2. diagnostic test: Helps teachers and learners to identify strengths and weaknesses. These tests are mostly conducted for research purposes to get evidence for theories of language learning. It has intrinsic pedagogical value.
• 3. proficiency test: Measures learners' level of knowledge. Not tied to any specific textbook-based syllabus; designed to get samples of use of the target language at different levels of ability. Proficiency can also be defined as several components: the four skills – listening, speaking, reading and writing – and two elements – grammar and vocabulary – and tests are designed to assess knowledge on each sub-component.
• 4. achievement test: Intended to measure the skills and knowledge learned after some kind of instruction (e.g. an end-of-year test). Also referred to as attainment or summative tests.
• 5. aptitude test: Designed to assess what a person is capable of doing, or to predict what a person is able to learn or do given the right education and instruction. Administered prior to learning a foreign language, to find out whether a learner has the ability. It does not measure how well someone uses a specific language, but how well they acquire language skills in general. You might use this type of test when selecting candidates for a role that would require them to learn a new language. Typically, aptitude tests include items on the ability to distinguish between the pronunciation of different phoneme strings and the spelling of words, with recognition of valid letter strings in the target language. Aptitude tests are no longer popular because of their structural nature: it is considered no longer fair to predict success or failure in learning a language based on just one performance that rewards an absolute score.

Other common distinctions between tests:
• oral – written
• objective – subjective
• classroom – external
• recognition – production

Standardized versus Classroom-Based Tests

• The score on a language test reflects the language ability of the learner with reference to a group performance or to a criterion level.
• When a test measures a learner's ability according to a standard obtained as the middle score of a population who has taken the test previously, it is a standardized test. The mode of measuring a learner's capabilities against a fixed norm is often called norm-referenced testing. Examples of standardized tests are IELTS and TOEFL.
• Such a test does not give any useful information to the candidate about the quality of their performance, or feedback to improve. Hence, this test type does not have much value for the classroom teacher and need not be used in the classroom context.
• When tests measure learner capabilities according to the achievement of course objectives, and differing levels of performance are linked to specific grades or descriptions of 'can-do' abilities, they are criterion-referenced tests.
• A criterion-referenced test is carefully designed according to descriptions of differing levels.
• It gives feedback to learners and promotes language learning. Hence, such tests may be used for classroom-based assessments.

Summative Assessment and Formative Assessment

• When tests and assessments are conducted to get estimates of learning acquired at different points of the course, they are formative assessments.
• Formative assessments help teachers to plan their teaching, while results from summative assessment help in taking decisions like promotion to a higher grade.
• There is a misconception that summative assessments are mostly tests that are scored to get estimates of learners' overall learning in a course or an academic year, while formative assessments may take other forms such as assignments, observations and so on and are mostly not graded. However, we need to understand that, apart from major decisions like school-leaving certification exams, both summative and formative assessments can be tests, quizzes or assignments, and each sub-type can serve the purpose of estimating overall achievement as required in summative assessments, or can give opportunities to learn by giving notions of achievement during a course.

Assessment can be carried out in two modes:
1. formal mode – scores valuing each item of a paper (quizzes, essays, projects, reports, diaries, portfolios)
2. informal mode – observation

Measurements

A formal way to make estimates is to give weight to the performances by using raw scores and letter grades or scales. These are different units of measurement.

1. raw scores: A score is a numerical index. When scores are not yet transformed statistically, they are called raw scores. Raw scores can be used for further statistical analysis, like calculating the mean score of the performance of a group.

2. scales: When measurements of learner performances are put on a scale, they are no longer discrete points of measurement. The performances become linked with one score in relation to the others. The scores may be represented as different points on a scale or a continuum. There are four types of scales that are used in standard forms of measurement.

SCALE – DESCRIPTION
• NOMINAL: When mutually exclusive properties are assigned a numerical value, it is a nominal scale. It represents the frequency of occurrence of a property rather than how much of that property is present. For example, frequency counts of learners' L1 backgrounds or of levels of language abilities.
• ORDINAL: When individuals are presented on the basis of their performance set against each other. This is not a discrete scale like the nominal one but an interconnected one. For instance, rank order indicates how much of a property one learner has in relation to others.
• INTERVAL: An index of performance, like a test score or someone's height, presented on a scale that is equally divided into intervals, i.e. where intervals are drawn at equal differences of scores.
• RATIO: A scale that has equal intervals and starts at zero. This scale can be used to show group performance but not individual performance, as lg ability is typically constructed as a notion of behavior and not an absolute measure represented through a numerical value.

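To make the distinction between raw scores and scales concrete, here is a minimal Python sketch (the names and score values are invented for illustration, not taken from these notes): it computes the group mean mentioned under raw scores and then turns the same scores into a rank order, i.e. an ordinal scale.

```python
# Minimal sketch: raw scores, the group mean, and a rank order (ordinal scale).
# The names and scores below are made-up example values.
raw_scores = {"Anna": 34, "Bence": 28, "Csilla": 41, "David": 29, "Emese": 37}

mean_score = sum(raw_scores.values()) / len(raw_scores)
print(f"group mean: {mean_score:.1f}")

# A rank order only tells us who performed better than whom,
# not by how many points (ordinal, not interval, information).
ranked = sorted(raw_scores, key=raw_scores.get, reverse=True)
for position, name in enumerate(ranked, start=1):
    print(position, name, raw_scores[name])
```
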
Proficiency

In language assessment it is not always the case that the performance has to be assigned a numerical value such that it becomes a raw score. The performance can also be reported in a descriptive manner to capture one's level of language proficiency, which is seen as a dynamic language behavior and not an absolute trait. The description can be systematically represented as bands, scales or levels. Each scale represents a specific quality of that behavior, and a test is designed and scaled up or down to match the levels of learner abilities at the point of testing. Language ability may range from knowledge of a few words to being highly proficient and fluent in the language (e.g. the CEFR levels).

Methods of testing

Tests can be direct or indirect depending on the way they elicit language samples. The indirect way of testing involves measurement of isolated areas of the ability in question, e.g. making inferences about writing ability on the basis of a test on punctuation. It assesses knowledge without authentic application. Direct testing measures the language ability in question as it appears in real-life situations. The primary goal is to be as much like real life as possible.

According to the focus of the test:

• 1. discrete-point/analytical tests: focus on separate elements of ability and intend to test them in isolation
• 2. integrated tests: focus on the performance as consisting of several components

Direct measures versus indirect measures

A) Direct measures
• course and homework assignments
• exams and quizzes
• standardized tests
• term papers and reports
• observation or field work, internships
• research projects
• class discussion participation
• case study analysis
• oral presentations
• artistic performances
• grades

B) Indirect measures
• course evaluations
• test blueprints (outlines of the concepts and skills covered on tests)
• percent of class time
• number of student hours on service learning
• number of hours spent on homework
• number of student hours spent on intellectual or cultural activities
• grades not based on explicit goals

Item types

There are several item types to choose from, and the choice should be made carefully because item types affect performance on the test. If the candidate is not familiar with the item type, or it is not clear what is required from them, the elicited performance will not reflect their ability.

2 broad item types

1. Objective item types can also be called recognition, selected or fixed response items; the candidate has to choose from several options. Examples: true/false, multiple choice, matching, ordering, gap-filling, one-word answers.

2. Subjective item types, which are also known as constructed or extended response items, require language production from the candidate. Examples: compositions, essays, oral interviews, information-gap activities.

The washback effect on language tests

Tests can have an impact on teachers, learners, the system of education, textbook writers, and administrators, as well as on the processes of learning and teaching (Wall, 2000). The effect is called backwash in general education, but in the testing literature we prefer the term washback.

What can testing affect?

• students' self-concept, motivation, anxiety, learning strategies and their opportunities for advancement
• parents' view of their children's learning process
• teaching content, methods and evaluation
• curriculum and teaching methods in schools
• the allocation of opportunities, the concept of knowledge and skills in society

Some researchers view validity in the light of the washback effect the test has on teaching and argue that a "test's validity should be measured by the degree to which it has had a beneficial effect on teaching". Washback is often seen as negative because it makes teachers teach for the test and ignore skills and knowledge areas not relevant to the exam. Washback can also be regarded as positive if it brings about innovation in the curriculum.

The Washback Hypothesis claims that teachers and learners do things they would not necessarily otherwise do because of the test. This means that not only good tests can have positive effects, but bad ones as well, if they result in teachers and learners doing things they otherwise would not do (investing more energy into learning). Good tests can also have negative effects if they have unfavourable consequences. A number of studies reveal that tests in fact have very little impact on what actually happens in language classrooms.

It seems that the first change that might occur as a result of language tests is change in the content of teaching. It is much more difficult for teachers to change their behaviour and attitudes.

There are many factors that influence the washback effect:

• teacher ability
• the teacher's understanding of the test
• teaching style
• teachers' background
• teachers' attitudes towards innovation
• classroom conditions
• resources
• school management practices
• status of the subject in the curriculum
• general social and political context
• the role of publishers in material design
• the time elapsed since the introduction of the exam

Practical considerations concerning the relationship between tests and their washback effect:

1. Before introducing a new test, it is important to consult with those who are going to be affected by the test (e.g. teachers, students, administrators, policymakers).

2. The new test should be designed by taking current educational practices and social circumstances into consideration.

3. The new test should be comprehensible for teachers, students, stakeholders, and administrators and should be acceptable to them.

4. Teachers should be well-informed about the construct, design, and content of the new test.

5. Policymakers and test designers should not expect tests to have an immediate impact.

Testing skills

Testing writing skills

What can a person do who can write in a foreign language?

• can apply the rules of spelling and punctuation
• can write correct sentences
• can link sentences into coherent paragraphs, and paragraphs into texts
• can select and organize information, and can exclude irrelevant information
• can think creatively and develop thoughts in writing / can use the language effectively
• can write in an appropriate manner for a particular purpose and for a particular audience

"Writing is the ability to develop thoughts in writing and to convey them accurately, effectively and appropriately with a particular audience and communicative purpose in mind."

Construct

Students of a foreign language are usually expected to be able to produce a variety of text types depending on their level of proficiency:

• low levels: forms, postcards, short personal notes, diary entries
• intermediate level: personal letters, applications, CVs, formal letters, instructions, descriptions, narratives, brief reports, reviews
• advanced level: proposals, abstracts, summaries, extended reports, academic essays

General principles of task design

• the best way to test students' writing ability is to make them write; direct tests of writing ability are considered superior to indirect tests in terms of both validity and washback
• examinees cannot be expected to perform well on tasks which are not part of their L1 writing curriculum ("5-paragraph essay")
• it is important to combine formal tasks with informal ones, short ones with long ones, and personal with academic topics
• each task should be independent of the others
• the task should not be too broad or too general
• it is difficult to overcome the interference of other abilities such as creativity, imagination or reading
• the best writing test tests only writing

Frequently used task types

• filling in a form: filling in a simple form like a money order or an application for a driver's license or a university enrolment form
• guided composition: visual prompts and guidelines (narrative, description)
• summary: students summarize a text, keeping the original style
• letters: guidelines about to whom and about what to write (informal/formal)
• report: making sense of graphs and charts and writing up the information in them

Providing feedback

• It is important to use standard marking symbols so that students can identify what they did wrong and can learn from their mistakes.
• Positive comments from the raters are at least as important as error correction.
• Raters should comment on the content as well as on the form of the paper.

Developing rating scales

1. Holistic scoring: It is rapid. If more scorers are used, it can be made very reliable too. Scorers might sink into an analytical mood, which would result in inconsistent marking. Holistic scoring is not suitable for giving informative feedback.

2. Analytic scoring: The advantage of this system is that it makes scorers consider aspects they might otherwise ignore. Frequently applied categories are: content, vocabulary, grammar, organization, spelling and punctuation. The disadvantage of this type of scoring is that it is time-consuming, and that it might divert the scorer's attention from the purpose of writing and thus reduce the validity of the task.

Process of rating

• select criteria relevant to the required text type
• choose only as many aspects or categories as can be easily distinguished without any overlap
• use as many levels or bands as scorers can distinguish with ease: no more than 7 is recommended
• write the descriptors first for the top and bottom bands, then halve the distance between the two, and so on
• validate the rating scale
• use the rating scale in the standards-setting process
• train raters on actual writing exam samples
• both novice and experienced markers should be trained
• check rater reliability by making all raters mark the same set of writing papers; those who do not show up for rater training or do not mark according to the standards set by the standard-setting team should be excluded from the rating process

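One simple way to carry out the rater-reliability check described in the last bullet is to correlate the scores that two raters give to the same set of papers. The sketch below is only an illustration of that idea; the scores are invented, and the Pearson correlation is just one of several statistics that could be used.

```python
# Minimal sketch: inter-rater reliability as a correlation between two raters
# who marked the same six papers. The scores are made-up example values.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

rater_a = [14, 11, 18, 9, 15, 12]   # scores given by rater A
rater_b = [13, 12, 17, 8, 16, 11]   # scores given by rater B to the same papers

print(f"inter-rater correlation: {pearson(rater_a, rater_b):.2f}")
# A low value signals that the raters interpret the scale differently
# and need (re)training, or that the descriptors are unclear.
```
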
Testing reading skills

Construct

Reading comprehension means understanding a written text and extracting the information required to fulfil the reading purpose. "The ability to read requires that the reader draw information from a text and combine it with information and experience that the reader already has" (Grabe and Stoller, 2001, p. 188).

Reading as a process and product

The process of reading is silent, internal and private. Different readers engage in different reading processes; however, the understandings, i.e. the products they end up with, tend to be similar. Language testers typically focus on the product of reading, which is problematic because what people remember of the text they have read is affected by their memory skills.

Levels of understanding

Reading comprehension can be classified according to the levels of understanding as follows:

• reading the lines: understanding the text literally
• reading between the lines: inferring meanings from the text
• reading beyond the lines: approaching and evaluating the text critically

Reading theories

There are three influential approaches to describing the reading process:

1. the bottom-up approach: moving from small parts and building up a whole (cf. putting together a jigsaw puzzle when you do not have a copy of the finished puzzle). Typically associated with behaviorism in the 1940s-1950s. In this approach the readers are "passive decoders of sequential graphic-phonemic-syntactic-semantic systems, in that order" (Alderson 2000, p. 17). First, the graphic stimuli are recognized, and then they are decoded into sounds. Through the sounds, the reader can recognize words and decode meanings. These sub-processes that the reader engages in build on one another unidirectionally, i.e. a later component does not feed back into a preceding sub-process.

2. the top-down approach: Also called the conceptually driven approach. It emphasizes the overall construction of meaning and draws on the reader's schemata or experiences (e.g. a foreigner figuring out the meaning of péntek, szombat, vasárnap in a calendar). Schemata are information networks that people store in the brain, and according to schema theory, any new information is filtered through this network. In the top-down approach, the reader is an active participant making use of existing knowledge when predicting and constructing the meaning of a reading text.

Recent research in L2 reading questions the usefulness of the top-down approach and emphasizes the contribution of bottom-up processing to fluent reading. Paran (1996) points out that the more advanced a reader is, the less he relies on guessing, context and background knowledge. "Good readers may have a greater awareness of context - but they do not need to use it while they are reading" (p. 28). According to Alderson (2000), both top-down and bottom-up information is important in reading, and the balance between the two varies with text, reader and purpose.

3. the interactive model: Current research generally views reading as an interactive, socio-cognitive process (Bernhardt, 1991, cited in Ediger, 2001), which involves 3 participants: a reader, a text and a social context. This means that the reader constantly interacts with the text in order to construct meaning, and this interaction is influenced by his or her past experiences, knowledge of the world, social and cultural context, and the purpose of the reading.

Reading as a complex, multi-level process

People read for various purposes, and the purpose of reading determines the way we read. In real life we use a variety of skills and subskills, which we select according to the reading purpose and the text type. Grabe (1991) lists 6 skills and knowledge areas that readers activate:

• automatic recognition skills
• vocabulary and structural knowledge
• formal discourse structure knowledge
• content/world background knowledge
• synthesis and evaluation skills/strategies
• metacognitive knowledge and skills monitoring (i.e. skimming, scanning, previewing, formulating questions about information, monitoring cognition, recognizing problems)

People develop their reading skills in the L1. It is assumed that L2 readers have to progress beyond a certain language threshold before their L1 reading abilities can transfer to the L2 situation. Alderson (2000) asserts that in second language reading the knowledge of the second language is more important than reading abilities in the first language.

Testing reading comprehension

Reading as a skill is often neglected both in teaching and in testing L2. Most language testing tends to focus on productive language use, such as speaking and writing, and, as a consequence, EFL speakers who land in an academic environment or in an EFL-demanding job realise with surprise that their receptive skills are "severely defective" (Lewis, 1993, p. 43).

Can-do statements

For language testers it is useful to think about the reading skill in terms of the target situation and define what exactly a reader at a certain foreign language proficiency level can do:

• identify the situations in which the candidates need to read / analyse those situations in detail
• define language use in those situations in a series of 'can do' statements
• design assessment instruments which reflect reading in such situations, and which allow testers to see how well the candidate can read

Techniques and task types

There is no single best method of testing reading abilities, and test designers need to carefully match the technique to the setting, the testing purpose, and the text type. Some of the common task types are as follows:

• multiple choice comprehension questions: designing good distractors can be very difficult
• true or false questions: there is a 50% chance of guessing, which can be counterbalanced by increasing the number of items or by adding a third, "not stated" option
• short answer questions
• matching two sets of stimuli: headings and paragraphs, pictures and paragraphs
• ordering tasks: reconstructing passages from scrambled sentences or paragraphs, sequencing pictures according to the text
• information transfer: on the basis of the text the candidates complete the information in a chart, table or graph
• gap-filling tests: words deleted from the text on a rational basis (attention must be paid to ensuring that filling the gaps does require text comprehension!)
• cloze tests: every nth word deleted (integrated skills)
• editing tests: improving the language and organization of a text (integrated)

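As a small illustration of the cloze technique in the list above (every nth word deleted), the Python sketch below gaps every seventh word of a passage and keeps an answer key. The sample passage and the choice of n = 7 are assumptions made for the example only.

```python
# Minimal sketch of a cloze test: every nth word is replaced by a gap.
def make_cloze(text, n=7):
    words = text.split()
    gapped, answers = [], []
    for i, word in enumerate(words, start=1):
        if i % n == 0:
            answers.append(word)          # keep the deleted word for the key
            gapped.append("________")     # the gap the candidate has to fill
        else:
            gapped.append(word)
    return " ".join(gapped), answers

passage = ("Reading comprehension means understanding a written text and "
           "extracting the information required to fulfil the reading purpose.")
cloze_text, key = make_cloze(passage, n=7)
print(cloze_text)
print("answer key:", key)
```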

Problems

When designing reading tests, task designers are faced with a number of dilemmas, such as the following:

1. Memory: Should the test takers be allowed to refer to the reading passage while solving the reading tasks in order to reduce the role of memory? Research shows that if students are required to answer low-level, explicit questions, they should be allowed to refer to the text. If the task is to understand the main idea, candidates have been found to perform better if the text is removed (Alderson, 2000).

2. Background knowledge: Should the effect of background knowledge be considered? Grabe (2000) says that good readers are not distinguished from poor readers by their use of or reliance on context, although he admits that in test situations good readers often demonstrate better context use. It has also been established by research that specialist knowledge makes a difference in comprehending texts in specific topical domains; however, in the case of non-specialist texts background knowledge has no or only minimal influence.

3. Authenticity: Authentic texts are often considered too difficult for low proficiency level comprehension tasks; however, it should be remembered that it is the task, not the text, that primarily determines the level of difficulty. The use of texts written for teaching purposes or simplified readers should be avoided. Authentic, or so-called 'edited-authentic', texts should be used for testing reading comprehension, with tasks appropriately aimed at the given proficiency level.

Testing listening skills

Definition

The construct of listening must be defined for each and every test, as there is no such thing as absolute validity.

People listen to others in a number of ways and for a number of reasons: in order to get information, to do something with the information, and to interact with other people and maintain social relations. Thus, there are also different types of language we listen to. It is possible to distinguish between:

1. language for transactional purposes: message-oriented, used for giving instructions, giving information, directions or orders, inquiring, requesting, checking, etc.
2. language for interactional purposes: person-oriented; its main purpose is to establish and maintain social relationships.

In order to test the listening comprehension skills of language learners, the test designer must first clearly define what successful candidates should be able to do. This is often done by making 'can do' statements.

Features of spoken language

Listening comprehension is influenced to a great extent by various features of spoken language, such as the following:

• spoken language is 'transient': in real-life situations, spoken language cannot be 'replayed'
• chunking: spoken language can be broken down into smaller chunks, groups of words that are more easily retained for comprehension than long sentences
• redundancy: rephrasing, repetitions, elaborations, fillers
• reduced forms: phonological (e.g. "did he" pronounced as /dɪdi/), morphological (contracted forms), pragmatic (e.g. "Coming!")
• performance variables: slips, hesitations, false starts, corrections
• colloquial expressions
• accents: native and non-native
• rate of speech
• intonation, rhythm and stress

A number of difficulties may stem from the above features: listeners mainly worry about hearing a text only once and not being able to go back to the beginning of a sentence, paragraph, or passage like they can do when reading a text. Another worrisome feature of listening tests is that the sound usually comes from a recording, which deprives the listener of seeing the speaker and prevents him or her from relying on lip reading, facial expressions, or gestures, which can all aid comprehension in live interaction. Also, if teachers do not present new structures in their natural, contracted forms, students will have difficulties in recognizing them in fluent speech.

There are other features of spoken language that can actually help the L2 listener. For instance, if they know about the redundant nature of spoken discourse, they will not panic when they do not understand a word or phrase at the first hearing, because chances are that the same information is going to come up again, maybe in a slightly different form. Stress and intonation can help the listener select the important pieces of information from the text flow.

Texts

A variety of texts can be used for listening comprehension: monologues, dialogues, or even multiparticipant conversations, as long as they are extracts from oral discourse and not written discourse read out. A listening text can only be considered authentic if it carries the features of spoken language listed above. Materials writers and test designers sometimes edit original texts for educational purposes, but this can only be done if the text preserves the characteristics of spoken discourse.

The source of the text can be anything that matches the target language situation as described in the test specifications: announcements, radio programmes, lectures, presentations, recorded discussions, or even songs.

Operations

• sound discrimination: Useful when all test takers have the same L1, and contrastive analysis of the mother tongue and the target language can be used (e.g. for Hungarians: v-w, s/f-th, d-th). It must be added, however, that in real life it does not matter too much, since contextual clues have been found to help more in understanding verbal messages.
• stress and intonation to recognize the speaker's intention or attitude: command, request, sarcasm, surprise, etc. ('You will send me a dozen invitations.' 'Now that's a fine goalkeeper.')
• listening for gist: top-down, global understanding of the language.
• listening for specific information: Students listen to relatively long texts, e.g. to get dates, names, facts, events or locations.
• intensive listening: Students focus on certain elements of spoken language (contracted forms, intonation, grammatical forms).

Frequent techniques

• short answer questions: Candidates hear a sentence or a short passage and respond in no more than a few words to questions based on the text. If more extended answers are needed, it is advisable to use the candidates' L1 both in the questions and in the answers.
• multiple choice items: Candidates hear a sentence or a short passage and respond to multiple-choice items based on the information included in it. Longer texts must be broken down into shorter chunks to make sure comprehension and not memory is tested.
• note-taking: This technique simulates real-life situations in which students listen to lectures and take notes. When recording such a talk, the lecturer should be asked to talk from notes and not to read out a written script. A disadvantage of this task is that the standardized marking of the notes taken by the examinees can be difficult.
• matching and sequencing: Candidates match the words, phrases, or sentences they hear to pictures, match passages to headlines, or sequence the events of a story. Especially suitable for testing sound discrimination: e.g. ship-sheep, pen-pan, tree-three, etc.
• information transfer: This technique is suitable for testing the understanding of transactional language. Based on the information in the text, the candidates fill in a chart or complete a table. Pauses in the text are necessary to give the candidates time to do the writing.
• gap filling: This technique requires intensive listening. Attention must be paid to ensuring that the gaps (also called blanks) do not fall too close to each other in the text and that they concern content rather than structural elements of the language.

Testing oral skills

Construct

The testing of speaking is hard to separate from the testing of listening comprehension. A candidate is assumed to have good oral abilities if he or she is able to interact successfully in the foreign language, and interaction involves comprehension as well as speech production.

Can-do statements

In order to design a valid test of speaking abilities, we must define what we expect the candidate to be able to perform, and in what kind of real-life situations.

• The speaking test should then consist of tasks that form a representative sample of all the situations in which the candidate is expected to function successfully.
• The tasks should elicit linguistic behavior that truly represents the candidate's ability.
• It should be possible to score performance validly and reliably (Hughes, 2003).
• The self-assessment scale of the common reference levels in the CEF describes the speaking abilities of successful B2 candidates as follows:
• Spoken interaction: "I can interact with a degree of fluency and spontaneity that makes regular interaction with native speakers quite possible. I can take an active part in discussion in familiar contexts, accounting for and sustaining my views."
• Spoken production: "I can present clear, detailed descriptions on a wide range of subjects related to my field of interest. I can explain a viewpoint on a topical issue giving the advantages and disadvantages of various options."

Texts

There are various text types and operations that language learners can be expected to produce orally:

• Monologues: prepared or unprepared, rehearsed or spontaneous, e.g. introductions, presentations, narratives, descriptions, instructions.
• Conversations and dialogues: with a peer or an expert speaker (teacher, examiner).
• Discussions: bridging information or opinion gaps, problem solving, exploring a topic.
• Interviews: with examiner or peer.
• Role-plays and simulations.

Speaking tasks

Speaking tasks are activities that involve speakers "in using the language for the purpose of achieving a particular goal or objective in a particular speaking situation" (Luoma, 2004, p. 31).

Speaking tasks can be grouped and classified in many different ways, e.g. according to the number of participants (individual, pair, group) or according to the communicative functions (socializing, giving and soliciting information, expressing and asking for opinions, etc.). In the following, some frequently used task types are listed:

• Interview tasks: A very popular test task in which the examiner-interlocutor asks questions to which the candidate responds. A very serious drawback of this task type might be the unequal power relationship between the candidate and the examiner, as a result of which the candidate is likely to avoid taking the initiative at any point in the interview. Other problems might be caused by the examiner asking yes/no questions, or generally 'overtalking' the candidate. These problems can be reduced to the minimum if there is a written 'interlocutor frame', which the examiner follows consistently.
• Description tasks: This can be an individual or a paired task and can involve one or more pictures. In the paired format the examinees can get one picture each, which they describe to one another to discover similarities and differences. Another possibility is to give one picture to one of the candidates and three or more very similar pictures, which differ only in minor details, to the second. On the basis of the description given by the first candidate, the second selects which of the similar pictures has been described.
• Compare and contrast: This can also be either a paired or an individual task. The candidate can compare and contrast two pictures that are related to the same topic, e.g. life in the city vs. life in the country, or people using different means of transport, etc. This task is similar to the description but involves some analysis as well, which requires the use of more complex language.
• Instruction tasks: Ideally this task is performed in pairs. One of the candidates gives instructions to his or her partner about how to follow a route, perform a series of actions, or draw a sketch.
• Narrative tasks: Based on picture stimuli, the candidate narrates a sequence of events or a story. Experience shows that unrelated pictures can elicit more language and more detail from the candidates than ready-made cartoon strips.
• Problem-solving tasks: This is ideally used as a paired task in which the candidates get the description of a problematic situation. They discuss the problem, examine it from various angles, suggest alternative solutions and finally come to a decision. A suitable task for testing oral argumentation skills, elicitation skills, and conversation skills such as holding and giving up the floor, turn-taking, summarizing, etc.
• Role-play: This task is attractive but seldom valid; it is intended to simulate 'natural' language use, but candidates tend to resort to using poorly memorized chunks and clichés. The task can be especially awkward if the examinees are asked to act out a role they have never experienced even in their mother tongue (e.g. buying a wedding gown or acting as the witness of a crime).

Rules of thumb

• Give candidates as many fresh starts as possible. (In other words, the various tasks included in the speaking paper should be independent of each other. This will lead to greater validity and reliability as well.)
• Set only tasks that would not be difficult for the test takers in their mother tongue.
• Write and follow rubrics. Don't improvise.
• Use a separate interlocutor and assessor (the tester should not be seen making notes).
• Standardize interlocutor behaviour and provide training in it (use videotaped exams).
• Remember that individual oral tests can be particularly stressful.
• Ensure reliable scoring: create a scale for scoring (holistic vs. analytic scales) and train raters.

Scoring

The scoring of a speaking test requires careful planning in order to make it both valid and reliable. It will be valid if it fits the construct and the aims of the test as defined in the test specifications, and reliable if it ensures consistent use. Rating scales are meant to help the examiner tell good performances from poor ones. Both holistic and analytic scales are in use.

Rater training for speaking exams

Contrary to common belief, both holistic and analytic scales require that the examiners get thorough training in using them. No matter how detailed and how clearly phrased a scale is, individual examiners will interpret it differently unless they take part in examiner training.

Stages of a rater training session

1. An introduction to the test and the rating criteria.
2. Illustration of the levels on the scale via taped performances already rated by experienced raters or a team of experts.
3. Clarification of certain aspects of the rating scale as necessary.
4. Practice in applying the rating scale to a number of taped performances until each rater's scores match the scores of the expert team.
5. Rater training is repeated before each administration of the exam, i.e. if an exam is administered 4 times a year, each exam period must be preceded by a proper training session. Rater training is not merely for novice examiners: even experienced examiners must take part in it in order to ensure reliable scoring.

Exercises I.

Washback effects

Questions

1. What can tests have an impact on?
2. In what ways can language testing affect
 a. students
 b. teachers
 c. course-books
 d. schools
 e. society?
3. What conditions does the effect of tests depend on?

Tasks

1. Think back to one of the language proficiency exams you took during your course of studies. How did it influence what and how you were learning while you were preparing for the exam? What do you think your teacher did that he/she would otherwise not do while preparing you for the exam?
2. Choose one of the components of the new Hungarian school-leaving exam in English. Read the test specifications and the advice provided for teachers preparing their students for the exam (https://www.oktatas.hu/pub_bin/dload/kozoktatas/erettsegi/vizsgakovetelmenyek2017/elo_idegen_nyelv_vk.pdf). Discuss what effect the introduction of this particular component of the exam might have on language teaching in secondary schools.
3. You have a student who has asked you to prepare him/her for the Key English Test provided by Cambridge Exams Syndicate (www.cambridgeesol.org/exams/ket.htm). Read all the information that is available on the exam and discuss what you would teach to this student and how.

Writing skills

Questions

1. Brainstorm in groups: What type of feedback have you been getting on your written work during your university studies so far? Can you recall any positive experiences?
2. How can you ensure that a writing task only tests writing ability and nothing else?
3. How can feedback on students' writing lead to improvement?
4. What similarities and differences do you see between analytic and holistic scoring of written performance?
5. How can a teacher match testing to teaching if in teaching the focus should be on the process of writing, whereas testing always focuses on the 'product'?

Tasks

1. This is the description of a B2 level writer according to (some of) the CEF scales: "can write clear, detailed texts on a variety of subjects related to his/her field of interest, synthesizing and evaluating information and arguments from a number of sources. Can write clear, detailed descriptions of real or imaginary events and experiences, marking the relationship between ideas in clear connected text, and following established conventions of the genre concerned. Can write clear, detailed descriptions on a variety of subjects related to his/her field of interest. Can write a review of a film, book or play." (Council of Europe, 2004). In a similar fashion, write up the exit requirements for the 8th grade of a Hungarian school.
2. Study the following list of criteria for evaluating student writing. Rank order them from very important to less important:
 - suitable title
 - appropriate length
 - neat and tidy handwriting
 - correct spelling
 - correct punctuation
 - clear communicative purpose
 - original ideas
 - coherent text
 - well-structured sentences
 - appropriate use of paragraphs
 - good cohesion of text
 - effective introduction
 - effective conclusion
 - correct grammar
 - wide range of vocabulary
 - task-appropriate style
3. Discuss the following ideas in your group and see if you can come to an agreement:
 a. Teachers should indicate all errors in an essay test.
 b. Marking errors is a waste of time: students never bother to correct them.
 c. Peer correction does not work.
 d. Teachers should indicate where there is an error in a paper and leave the correction to the student.

Reading skills

Questions

1. Is reading an active or a passive skill?
2. When can you say that you understand a text?
 a. If you understand the meaning of each word in it.
 b. If you understand each sentence.
 c. If you can fulfill the reading purpose.
 d. If you can read it out with correct pronunciation.
3. What are some basic principles of selecting texts for reading comprehension?
4. What does the difficulty of a reading task depend on?

Tasks

1. Create a graphic organizer to demonstrate your understanding of reading approaches.
2. Select a short article from the International Herald Tribune or any other easily accessible English-language daily paper. Using a single text, prepare three reading comprehension tasks: one for pre-intermediate, one for intermediate, and one for advanced students.
3. A student complains about failing the reading comprehension part of an exam, saying that it did not only need knowledge of English but also some familiarity with history and geography. Discuss whether his or her complaint can be justified.

Oral skills

Questions

1. What is the significance of giving candidates 'fresh starts' in a speaking test?
2. Is it advantageous or disadvantageous for the examinee to have two examiners in the speaking test: an interlocutor and an assessor? Explain in detail.
3. What advantages and disadvantages might paired oral exams have?
4. What are the possible consequences of assessing the oral performance of a candidate without a rating scale?
5. What is the role of rater training?

Tasks

1. Make a list of the target language use situations of an 18-year-old Hungarian high-school leaver.
2. Compare the oral rating scale of the CEF to that of the Hungarian school-leaving exam and analyze the differences.

Listening skills

Questions

1. Why do many language learners think that listening comprehension tasks are easier than reading comprehension? Is this view justifiable?
2. Should test takers be given some time to read the tasks before they listen to the recorded text?
3. Can texts for listening comprehension be written from imagination?

Tasks

1. Record the news from an English-speaking radio station and write two different types of tasks for the same text. Moderate the items in groups in class.

Testing Vocabulary and Grammar

With the advent of communicative language testing, vocabulary and grammar are usually assessed in integrated tests. Vocabulary and grammar are frequently evaluated as a component of writing and speaking abilities, but reading and listening tests can also include items measuring candidates' lexical and grammatical knowledge. Lexical and grammatical competence are assumed to be inseparable by a number of researchers, and this assumption is also reflected in the current practice that these aspects of lg proficiency are often tested in a combined test, such as Use of English (UoE) components.

Vocabulary

• Vocabulary does not only entail individual words, but also phrases, idioms and collocations.
• Vocabulary knowledge has the following components:
 o meaning
 o pronunciation
 o spelling
 o grammatical info such as word class and countability
 o collocations
 o frequency
 o pragmatic info such as style, register
• We can test the breadth of vocab and how well students know the items (depth).

Dimensions in vocabulary testing

1. Discrete vs. embedded testing: discrete: vocabulary is a separate test component (e.g. multiple choice testing); embedded: vocabulary is integrated in other areas (e.g. vocab in an essay)
2. Selective vs. comprehensive: selective: students are tested on a specific number of items; comprehensive: overall use of vocab (e.g. in a speaking task)
3. Context-independent vs. context-dependent: the first means out-of-context testing; the second is the contrary

Major issues

• How do we select the words to be tested? Existing vocabulary frequency lists should serve as a basis for selection.
• How do we assess depth? It is not enough to be familiar with the core meaning.
• What is the role of context? Words can have multiple meanings and can have different grammatical roles. If we include context, we also include other skills, such as reading in a cloze test. Today there are fewer separate tests.

Tasks used for assessing vocabulary

• multiple choice: choosing the best alternative (ORIGO/Cambridge Proficiency)
• finding synonyms (TOEFL)
• identifying words based on a definition
• providing definitions
• matching
• completion items/fill-in
• sentence writing/example sentences
• sentence transformation

Assessing vocabulary in integrative tests

• Vocabulary can also be an aspect of oral or written performance.
• The CEF describes vocabulary knowledge on two scales: vocabulary range and vocabulary control.

Rating scales according to the CEF

• levels of testing vocabulary: text-based, phrase-based, sentence-based
• choosing items: connection to the unit, but other skills are okay as well (writing: words for cohesion)
• tip: give a pile of words and have students create a cohesive text – using the words in context

What does grammatical ability mean?

• Grammatical ability is more than just being familiar with grammatical structures. It also entails the knowledge of what grammatical structures mean. For example, there is a difference in meaning between the sentences "I wrote letters" and "I have been writing letters".
• Grammatical knowledge involves knowledge of phonological/graphological form and meaning; morpho-syntactic form and meaning (e.g. -ed = past tense); and syntactic structures and their meaning (e.g. word order rules and what they express in terms of information focus).

Major issues in testing grammar

• Should grammar be tested in discrete-point or in integrative tests? From the '60s until recently, multiple-choice, gap-filling and true/false tasks were used. These tests are useful if grammar is highly important in the given situation, but in communicative lg testing the meaning of grammatical structures is also important, which cannot be assessed with these types of tasks. These tasks lack context and authenticity. In modern teaching, grammatical ability is tested in an integrative manner in writing and speaking (IELTS, Cambridge).
• What is the definition of grammatical ability? Should it include lexical knowledge, cohesion and coherence?
• What should be the standard (British/American etc.)?
• How can we test grammar in decontextualized testing situations?

Tasks used for testing grammar

1. selected response (multiple choice, finding errors)
2. limited production (cloze, gap-fill, sentence transformation)
3. extended production tasks (longer texts are assessed)

Exercises II.

Vocabulary and grammar

Questions

1. What abilities does knowledge of grammar and vocabulary involve? Can you find an overlap between these two types of knowledge?
2. What kinds of tasks can be used for testing vocabulary and for assessing grammar? Which tasks are suitable for both grammar and vocabulary tests?
3. What are the reasons for the fact that a number of recent language tests do not contain a vocabulary and grammar sub-section? Response: it is already in the rating scale, and it also appears in the skills (for example in the listening part of the Matura Exam, where you have to copy the exact words – it also assesses vocabulary, spelling etc.)

Tasks

1. Devise a progress test to assess how well your students have acquired the distinction between the grammatical structures of past simple and past continuous. Include a selected response, a limited response and an extended production task. Specify how you would score them.
2. Design a vocab test for adult learners to test how well they have acquired 20 words. The test should also measure the depth of knowledge.

What characterizes a good test?

- great length
- connects to the unit
- appropriate for the level (for example: when you test listening, do not test solely vocab)
- authentic/semi-authentic
- listening: it is not a good idea to include long items, because we do not test reading or writing
