12 Testing listening
It may seem rather odd to test listening separately from speaking, since
the two skills are typically exercised together in oral interaction. However,
there are occasions, such as listening to the radio or podcasts, to lectures,
online talks and tutorials, or to railway station
announcements, when no speaking is called for. Also, as far as testing
is concerned, there may be situations where the testing of oral ability is
considered, for one reason or another, impractical, but where a test of
listening is included for its backwash effect on the development of oral
skills. Listening may also be tested for diagnostic purposes.
Because it is a receptive skill, the testing of listening parallels in most
ways the testing of reading. This chapter will therefore spend little time
on issues common to the testing of the two skills and will concentrate
more on matters that are particular to listening. The reader who plans
to construct a listening test is advised to read both this and the previous
chapter.
The special problems in constructing listening tests arise out of the
transient nature of the spoken language. Listeners cannot usually move
backwards and forwards over what is being said in the way that they can a
written text. The one apparent exception to this, when an audio-recording
is put at the listener’s disposal, does not represent a typical listening task
for most people. Ways of dealing with these problems are discussed later
in the chapter.
Specifying what the candidate should be able to do

As with the other skills, the specifications for listening tests should say what
it is that candidates should be able to do.
Content
Operations
Some operations may be classified as global, inasmuch as they depend on
an overall grasp of what is listened to. They include the ability to:
• obtain the gist;
• follow an argument;
• recognise the attitude of the speaker.
https://doi.org/10.1017/9781009024723.012 Published online by Cambridge University Press
Other operations may be classified in the same way as were speaking skills
in Chapter 10. In writing specifications, it is worth adding to each operation
whether what is to be understood is explicitly stated or only implied.
Informational:
• obtain factual information
• follow instructions (including directions)
• understand requests for information
• understand expressions of need
• understand requests for help
• understand requests for permission
• understand apologies
• follow sequence of events (narration)
• recognise and understand opinions
• follow justification of opinions
• understand comparisons
• recognise and understand suggestions
• recognise and understand comments
• recognise and understand excuses
• recognise and understand expressions of preferences
• recognise and understand complaints
• recognise and understand speculation
Interactional:
• understand greetings and introductions
• understand expressions of agreement
• understand expressions of disagreement
• recognise speaker’s purpose
• recognise indications of uncertainty
• understand requests for clarification
• recognise requests for clarification
• recognise requests for opinion
• recognise indications of understanding
• recognise indications of failure to understand
• recognise and understand corrections by speaker (of self and others)
• recognise and understand modifications of statements and comments
• recognise speaker’s desire that listener indicate understanding
• recognise when speaker justifies or supports statements, etc. of other
speaker(s)
• recognise when speaker questions assertions made by other speakers
• recognise attempts to persuade others
It may also be thought worthwhile testing lower-level listening skills in a
diagnostic test, since problems with these tend to persist longer than they
do in reading. These might include:
• discriminate between vowel phonemes
• discriminate between consonant phonemes
• interpret intonation patterns (recognition of sarcasm, questions in
declarative form, etc., interpretation of sentence stress)
• interpret non-verbal information (e.g. facial expressions, gesture)
Texts
For reasons of content validity and backwash, texts should be specified as
fully as possible.
Text type might be first specified as monologue, dialogue, or multi-
participant, and further specified: conversation, announcement, talk or
lecture, instructions, directions, etc.
Text forms include: description, exposition, argumentation, instruction,
narration.
Length may be expressed in seconds or minutes. The extent of short utterances
or exchanges may be specified in terms of the number of turns taken.
Speed of speech may be expressed as words per minute (wpm) or syllables
per second (sps). Reported average speeds for samples of British English are:
                                   WPM    SPS
Radio monologues                   160    4.17
Conversations                      210    4.33
Interviews                         190    4.17
Lectures to non-native speakers    140    3.17
(Tauroza and Allison 1990)

Dialects may include standard or non-standard varieties.
Accents may be regional or non-regional.
If authenticity is called for, the speech should contain such natural features
as assimilation and elision (which tend to increase with speed of delivery)
and hesitation phenomena (pauses, fillers, etc.).
Intended audience, style, topics, range of grammar and vocabulary may be
indicated.
Increasingly, test developers are incorporating video and other visual
information into listening tests. In terms of authenticity this has benefits.
Although there are situations, such as listening to the radio, or to airport
announcements, where we rely purely on verbal information, these
are not the most common. Even traditional ‘voice only’ phone calls are
increasingly being replaced with video calls. In most real-life situations
we not only listen, but receive other, non-verbal, information, such as
mouth movements, facial expressions, body language or even visual
aids. Therefore, tests which contain visual as well as audio information
are arguably a better representation of authentic listening. Where visual
information is to be included in items, it should of course be included in
the test specifications, as in the operations listed above.
Setting criterial levels of performance

The remarks made in the chapter on testing reading apply equally here. If
the test is set at an appropriate level, then, as with reading, a near perfect
set of responses may be required for a ‘pass’. ACTFL, ILR or other scales,
including those based on CEFR, may be used to validate the criterial levels
that are set.
Setting the tasks

Selecting samples of speech (texts)
Passages must be chosen with the test specifications in mind. If we are
interested in how candidates can cope with language intended for expert
speakers, then ideally we should use samples of authentic speech. These
can usually be readily found. Possible sources are podcasts, online lectures,
radio, television, teaching materials, and our own recordings of expert
speakers. If, on the other hand, we want to know whether candidates
can understand language that may be addressed to them as non-expert
speakers, suitable examples can be obtained from teaching materials and
recordings of expert speakers that we can make ourselves. In some cases
the indifferent quality of the recording may necessitate re-recording. It
seems to us, although not everyone would agree, that a poor recording
introduces difficulties additional to the ones that we want to create, and so
reduces the validity of the test. It may also introduce unreliability, since
the performance of individuals may be affected by the recording faults in
different degrees from occasion to occasion. If details of what is said on
the recording interfere with the writing of good items, testers should feel
able to edit the recording, or to make a fresh recording from the amended
transcript. In some cases, a recording may be used simply as the basis for a
‘live’ presentation.
If recordings are made especially for the test, then care must be taken
to make them as natural as possible. There is typically a fair amount of
redundancy in spoken language: people are likely to paraphrase what
they have already said (‘What I mean to say is ...’), and to remove this
redundancy is to make the listening task unnatural. In particular, we
should avoid passages originally intended for reading.
Test writers should be wary of trying to create spoken English out of their
imagination: it is better to base the passage on a genuine recording, or a
transcript of one. If an authentic text is altered, it is wise to check with
expert speakers that it still sounds natural. If a recording is made, care
should be taken to ensure that it fits with the specifications in terms of
speed of delivery, style, etc.
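One way to check a recording against the specified speed of delivery is to compute its words-per-minute rate from the transcript and the running time. The following is a minimal sketch only: the function names, the whitespace-based word count and the 10 per cent tolerance are all assumptions made for illustration, not part of any published procedure.

```python
def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Return the speech rate of a recording in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Assumption: words are whatever is separated by whitespace in the transcript.
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds


def within_specified_speed(wpm: float, target_wpm: float, tolerance: float = 0.1) -> bool:
    """True if wpm lies within +/- tolerance (a proportion) of the specified rate."""
    return abs(wpm - target_wpm) <= tolerance * target_wpm


# Example: a 20-second set of directions containing 50 words.
rate = words_per_minute("word " * 50, 20)
print(rate)                                          # 150.0
print(within_specified_speed(rate, target_wpm=160))  # True
```

A check of this kind says nothing about style or naturalness, which still have to be judged by ear.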
Suitable passages may be of various lengths, depending on what is being
tested. A passage lasting ten minutes or more might be needed to test
the ability to follow an academic lecture, while twenty seconds could be
sufficient to give a set of directions.
Writing items
For extended listening, such as a lecture, a useful first step is to listen to
the passage and note down what it is that candidates should be able to get
from the passage. We can then attempt to write items that check whether
or not they have got what they should be able to get. This note-making
procedure will not normally be necessary for shorter passages, which will
have been chosen (or constructed) to test particular abilities.
In testing extended listening, it is essential to keep items sufficiently far
apart in the passage. If two items are close to each other, candidates may
miss the second of them through no fault of their own, and the effect of
this on subsequent items can be disastrous, with candidates listening for
‘answers’ that have already passed. Since a single faulty item can have
such an effect, it is particularly important to trial extended listening tests,
even if only on colleagues aware of the potential problems.
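Before trialling, item writers who have noted the point in the recording at which the answer to each item is heard can make a rough check on spacing. The sketch below flags consecutive items whose answers fall close together. The function name, the timings and the 15-second minimum gap are hypothetical; a suitable gap can only be established through trialling.

```python
def items_too_close(answer_times, min_gap_seconds=15.0):
    """Return pairs of consecutive item numbers whose answers are heard too close together.

    answer_times maps item number -> time (in seconds from the start of the
    recording) at which the information needed for that item is heard.
    """
    ordered = sorted(answer_times.items(), key=lambda pair: pair[1])
    flagged = []
    for (item_a, time_a), (item_b, time_b) in zip(ordered, ordered[1:]):
        if time_b - time_a < min_gap_seconds:
            flagged.append((item_a, item_b))
    return flagged


# Hypothetical timings for a five-item lecture test.
timings = {1: 12.0, 2: 47.5, 3: 55.0, 4: 120.0, 5: 185.0}
print(items_too_close(timings))  # [(2, 3)]
```

Flagged pairs are candidates for rewriting or removal; the check does not replace trialling.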
Candidates should be warned by key words that appear both in the item
and in the passage that the information called for is about to be heard.
For example, an item may ask about ‘the second point that the speaker
makes’ and candidates will hear ‘My second point is … ’. The wording
does not have to be identical, but candidates should be given fair warning
in the passage. It would be wrong, for instance, to ask about ‘what the
speaker regards as her most important point’ when the speaker makes the
point and only afterwards refers to it as the most important. Less obvious
examples should be revealed through trialling.
Other than in exceptional circumstances (such as when the candidates are
required to take notes on a lecture without knowing what the items will
be, see below), candidates should be given sufficient time at the outset to
familiarise themselves with the items. As was suggested for reading in the
previous chapter, there seems no sound reason not to write items and accept
responses in the native language of the candidates. This will in fact often be
what would happen in the real world, when a fellow native speaker asks for
information that we have to listen for in the foreign language.
Possible techniques
Multiple choice
The advantages and disadvantages of using multiple choice in extended
listening tests are similar to those identified for reading tests in the
previous chapter. In addition, however, there is the problem of the
candidates having to hold in their heads four or more alternatives while
listening to the passage and, after responding to one item, of taking in and
retaining the alternatives for the next item. If multiple choice is to be used,
then the alternatives must be kept short and simple. The alternatives in the
following invented example item are too complex.
Before beginning a journey by car, what is the motorist advised to do?
a. He should increase the pressure in his tyres to the required level.
b. He should connect his sat nav and enter his intended destination.
c. He should make sure that the vehicle is fully roadworthy.
d. He should ensure that all doors are properly closed, with child locks
activated.
Better examples would be:
(Understanding request for help)
I don’t suppose you could show me where this goes, could you?
Response:
a. No, I don’t suppose so.
b. Of course I can.
c. I suppose it won’t go.
d. Not at all.
(Recognising and understanding suggestions)
I’ve been thinking. Why don’t we call Charlie and ask for his opinion?
Response:
a. Why is this his opinion?
b. Why do you want to do that?
c. You think it’s his opinion?
d. Do you think Charlie has called?
Multiple choice can work well for testing lower-level skills, such as
phoneme discrimination.
The candidate hears bat
and chooses between pat mat fat bat
Short answer
This technique can work well, provided that the question is short and
straightforward, and the correct, preferably unique, response is obvious.
Below is an example from the IELTS test. The candidates hear an extract
from a talk given to a group who are going to stay in the UK. Note that the
candidates need only give two examples of community groups, with theatre
provided as an example.
Listening sample task – Short-answer questions (to be used with IELTS Listening Recording 3)
SECTION 2
Questions 11 – 16
Answer the questions below.
Write NO MORE THAN THREE WORDS AND/OR A NUMBER for each answer.
What TWO factors can make social contact in a foreign country difficult?
• 11 ...............................
• 12 ...............................
Which types of community group does the speaker give examples of?
• theatre
• 13 ..................................
• 14 ..................................
In which TWO places can information about community activities be found?
• 15 ..................................
• 16 ..................................
Gap filling
This technique can work well where a short answer question with a
unique answer is not possible.
Woman: Do you think you can give me a hand with this?
Man: I’d love to help but I’ve got to go round to my mother’s in a minute.
The woman asks the man if he can ............ her but he has to
visit his ............ .
Information transfer
This technique is as useful in testing listening as it is in testing reading,
since it makes minimal demands on productive skills. It can involve
such activities as the labelling of diagrams or pictures, completing forms,
making diary entries, or showing routes on a map. In the following
example, which is taken from the IELTS exam, candidates label a map
while listening to someone describing the layout of a library.
Listening sample task – Plan/map/diagram labelling
SECTION 2
Questions 11-15
Label the plan below.
Choose FIVE answers from the box and write the correct letters A-I next to questions
11-15.
Town Library
[Plan of the Town Library. Already labelled on the plan: Entrance, Librarian’s desk, Library office, Library area, Fiction, Non-fiction, Seminar room. The rooms numbered 11 to 15 are to be labelled by the candidate.]
11 ………
12 ………
13 ………
14 ………
15 ………
A Art collection
B Children's books
C Computers
D Local history collection
E Meeting room
F Multimedia
G Periodicals
H Reference books
I Tourist information
Tapescript
(Note: There is no Listening recording for this tapescript.)
You will hear the librarian of a new town library talking to a group of people who are
visiting the library.
OK everyone. So here we are at the entrance to the town library. My name is Ann,
and I'm the chief librarian here, and you'll usually find me at the desk just by the main
entrance here. So I'd like to tell you a bit about the way the library is organised, and
what you'll find where … and you should all have a plan in front of you. Well, as you
see my desk is just on your right as you go in, and opposite this the first room on
your left has an excellent collection of reference books and is also a place where
people can read or study peacefully. Just beyond the librarian's desk on the right is a
room where we have up to date periodicals such as newspapers and magazines and
this room also has a photocopier in case you want to copy any of the articles. If you
carry straight on you'll come into a large room and this is the main library area. There
is fiction in the shelves on the left, and non-fiction materials on your right, and on the
shelves on the far wall there is an excellent collection of books relating to local
history. We're hoping to add a section on local tourist attractions too, later in the year.
Through the far door in the library just past the fiction shelves is a seminar room, and
that can be booked for meetings or talks, and next door to that is the children's
library, which has a good collection of stories and picture books for the under
elevens. Then there's a large room to the right of the library area – that's the
multimedia collection, where you can borrow videos and DVDs and so on, and we
also have CD-Roms you can borrow to use on your computer at home. It was
originally the art collection but that's been moved to another building. And that's
about it – oh, there's also the Library Office, on the left of the librarian's desk. OK,
now does anyone have any questions?
Note taking
Where the ability to take notes while listening to, say, a lecture is in
question, this activity can be quite realistically replicated in the testing
situation. Candidates take notes during the talk, and only after the talk
is finished do they see the items to which they have to respond. When
constructing such a test, it is essential to use a passage from which notes
can be taken successfully. This will only become clear when the task is
first attempted by test writers. We believe it is better to have items (which
can be scored easily) rather than attempt to score the notes, which is not a
task that is likely to be performed reliably. Items should be written that are
perfectly straightforward for someone who has taken appropriate notes. In
order to aid authenticity in academic contexts, candidates may be supplied
with a copy of the slides used in the lecture. This allows them to make
notes on the slides, as they commonly would in their future studies.
It is essential when including note taking as part of a listening test that
careful moderation and, if possible, trialling should take place. Otherwise,
items are likely to be included that even highly competent speakers of the
language do not respond to correctly. It should go without saying that,
since this is a testing task which might otherwise be unfamiliar, potential
candidates should be made aware of its existence and, if possible, be
provided with practice materials. If this is not done, then the performance
of many candidates will lead us to underestimate their ability.
Partial dictation
While dictation may not be a particularly authentic listening activity
(although in lectures at university, for instance, there is often a certain
amount of dictation), it can be useful as a testing technique. As well as
providing a ‘rough and ready’ measure of listening ability, it can also
be used diagnostically to test students’ ability to cope with particular
difficulties (such as weak forms in English).
Because a traditional dictation is so difficult to score reliably, it is
recommended that partial dictation is used, where part of what the
candidates hear is already written down for them. It takes the following
form:
The candidate sees:
When I ............ someone for the first time,
I ............ them my name and I always shake their
hand. I think ............ the polite thing to do. I often
............ nervous when I meet new people so
I ............ play with my hair. I wish I didn’t do that.
What do I usually ............ about? The weather and
............ . But I don’t talk about ............ .
That’s ............ rude!
The tester reads:
When I meet someone for the first time, I tell them my name and I
always shake their hand. I think that’s the polite thing to do. I often feel
nervous when I meet new people so I sometimes play with my hair. I
wish I didn’t do that. What do I usually talk about? The weather and
jobs. But I don’t talk about money. That’s just rude!
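The candidate's version of a partial dictation can be produced mechanically from the full passage once the words to be gapped have been chosen. What follows is a minimal sketch under stated assumptions: the function name and the row of dots used for a blank are illustrative, and the words to be gapped must be listed in the order in which they occur in the passage.

```python
import re


def make_partial_dictation(passage, gap_words, blank="............"):
    """Replace each word in gap_words, in passage order, with a blank.

    Returns the gapped text and the answer key. Words are matched whole and
    searched for from left to right, so gap_words must follow passage order.
    """
    pieces = []
    position = 0
    for word in gap_words:
        pattern = re.compile(r"\b" + re.escape(word) + r"\b")
        match = pattern.search(passage, position)
        if match is None:
            raise ValueError(f"{word!r} not found after position {position}")
        pieces.append(passage[position:match.start()])
        pieces.append(blank)
        position = match.end()
    pieces.append(passage[position:])
    return "".join(pieces), list(gap_words)


passage = "When I meet someone for the first time, I tell them my name."
gapped, key = make_partial_dictation(passage, ["meet", "tell"])
print(gapped)
# When I ............ someone for the first time, I ............ them my name.
```

The choice of which words to gap remains a matter for the tester's judgement and the test specifications.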
Testers can either write their own passages or they can use authentic
transcripts, either from online resources or from student coursebooks,
as with the example above. There are advantages to using coursebooks.
In addition to the practical benefit of having an audio recording to use,
the excerpts from coursebooks will have been written for specific levels
of language ability. The possible disadvantage is that some candidates
may already be aware of the coursebook. Therefore, we recommend
coursebook excerpts only be used in classroom tests. For higher-stakes
tests, we suggest it is preferable to use one of the many online resources
of authentic listening samples, some of which are listed at the end of
this chapter.
Since it is listening that is meant to be tested, correct spelling should
probably not be required for a response to be scored as correct. However,
it is not enough for candidates simply to attempt a representation of the
sounds that they hear, without making sense of those sounds. To be scored
as correct, a response has to provide strong evidence of the candidate’s
having heard and recognised the missing word, even if they cannot spell it.
It has to be admitted that this can cause scoring problems.
The gaps may be longer than one word:
When I meet someone ........................ , I tell them my name and I
always shake their hand.
While this has the advantage of requiring the candidate to do more
than listen for a single word, it does make the scoring (even) less
straightforward.
Transcription
Candidates may be asked to transcribe numbers or words which are
spelled letter by letter. The numbers may make up a telephone number.
The letters should make up a name or a word which the candidates should
not already be able to spell. The skill that items of this kind test belongs
directly to the ‘real world’. In the trialling of a test we were involved with
recently, it was surprising how many teachers of English were unable to
perform such tasks satisfactorily. A reliable and, we believe, valid way of
scoring transcription is to require the response to an item to be entirely
correct for a point to be awarded.
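The all-or-nothing rule for transcription is easy to apply consistently, as the following sketch suggests. It rests on one assumption that is not in the text above: spaces, hyphens and letter case are treated as layout rather than content. The function name and the example responses are invented for illustration.

```python
def score_transcription(response: str, key: str) -> int:
    """Award 1 point only if the response matches the key entirely, else 0.

    Assumption: spaces, hyphens and letter case are treated as layout rather
    than content, so '0118 496 0321' and '01184960321' count as the same answer.
    """
    def normalise(text: str) -> str:
        return "".join(ch for ch in text if ch not in " -").casefold()

    return 1 if normalise(response) == normalise(key) else 0


print(score_transcription("0118 496 0321", "01184960321"))  # 1
print(score_transcription("Hughs", "Hughes"))               # 0
```

Whether differences of layout should be ignored is a decision for the test specifications; if they should not, the normalisation step can simply be removed.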
Moderating the items

The moderation of listening items is essential. Ideally it should be carried
out using the already prepared recordings or with the item writer reading
the text as it is meant to be spoken in the test. The moderators begin by
‘taking’ the test and then analyse their items and their reactions to them.
The moderation checklist given on page 156 for reading items needs only
minor modifications in order to be used for moderating listening items.
Presenting the texts (live or recorded?)

The great advantage of using recordings when administering a listening
test is that there is uniformity in what is presented to the candidates.
This is fine if the recording is to be listened to in a well-maintained
language laboratory or in a room with good acoustic qualities and with
suitable equipment (the recording should be equally clear in all parts of
the room). If these conditions do not obtain, then a live presentation is to
be preferred. If presentations are to be live, then the greatest uniformity
(and so reliability) will be achieved if there is just a single speaker for
each (part of a) test. If the test is being administered at the same time in a
number of rooms, more than one speaker will be called for. In either case,
a recording should be made of the presentation, with which speakers can
be trained, so that the intended emphases, timing, etc. will be observed
with consistency. Needless to say, speakers should have a good command
of the language of the test and be generally highly reliable, responsible and
trustworthy individuals.
Scoring the listening test

It is probably worth mentioning again that in scoring a test of a receptive
skill there is no reason to deduct points for errors of grammar or spelling,
provided that it is clear that the correct response was intended.
READER ACTIVITIES
1. a. Choose an online video lecture that would be appropriate for a group of
students with whom you are familiar (see end of this chapter for possible
resources). Play a five-minute stretch to yourself and take notes. On the
basis of the notes, construct eight short-answer items. Ask colleagues
to take the test and comment on it. Amend the test as necessary, and
administer it without video (audio only) to half of the group of students
you had in mind. Analyse the results.
b. Administer the same test to the other half of the group, showing them the
video as well as the audio. What differences do you notice between the
performance of the two groups of students? Go through the test item by
item with the students and ask for their comments. How far, and how well,
is each item testing what you thought it would test?
2. Design short items that attempt to discover whether candidates can
recognise: sarcasm, surprise, boredom, elation. Try these on colleagues and
students.
3. Design a test that requires candidates to draw (or complete) simple
pictures. Decide exactly what the test is measuring. Think what other things
could be measured using this or similar techniques. Administer the test and
see if the students agree with you about what is being measured.
FURTHER READING
General
Buck (2001) is a thorough study of the assessment of listening. Field (2019)
evaluates many of the conventions behind listening tests and provides
practical ideas for how they might be rethought.
Test methods
Sherman (1997) examines the effects of candidates previewing listening
test items. Buck and Tatsuoka (1998) analyse performance on short-answer
items. Hale and Courtney (1994) look at the effects of note taking on
performance on TOEFL® listening items. Note taking is suggested to be
a good indicator of listening ability in Song (2012). Shohamy and Inbar
(1991) look at the effects of texts and question type. Cai (2013) examines
the validity of partial dictation as a test of ‘higher order’ listening abilities.
The effects of visual information in listening tests are investigated in Ginther
(2002), Ockey (2007), Wagner (2010) and Batty (2015).
Test validation

Buck (1991) uses introspection in the validation of a listening test.
Optimising test performance

Arnold (2000) shows how performance on a listening test can be improved
by reducing stress in those who take it.
Texts
Freedle and Kostin (1999) investigate the importance of the text in TOEFL®
minitalk items. Examples of recordings in English that might be used as the
basis of listening tests are Crystal and Davy (1975); Hughes et al. (2012),
if regional British accents are relevant. Harding (2012) investigates the
possibility of bias where accents of speakers in recordings are similar to
those of the test-takers’ L1. Ockey and Wagner (2018) is a collection of
articles on authenticity in the assessment of listening ability.
Online resources
There are countless online resources of authentic spoken English, which
testers can use to create tests. What follows is a brief selection of resources
that can easily be found using a search engine. The Self-access centre
for Language Learning at the University of Reading provides dozens of
authentic academic lectures. TED has thousands of talks and lectures on
every subject imaginable. Transcripts can be accessed through the TED
website. Podcasts are another good way to use authentic listening samples
in tests. The BBC website contains hundreds of podcasts in different genres.