The document design companion series accompanies the journal document design and focuses on internal and external communication of medium-sized to multinational corporations, governmental bodies, non-profit organizations, as well as media, health care, educational and legal institutions.
The series promotes works that combine aspects of (electronic) discourse – written, spoken and visual – with aspects of text quality (function, institutional setting, culture). They are problem-driven, methodologically innovative, and focused on the effectiveness of communication. All manuscripts are peer-reviewed.
document design is ‘designed’ for: information managers, researchers in discourse studies and
organization studies, text analysts, and communication specialists.
Editors
Jan Renkema, Tilburg University
Maria Laura Pardo, University of Buenos Aires
Ruth Wodak, Austrian Academy of Sciences
Editorial Address
Jan Renkema
Tilburg University
Discourse Studies Group
P.O. Box 90153
NL 5000 LE TILBURG
The Netherlands
E-mail: J.Renkema@uvt.nl
Volume 6
Perspectives on Multimodality
Edited by
Eija Ventola
Cassily Charles
Martin Kaltenbacher
University of Salzburg
Library of Congress classification: P99.4.M6P47 2004; Dewey Decimal Classification: 401'.41 (dc22); LCCN 2004046238
ISBN 90 272 3206 7 (Eur.) / 1 58811 595 X (US) (Hb; alk. paper)
Contributors
Acknowledgements
Introduction
Eija Ventola, Cassily Charles and Martin Kaltenbacher
Chapter 1
In between modes: Language and image in printed media
Hartmut Stöckl
Chapter 2
Modelling multiple semiotic systems: The case of gesture and speech
Peter Muntigl
Chapter 3
Problematising ‘semiotic resource’
Victor Lim Fei
Chapter 4
Multimodality and empiricism: Preparing for a corpus-based approach to the study of multimodal meaning-making
John Bateman, Judy Delin and Renate Henschel
Chapter 5
On the effectiveness of mathematics
Kay L. O’Halloran
Chapter 6
Multimodality in language teaching CD-ROMs
Martin Kaltenbacher
Chapter 7
The multiple modes of Dirty Dancing: A cultural studies approach to multimodal discourse analysis
Markus Rheindorf
Chapter 8
Multimodal text analysis and subtitling
Christopher Taylor
Chapter 9
Multimodality in the translation of humour in comics
Klaus Kaindl
Chapter 10
Multimodality in operation: Language and picture in a museum
Andrea Hofinger and Eija Ventola
Chapter 11
Drawing on theories of inter-semiotic layering to analyse multimodality in medical self-counselling texts and hypertexts
Eva Martha Eckkrammer
Chapter 12
On the multimodality of interpreting in medical briefings for informed consent: Using diagrams to impart knowledge
Kristin Bührig
Index
Introduction
Eija Ventola, Cassily Charles and Martin Kaltenbacher
For a growing number of researchers of text and discourse it has over the last
decades become increasingly evident that in the pursuit of understanding com-
munication patterns around us, the analysis of language alone is not enough. The
media of communication or, more accurately, the media which provide analysts
with an object of study have for the last several millennia permitted a more or
less comfortable excision of language, as text/discourse, from its context. Similarly,
it has been relatively easy to analyse language and its instantiation in discourse
separately from other forms of meaning-making, such as the gestures and facial
expressions of a conversation, the illustrations and print of a document, the mu-
sic and lighting of a drama. The first displacement, that of text from context, has
been addressed, or redressed, by both applied and theoretical linguists in many
traditions and for many purposes, particularly in the last half-century. This has
been done either through building social context into the fundamental structure
of linguistic theory (e.g. social semiotic theories of language), or through includ-
ing contextual observation as a supplement to ‘language-only’ models. However,
the second displacement, that of language from other kinds of meaning-making,
has only recently begun to be addressed by linguists.
Although multimodality and multimediality, when seen as combinations of
writing, speaking, visualisation, sounds, music, etc., have always been omnipresent
in most of the communicative contexts in which humans engage, they have for a
long time been ignored, as the various academic disciplines have pursued their own research agendas. It is only relatively recently that the development of the various possibilities of combining communication modes in the ‘new’ media, such as the computer and the Internet, has forced scholars to think about the
particular characteristics of these modes and the way they semiotically function
and combine in the modern discourse worlds.
Changes in the technology of communication are forcing the issue for those
who would understand the mediation of our lives by discourse. Multimodality, the
ing a problem. During the early Renaissance mathematical problems were often
visualised through drawn pictures and diagrams to demonstrate the problem but
also to engage the students. In the 16th century abstract semiotic metaphors, like lines visualising the path of flying objects, marked a significant turning point in mathematics. Symbols representing physical entities became more frequent, partic-
ularly with the work of Descartes and Newton. Descartes used algebraic formulae
for decontextualised representations of curves, and Newton defined scientific phe-
nomena completely through the symbolism of algebraic equations. Modern math-
ematics is based on complex inter-semiotic relations between language, visual
image (graphs, diagrams) and symbolism (algebraic formulae).
In Chapter 6 Martin Kaltenbacher looks into the semiotics of English language teaching CD-ROMs and explores the demands that text-image combinations have to fulfil in order to benefit learners. He claims that many
products combine different semiotic modes in a way that may inhibit rather than
foster understanding and learning. He analyses the integration of sound waves in
so-called pronunciation labs, the semantics of text-picture combinations, and the
use of short video clips demonstrating the articulation of model speakers. All visu-
alisations must meet some minimal semiotic requirements: they have to be easily
interpretable, they must help the learners disambiguate the meanings to be learnt,
and they have to be exact representations of the structures taught. These require-
ments are often not met, as many visualisations are too general or too ambiguous.
As a solution to this, Kaltenbacher proposes replacing complex visualisations with
more discrete ones, such as icons.
In Chapter 7 Markus Rheindorf investigates the relations of specific and non-
specific modes in the film Dirty Dancing. He explores whether the distribution of the modes is significant to the genre of the dance film, how the modes combine to realise generic structure, and how the genre can be topologically related to other genres. Rheindorf argues that the protagonists of the film construe their class-specific as well as their gender-specific identities through the mode of dance.
Other modes, like dress code and music, support the semiotic functions of dance.
Analysing a number of crucial scenes, Rheindorf suggests that set filmic phrases
share common patterns and distributions of modes, which keep occurring in cer-
tain typical contexts with similar content. Additionally, he argues that the salience
of certain filmic scenes is enhanced through a strategic change in their multimodal
configurations.
In Chapter 8 Christopher Taylor looks at the role multimodal text analysis can
play in the process of subtitling a film in a foreign language. A particular diffi-
culty in this process arises when the spoken text contains a word play which is
specific to the source language and is made visible in the film. Here the meaning
encoded in the different modes may be lost in the translation process, unless the
elements are combined in a text), and fusion (image and verbal text fuse into a
new textual form). In addition to that, Eckkrammer argues that hypertexts should
foster multimodality due to their non-linear structure, although in reality they are
conservatively verbal.
In the final chapter, Chapter 12, Kristin Bührig investigates how visual mate-
rials, such as diagrams and charts, are integrated in the interpretation process at
hospitals, where non-native speakers are briefed about their pending operations
by German doctors with the help of a non-professional interpreter. One major ob-
stacle in this type of communication is the linguistic barrier created by the unequal command of specialist language and expert knowledge among the parties
involved. Bührig focuses on the different discursive roles a diagram plays in the
doctor’s attempt to communicate knowledge about the operation and the inter-
preter’s attempt to pass on this knowledge in the target language. She finds that the
professional doctor uses the visuals systematically to build up the patient’s medical knowledge about his diagnosis and to refocus the thematic elements in his
discourse. The interpreter faces problems in transferring the medical information
into the target language and uses the diagram to support the new information in
the rhematic parts of the translation.
The impetus for this book came from discussions which took place during the First
International Symposium on Multimodal Discourse at the University of Salzburg
in 2002, organised by the editors. We hope that this volume will be a means of
opening up the discussion to others who will bring their interest and contribution
to the rich and growing area of multimodal discourse.
Multimodal issues
Chapter 1
In between modes
Language and image in printed media
Hartmut Stöckl
Technical University Chemnitz, Germany
modes ought not to be confused but neatly kept apart and regarded in their inter-
dependences. Thirdly, and perhaps most importantly, the range of existing modes
represents a hierarchically structured and networked system, in which any one
mode can be seen to fall into sub-modes which in their turn consist of distinct
features that make up the sub-mode. Let me exemplify these points by looking at the
image as another major signing mode.
Images, just like language, are not purely visual; they also have a tactile quality to them, which may be reflected in the meaning we construe from them. So while essentially visual, the material and techniques used in the production of the picture are also subject to touch. The nature of the pictorial sign cannot, however, be gauged from its visual quality alone, because visual quality also pertains to written language, as we have seen. In this respect both image and language are equally visual.
Images are realised in different media, the static printed image and the dynamic
moving image of film and television being the most prominent examples. Besides
sharing some sub-modes (e.g. elements, vectors, distance, angle, colour etc.), both
medial variants also differ in the set of sub-modes they entail. So the moving im-
age commands such specific sub-modes as, for instance, panning or tilting, it can
have narrative7 and has at its disposal sub-modes originating in post-production
like cut or visual effects (e.g. slow-motion, time-laps). Sub-modes in their turn
can then be seen to comprise sets of distinctive features, i.e. specific aspects of one
sub-mode, which are both phenomena in perception contributing to an overall
gestalt as well as analytical categories which help to theoretically come to grips
with sub-modes. Colour, for instance, can be decomposed into six distinctive fea-
tures: value, saturation, purity, modulation, differentiation and hue (Kress & van
Leeuwen 2002: 354ff.). Any concrete colour may then be specified by determining
values on the scales of those distinctive features and their individual combination.
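Purely as an illustration of this feature-based specification (the data structure, the 0–1 scale ranges and the sample values below are my own assumptions, not part of Kress & van Leeuwen’s account), a concrete colour could be sketched as a vector of values on the six feature scales:

from dataclasses import dataclass

@dataclass
class ColourSpec:
    """A colour specified by values on six distinctive-feature scales
    (after Kress & van Leeuwen 2002); the 0.0-1.0 ranges are an
    illustrative assumption, not part of the original account."""
    value: float            # light vs. dark
    saturation: float       # pale vs. intense
    purity: float           # hybrid vs. pure hue
    modulation: float       # flat vs. richly modulated
    differentiation: float  # monochrome vs. varied palette
    hue: float              # position on the colour circle (0.0 = red, say)

# A concrete colour is then the combination of its feature values.
poster_red = ColourSpec(value=0.4, saturation=0.9, purity=0.8,
                        modulation=0.2, differentiation=0.1, hue=0.0)
print(poster_red)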
Applying this kind of systematisation of mode to textual genres in printed
media and TV- or film-media, I suggest the following schematically represented
network (cf. Figures 1 and 2).
The graphic representations of how modes are structured (cf. Figures 1 and
2) are necessarily formalised simplifications, which demand some comments on
inherent problems and limitations.
1. While the columns of core modes contain central sign-repertoires that are
deeply entrenched in people’s popular perceptions of codes and communi-
cation and can stand on their own, peripheral modes come as inevitable
‘by-products’, as inherent elements of a core mode’s specific medial realisa-
tion. This is not to say that core modes are more important than peripheral
modes or more powerful in terms of their internal grammar and resulting
communicative potential. Rather than being a major/minor distinction the
differentiation indicates that some modes only come into being along with
others and depend on them to some extent. Non-verbal means are ambivalent
here as they can also function independently of language. On the one hand,
then, the non-verbal is a concomitant aspect of language (cf. Müller 1998), on
the other hand, it is part of a communicator’s image and thus also relevant to
pictorial analysis.
2. In another sense core modes are also abstract modes that need to be instan-
tiated in a specific medial variant. The grammar of language must be realised
either in speech or in writing – it is only in these medial variants that peri-
pheral modes pertain to them.
3. Sub-modes8 constitute a mode in that they provide the building blocks of a
mode’s grammar. It would be wrong, I believe, to view any of the sub-modes in
isolation or as dominant in the make up of a mode. It is rather all sub-modes
3. Mode distinctions
When ‘reading’ a multimodal text, average recipients will normally become only
dimly aware of the fact that they are processing information encoded in different
modes. The manifold inter-modal connections that need to be made in order to
understand a complex message distributed across various semiotics will go largely
unnoticed. All modes, then, have become a single unified gestalt in perception,
and it is our neurological and cognitive disposition for multimodal information
processing that is responsible for this kind of ease in our handling of multimodal
artefacts. A theory of multimodal communication, however, has to meticulously
dissect an apparently homogeneous and holistic impression. It has to sensitise us to the essential differences between the modes involved and make us aware of the textual work we invest in building the inter-modal relations so crucial to understanding. Multimodal theory also needs to ask to what extent there are systematic similarities
and ties between the modes involved.
In what follows, I want to demonstrate that it is at least on three different levels
that modes can be distinguished from one another: semiotic properties, cognitive
orientation, and semantic potential. While I will here confine myself to a juxtaposi-
tion of the two core modes language and image,10 the three theoretical perspectives
can be applied to all other modes and sub-modes.
The semiotic properties of a mode refer to its internal structure and to the general
ways in which users can make meaning with a mode’s signs. Language has what
linguists call double articulation, i.e. discrete signs on two levels of organisation,
phonemes and morphemes, which combine to form words and utterances. This
design feature explains the boundless flexibility and resourcefulness of language.
Images, in contrast, have no distinct signing units. There are no rules that would
explain how pixels yield higher-level units when combined. What comes closest
These and other semiotic properties result in different cognitive operations de-
manded or afforded by language and images. Most importantly, language is a
linear mode that calls for the successive integration of signs into phrases, whereas
images are rather based on simultaneous and holistic gestalt-perception. Conse-
quently, images can be regarded as a quick mode relative to language as they do
not necessitate parsing. We know from psychological experiments that images are
far more likely to be attention-getters in perception than language and can also be
memorised much more easily and effectively. Both have to do with their analogue
code characteristics – no recoding needs to take place and pictures can therefore be
regarded as a code close to reality or – as some semioticians have argued – a ‘lan-
guage’ without a code. The speed of pictorial perception is usually put down to the
simultaneity of gestalt formation, whereas the communicative impact of images is
seen in the fact that they directly tap into the emotions and provide immediate
sensory input.
Semiotic and cognitive characteristics determine what users of a mode can do with
it in terms of specific meaning-making resources. Although debatable, it has gen-
erally been accepted that the semantics of language is less vague and polysemous
than that of the image. While language provides scope for double meaning, it has
conventional semantics attached to words and utterances. Images, on the contrary,
are seen to be inherently vague and ambiguous and can only be made to mean and
communicate specific contents by a combination with other modes or the embed-
ding into narrowly defined communicative situations. Most importantly, images
lack a definite speech act repertoire, which is why their illocutions remain cloudy
unless they are complemented by language. Language, on the other hand, counts as
less rich in information than images, which carry a welter of sensory information
and are particularly intense in terms of connotation. Conversely, language is at a great advantage as regards its potential to communicate all sensory modalities, whereas
pictures clearly are confined to visual information.12 Similarly, the self-referential
capabilities of images are weak, whereas they are basically unlimited with lan-
guage. Finally, language can be used to make just about any utterance imaginable.
This huge semantic flexibility, which results from the linguistic design principle of double articulation and an elaborate set of rules, contrasts with some obvious semantic restrictions of images. Some meaning relations, like causality, cannot be
expressed, negation and affirmation are impossible and the utterances construable
from images are usually additive. To sum up, language has its strength in the depic-
tion of events and states-of-affairs in time, whereas images are particularly suited
to the representation of objects in space and their physical characteristics (cf. Kress
1998: 68f.).
to the fact that the graphic articulation used to represent speech sounds has its
origins and immediate precursors in the pictorial. The strongest argument for the
innate tie between language and image, after all, is their co-presence in almost all
forms of communication, a symbiotic mode integration (mixing) which is guided
by the principle of reciprocally balancing out limitations and weaknesses of the
modes combined.
There are two basic ways in which the linguistic and the pictorial mode can come
together in a text. Firstly, a verbal text can itself acquire image qualities by means
of typography and layout. In this case a peripheral mode (typography) of a medial
variant (writing/language) is employed for a partial transfer from one core mode
(language) to another (image). Here, the carrier of the linguistic mode emulates
the pictorial. Secondly, and this is the more common option, a verbal text is com-
bined with an image. The two core modes are semantically and formally integrated
so that each mode strategically employs its range of sub-modes, thus unfolding the
specific semiotic potential of each mode and contributing to an overall commu-
nicative gestalt. A specific type of this language-image-combination would result
if the verbal text contained language that was itself pictorial or figurative and es-
tablished a semantic or formal link with the accompanying visual image. Let me
illustrate both types of language-image-link by discussing two sample texts drawn
from the advertising genre. My aim in this will be to show on which levels such
interfaces between two modes can be analysed.
In an advertisement of the RSPCA (the Royal Society for the Prevention of Cruelty
to Animals) for free-range eggs (cf. Figure 3) the verbal text is typographically de-
signed to yield the visual form and appearance of a supermarket receipt. Although,
of course, the language contained in the text is not what we would expect to read
on a receipt, conjuring up the image of a receipt is possible, because this specific
text type, like many others, comes with a built-in range of graphic features that
can be imitated. Wehde (2000) calls such configurations of typographical and lay-
out properties which form a set of visual expectations tied to a particular text type
(format) a “typographisches Dispositiv”, which could be rendered in English as
the ‘typographic repertoire’ of a text type. Which typographic/layout sub-modes,
then, have been employed to give the impression of the typographic repertoire of
a receipt?
Figure 3. “A few pence extra”, RSPCA (The Times Magazine 15.03.03, p. 10)
Battery conditions are appalling. But apparently, a few pence extra for free-range
is intolerable. Over 20 million hens live in battery cages in the UK. If you can call
it living. They’re so crammed in that they can never even open their wings. And
their bones can become brittle. But hey, their eggs can be a few pence cheaper than
free-range. So does that make it justifiable? Some people obviously think it does.
Because, while 86% of the British public say that battery cages are cruel, only 32%
of the eggs sold in Britain are barn or free-range. At the RSPCA we think the UK
should ban all battery cages as soon as possible. Margaret Beckett is considering
it. So help her decide. Stop buying battery eggs. Farm animal welfare. It’s in your
hands. www.rspca.org.uk/eggs Registered charity number 219099
The visual image of a receipt is mainly formed by narrow margins, which have
been marked by lines of three stars each. The text body is heavily subdivided into
small portions of variously aligned print. This has been achieved by paragraphing
and spacing as well as by lines formed of the same stars as at the margins. The font
clearly betrays its provenance as having been produced by the typical cash-desk
printer. This is signalled through the formation of the characters from individ-
ual dots, a system also used in digital displays. Contributing to this is the blue
colour of the print as well as its irregular quality and the blurred print blotches
in between, which by association also indicate the low but functional quality of
the paper and the printing technology. The enumerative and strongly portioned
character of the receipt also materialises through bold print, tab stops in the mid-
dle of lines, capitalisation and the use of numbers so typical of receipts. Besides
being heavily paragraphed, receipts also come as parts of a continuous text (paper
roll) from which they are torn when handed out to the customer. This continuous
character of the text is borne out by the recurring logo on either side and the cen-
trally aligned text body right underneath, which usually communicates the name
and address of the supermarket or some other standing detail. Continuity is also
expressed by the cut off logos at the top and bottom of the text.
When typographical repertoires are exported from one text type (receipt) to
another (advertisement), the resulting effect is not merely pictorial, as in our example, where the receipt is sort of reified as a textual object. More importantly,
exported or emulated typographical repertoires have a semantic impact. In the
RSPCA advertisement the receipt-like character of the text adds to the meaning of
the verbal text. It supports the central argument of the ad, which says that pur-
chasing behaviour can make all the difference in the battle for more free-range
farming. The receipt image of the text makes the pivotal point that it is in the
supermarket where farming policies are shaped via the price of the eggs and con-
sumer behaviour. The readiness to overcome one’s own meanness and spend a few
pence extra on eggs, which is the target of the advertisement, is suitably conveyed by the visual image of a receipt, as it directly taps into the knowledge script of shopping and thus reduces the mediated effect of communication and makes it more direct.
Figure 4. “No small fry”, Toshiba (The Sunday Times Magazine 14.09.97, pp. 2f.)
see why more people are working wherever and whenever they like, call us on . . .
Hopefully you’ll agree, Toshiba really do have portable computing canned (9). . .
The first level to scrutinise is the semantic and pragmatic tie between language
and image. It is obvious here that both the image and a whole network of lin-
guistic expressions function metaphorically. The central metaphor could be spelt
out as laptop = fish can (visual image) or building laptops = catching
fish/canning fish (verbal text). Whereas the visual image provides the overrid-
ing metaphor which serves as a frame of orientation for the interpretation of the
verbal metaphors, the literal meanings of the metaphorical phrases (in italics) de-
tail and structure the emerging mental images (sea, fishing). The result of this
kind of visual-verbal metaphorical play is the mental mapping of source domain
features (sea, fishing) onto target domain features (computer manufacturing). So
here analogies are built between
(1) the sea and the market (oceans apart, create waves, take the plunge)
(2) the quality/quantity of the catch and the quality of the computers (no small
fry, good haul)
(3) fish processing (canning) and quality computer assembly (have portable com-
puting canned)
(4) fish/fishing and computer firms (rather lead the shoal than swim with it, no
catch)
(5) instruments of fishing (bait) and compatibility with networking standards
(take the bait from most networks).
On a second level, we could ask which cognitive operations are afforded by the
design of the language-image-link. Clearly, what is intended here is the oscillation
between literal and metaphoric meaning-making throughout the text, but also the
successive integration of verbal phrases into a whole. On the one hand, all fig-
urative expressions can be read literally (i.e. with recourse to sea and fishing), an
interpretation facilitated mainly by the visual image, which provides the concept of
a can with all its physical properties. On the other hand, of course, the phrases can
be decoded in their metaphorical meanings, which support the persuasive inten-
tion of the text. Both on the literal and metaphorical plane of meaning the phrases
combine to build a structured network. The network’s elements cohere because
on a literal level they build paradigmatic sense relations and all add to a common
mental image. On a metaphorical level the phrases are networked as they all con-
tribute to realising typical advertising speech acts and establishing the advertising
text pattern. So, no small fry and good haul describe and evaluate the advertised
product, created waves, rather lead the shoal than swim with it and have portable
computing canned promote the firm’s image and express praise, whereas take the
plunge is an appeal to the consumer to try out the advertised products.
On a third level of analysis, the overall textual structure built from language
and image in our sample text is – as we have seen – one of metaphorical projection
and literalisation. Verbal and pictorial text are strongly interdependent in as much
as the visual image promotes the literal readings of the figurative phrases, and the
metaphorical language explains the context and the motivation for the pictorial
metaphor.
Finally, on a fourth level of analysis we need to enquire into the specificity of
the visual image. Images in concrete communicative events always come as types,
and there are a number of design features in our example that are for one thing
typical of advertising and, for another, facilitate the language-image-link described
so far. Most importantly, the image’s metaphoric nature is realised by a morphing
technique which allows for the carefully engineered blending of visual features. As
a result the reader gets a very realistic impression of an imaginary object, which
plays tricks on his perception. Shape, size and colour come as unifying character-
istics of both objects (can, laptop) morphed into a single gestalt, while can opener
and keyboard as well as the plug-in connections represent distinguishing traits of
the objects blended. In advertising images single objects are often shown against a
neutral background in order to bring out their salient characteristics, as has been
done here. Also, there is something like a functional perspective in advertising im-
ages, which makes the perception of the objects depicted as easy and effective as
possible. The can/laptop is shown from slightly on high and from a relatively short
distance so we can easily take in all its important attributes. Curiosity and tension are created by the half-open state of the can, with the opened part pointing away
from the viewer.
The two sample texts were intended to demonstrate that mode-integration may be complex, because there is both mode mixing, i.e. the calculated and complementary co-deployment of language and image, and mode overlapping, i.e. the collapsing of modes into one another. Overlaps of modes are seen in instances where
language can be doubly pictorial. Firstly, verbal text can assume pictorial quality
via typography and layout and secondly, language can be based on and evoke men-
tal imagery. Mode-overlapping, however, is also reflected in the easily neglected fact that images are to a great extent rooted in language, or rather in knowledge frames and scripts which are heavily codified and structured in the form of our linguistic repertoires. So, what can be expressed and communicated in images (in production and reception) is not only dependent on our visual experience of the world or
the material and technical properties of image-media, but is also crucially shaped
by our stock of words, phrases and stereotypical language utterances.
When separate modes indeed so closely intertwine in multimodal texts, is it
not likely, then, that “common semiotic principles operate in and across different
modes” (Kress & van Leeuwen 2001: 2)? This is a central question of multimodal
theory and analysis. As we have seen in the case of language and image (cf. Section
3), modes differ noticeably from one another in terms of internal sign struc-
ture, semantic potential and cognitive operations afforded. This is why individual
modes need to first of all be regarded as possessing their own ‘grammars’, which are
distinct from one another as they follow different organising principles and make
different functionalities available. This does not, however, rule out the possibility
that some overriding principles govern and guide all modes simultaneously. Such
trans-modal operating principles, then, would have to be sufficiently general and
basic to be able to span the great variability of modes. My view on common semi-
otic principles across modes, therefore, is a balanced and dynamic one. I endorse
the formal and functional differentiation between modes while at the same time
acknowledging the trans-modal operation of very global semiotic rules governing
the organisation of individual modes and their reciprocal integration. The general
cross-modal principles are only instantiated in texts and communication, and I suggest looking at those principles as a means of regulating and guaranteeing a kind of semiotic equilibrium in a concrete multimodal text. In what follows I will roughly
sketch out some of the basic semiotic principles operating across modes.
1. The three Hallidayan meta-functions (Halliday 1994) would be the first prin-
ciple that can easily apply to all modes imaginable and to the multimodal
text as a whole. Any mode is – to varying degrees – able to depict states-of-
affairs (ideational), design some social interaction between the communica-
tors (inter-personal) and contribute to organising and structuring the text
(textual). In any one multimodal text these three functions need to be ful-
filled and, more importantly, distributed across the modes present. Here, the
aim must be an inter-modal balance between the meta-functions, that is their
distribution across modes will be guided by how, in a given communicative
event, the functions can be realised most efficiently. Consequently, modes will
be positioned towards one another according to which part they play in ful-
filling the meta-functions. This is a first key to the structuring of multimodal
discourse.
2. Segmentation, that is the decomposability of larger sign structures or gestalts
of perception into their constituent elements, would seem to be a second semi-
otic principle operating in and across modes. All modes need to signal their
internal structure as keys to the retrieval of portions or layers of meaning.
Conclusion
Notes
1. The confusion between multimediality and multimodality can be seen as an effect of the
hype surrounding the discipline, which has mainly been generated by a fascination with new
technologies and their apparently boundless opportunities.
2. Iedema (2003: 40) calls this kind of argument the “always already (Überhaupt) multi-semiotic
nature of meaning-making”.
3. For a short historical sketch of multimodality’s comparatively brief history, see Iedema (2003: 30ff.) and Stöckl (2004: 11–20).
4. Such ‘grammars’ of individual modes are, for instance, outlined in Kress and van Leeuwen
(1996), Doelker (1997) or Stöckl (2004) (image), in van Leeuwen (1999) (sound and music),
in O’Toole (1994) (displayed art, e.g. sculpture), in Stötzner (2003), Willberg and Forssmann
(1999), Walker (2001) or Wehde (2000) (typography), and in Kühn (2002) (non-verbal com-
munication). While some of them are based on an explicitly systemic functional approach,
others apply different methodologies. Common to most theories of single signing modes is the
(metaphorical) transfer of some kind of linguistic or semiotic pattern. This goes to show that
language is seen to be central in signing practice.
5. Kress (1998: 55ff.), for instance, suggests that it is first and foremost a change in social con-
ditions, and not technology as such, which drives the shift from language to image in the
communicative landscape. He pinpoints information overload as a cause for a greater reliance
on the image.
6. Kress and van Leeuwen (2001) is the first attempt to outline a general theory of multimodal communication. The objective of being all-embracing has, however, been pursued here to the detriment of specificity.
7. Stöckl (2002) shows how the static printed image is also capable of realising narrative.
8. The terminology used for sub-modes and features here is heterogeneous. I have largely fol-
lowed Kress and van Leeuwen (1996) and van Leeuwen (1999), but have also used rough labels
borrowed from generally accepted vocabulary. As for typography, I have made use of Stötzner
(2003) and Willberg and Forssmann (1999). For inspiration, I have also turned towards Neuen-
schwander (2001) and Bellantoni and Woolman (1999). As for the para-verbal, I have consulted
Neuber (2002). The linguistic sub-modes follow accepted notions of levels of text-analysis. Again
it needs to be emphasised that my aim was a very general but systematic scheme of things, not
total completeness or precision. An application of multimodal analysis to the TV-commercial
which roughly follows the lines sketched out here can be found in Stöckl (2003).
9. Kress and van Leeuwen (2002: 350f.) can be understood to generally endorse the idea that
colour is a mode in its own right, although they also advise caution. Their main argument for
regarding colour as a mode is that “it can combine freely with many other modes” (ibid.: 351).
Although that seems true, colour, when combining with other modes, is part and parcel of those
modes, which is why I would like to maintain that colour comes as a sub-mode.
10. The following differences between language and image are presented in more detail and with
relevant bibliographical sources in Stöckl (2004: 245ff.). Almost all of the arguments represent a
broad consensus in semiotics and cognitive psychology.
11. A concept of images based on their iconic nature can also be maintained when applied to
abstract, i.e. non-depicting images. Colours and shapes in spatial combinations that do not refer
to concrete objects real or imagined will mainly make ‘meaning’ by associations we have with
these colours and shapes acquired in sensory experience.
12. By synaesthetic connections, images can also communicate information other than the purely visual.
13. Stöckl (2004) is a detailed study of the language-image-link with respect to advertising and
journalism.
References
Bellantoni, Jeff & Woolman, Matt (1999). Type in Motion. Innovative digitale Gestaltung. Mainz:
Hermann Schmidt.
Doelker, Christian (1997). Ein Bild ist mehr als ein Bild. Visuelle Kompetenz in der Multimedia-
Gesellschaft. Stuttgart: Klett-Cotta.
Halliday, M. A. K. (1994). An Introduction to Functional Grammar. London: Arnold.
Iedema, Rick (2003). “Multimodality, resemiotization: Extending the analysis of discourse as
multi-semiotic practice.” Visual Communication, 2 (1), 29–57.
Jakobson, Roman (1971). Fundamentals of Language. The Hague: Mouton.
Kress, Gunther (1998). “Visual and verbal modes of representation in electronically mediated
communication: The potentials of new forms of text.” In I. Snyder (Ed.), Page to Screen.
Taking Literacy into the Electronic Era (pp. 53–79). London: Routledge.
Kress, Gunther & van Leeuwen, Theo (1996). Reading Images. The Grammar of Visual Design.
London: Routledge.
Kress, Gunther & van Leeuwen, Theo (2001). Multimodal Discourse. The Modes and Media of
Contemporary Communication. London: Arnold.
Kress, Gunther & van Leeuwen, Theo (2002). “Colour as a semiotic mode: notes for a grammar
of colour.” Visual Communication, 1 (3), 343–368.
Kühn, Christine (2002). Körper – Sprache: Elemente einer sprachwissenschaftlichen Explikation
non-verbaler Kommunikation. Frankfurt am Main: Peter Lang.
Müller, Cornelia (1998). “Beredte Hände. Theorie und Sprachvergleich redebegleitender
Gesten.” In C. Schmauser & T. Noll (Eds.), Körperbewegungen und ihre Bedeutungen
(pp. 21–44). Berlin: Arno Spitz.
Neuber, Baldur (2002). Prosodische Formen in Funktion. Leistungen der Suprasegmentalia für das
Verstehen, Behalten und die Bedeutungs(re)konstruktion. Frankfurt am Main: Peter Lang.
Neuenschwander, Brody (2001). Letterwork. Creative Letterforms in Graphic Design. London and
New York: Phaidon.
O’Toole, Michael (1994). The Language of Displayed Art. London: Leicester University Press.
Stafford, Barbara Maria (2001). Visual Analogy. Consciousness as the Art of Connecting.
Cambridge, MA: MIT Press.
Stöckl, Hartmut (2002). “From space to time into narration – Cognitive and semiotic
perspectives on the narrative potential of visually structured text.” In H. Drescher, W.
Thiele, & C. Todenhagen (Eds.), Investigations into Narrative Structures (pp. 73–98).
Frankfurt am Main: Peter Lang.
Stöckl, Hartmut (2003). “‘Imagine’: Stilanalyse multimodal – am Beispiel des TV-Werbespots.”
In I. Barz, G. Lerchner, & M. Schröder (Eds.), Sprachstil – Zugänge und Anwendungen. Ulla
Fix zum 60. Geburtstag (pp. 305–323). Heidelberg: Universitätsverlag Winter.
Stöckl, Hartmut (2004). Die Sprache im Bild – das Bild in der Sprache. Zur Verknüpfung von
Sprache und Bild im massenmedialen Text: Konzepte, Theorien, Analysemethoden. Berlin: de
Gruyter.
Stötzner, Andreas (2003). “Signography as a subject in its own right.” Visual Communication, 2
(3), 285–302.
van Leeuwen, Theo (1999). Speech, Music, Sound. London: Macmillan.
Walker, Sue (2001). Typography and Language in Everyday Life. Prescriptions and Practices.
London: Longman.
Wehde, Susanne (2000). Typographische Kultur: eine zeichentheoretische und kulturgeschichtliche
Studie zur Typographie und ihrer Entwicklung. Tübingen: Niemeyer.
Willberg, Hans-Peter & Forssmann, Friedrich (1999). Erste Hilfe Typographie. Ratgeber für
Gestaltung mit Schrift. Mainz: Hermann Schmidt.
Chapter 2
Modelling multiple semiotic systems
The case of gesture and speech
Peter Muntigl
University of Salzburg, Austria
1. Introduction
The observation that people draw on various meaning-making systems when con-
struing experience or enacting social reality is certainly not new. Research that
may be loosely subsumed under the rubric of ‘context analysis’ has shown how
various ‘semiotic systems’ such as speech, gesture, body position and eye gaze are si-
multaneously deployed in interaction (for an overview, see Kendon 1990: 15–49).
The interactive coordination of various modes has already been documented in
terms of the relationship between gesture and speech (Kendon 1982, 1997; McNeill
1992; Streeck 1993, 1994), speech and gaze (Goodwin 1979; Kendon 1990; Streeck
1993), speech and body position (Kendon 1985) and speech, gaze, gesture and
body position (Goodwin & Goodwin 1992). A more recent focus has also shown
how technologies and/or spatial arrangements in workplace or organisational set-
tings play a part in shaping interaction (Goodwin 1994; Goodwin & Goodwin
1996; Hutchins 1996; LeBaron & Streeck 1997).
The meaning-making role of semiotic systems other than speech has also been
addressed in systemic functional linguistic (hereafter SFL) oriented research. Kress
and van Leeuwen (1996), for instance, have argued that images have a grammar,
and may be analysed along the same lines as Halliday’s (1978) metafunctional
approach to language. Music has also been given a semiotic interpretation (van
Leeuwen 1999). Furthermore, Kress and van Leeuwen (2001) claim that smell and
colour may be treated as semiotic systems in their own right, and may be added to
the list of systems used in meaning-making.
In examinations of how meaning is created, SFL research has been largely se-
lective, favouring speech or writing as the primary semiotic systems through which
experience is construed as meaning.1 In this chapter, I discuss how other semiotic
systems – with a special focus on gesture – may be included alongside speech in the
construction of meaning. The first question I address involves how semiotic sys-
tems can or should be modelled; that is, how do semiotic systems such as image or
gesture relate to ‘language’ (i.e. speech or writing), and can these semiotic systems
be modelled using what Matthiessen (1993: 230) refers to as “general principles of intra-stratal organisation” or “fractal dimensions”? My second question concerns
how the deployment of semiotic systems may be organised. Put differently, can
a situational variable such as mode be made responsible for coordinating which
semiotic systems may be activated in various contexts of situation?
As semiotic systems, speech and writing are given very high status in SFL. First,
Halliday and Matthiessen (1999) argue that the semiotic systems that best fall un-
der the category of ‘language’ are speech and writing; and second, language (i.e.
speech and writing) is the primary semiotic system through which people con-
strue experience as meaning. They also suggest that other semiotic systems such as
music, dance, dress, cooking, the organisation of space, charts, maps and diagrams
are modelled on language:
These systems enter into relations with language in two ways. On the one hand,
they are metonymic to language: they are complementary, non-linguistic re-
sources whereby higher-level systems may be realised (e.g. ideological formations
realised through forms of art; theoretical constructs realised through figures and
diagrams). On the other hand, they relate metaphorically to language: they are
Figure 1. Tri-stratal organization of language (from Halliday & Martin 1993: 26): discourse semantics and lexicogrammar form the content plane; phonology/graphology forms the expression plane.
posture and head movement. Put another way, speech is marked by meaningful
acoustic-temporal patterning, and sign by meaningful visual-spatial (and tem-
poral) patterning. Sign should, however, not be confused with gesture, and it
is worthwhile pointing out the differences between these two semiotic systems.
McNeill (1992, 2001), for instance, drawing from Kendon’s (1988) work, argues
that gesture and sign language may be placed along what he now refers to as
‘Kendon’s continuum’, shown in Figure 2. Gestures and sign language are located at
opposite ends of the continuum, with pantomime and emblems located at points
further towards the centre. According to McNeill, gestures obligatorily accompany
speech, have no linguistic properties, are not conventionalised and are global and
synthetic. In contrast, signing occurs in the absence of speech, exhibits linguistic
properties, is fully conventionalised and is segmented and analytic.
Returning to Halliday and Matthiessen’s (1999) claim that various semiotic sys-
tems may be described as if they had their own grammar, we are still left won-
dering whether, genuine or not, they actually have a grammar. And, if they do,
what the grammar looks like. Recall that, from McNeill’s standpoint, gestures do not exhibit linguistic properties. This does not mean that gestures are not meaningful. Quite to the contrary, McNeill argues pointedly that gestures are non-redundant and that they ‘add’ meaning to the speech segments with which they co-occur. By ‘non-linguistic’, McNeill (2001: 3) means that a gesture “is non-
morphemic, not realised through a system of phonological form constraints and
has no potential for syntactic combination with other gestures”. Reformulating
McNeill’s arguments in SFL terms, gestures are non-compositional (no part-whole
constituency), exhibit an arbitrary relationship between content and expression
and cannot be syntagmatically organised. Put yet another way, gestures do not
have a grammar.
(1)
03 F: Somebody explained to me yesterday how to get the eight ball in on the break,
but . . . it hasn’t worked ye@e@et
((Eddie moves to end table, chalking cue, places chalk on table edge))
04 (2.0) ((Eddie places stick over table parallel to side))
Gi. . .xxxxxxxxxxxxxxxxxxxxxxxxxx ((Eddie gazes at Franklin))
05 E: Use a top English on the side?
!**************!********!
Nonetheless, there is a good argument to be made that the parts also contribute
to the meaning of the whole. If the gesture is examined for its experiential mean-
ings, then the C-shape may be construed as the Participant, and the movement or
twisting motion as the Process. Furthermore, these experiential meanings may be
verbalised ergatively as ‘the pool ball + spins + backwards’. In this way, the ‘whole’
gesture may be interpreted as a middle clause (Halliday 1994: 169) in which the ges-
ture form is the Medium and the movement is the ergative Process. Looking at this
gesture compositionally, the form is one part (i.e. the Medium), the movement is
another (i.e. the Process) and the direction of movement yet another (i.e. Manner).
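As a purely illustrative sketch of this compositional reading (the class, its field names and the verbalisation method below are my own devices, not Muntigl’s notation), the mapping of gesture components onto SFL functions might be represented like this:

from dataclasses import dataclass

@dataclass
class GestureReading:
    # Compositional reading of a gesture in SFL terms; the structure is an
    # illustrative assumption, not part of the chapter's own formalism.
    form: str       # hand shape, read as the Medium (Participant)
    movement: str   # motion applied to the form, read as the ergative Process
    direction: str  # direction of the movement, read as Manner

    def as_middle_clause(self) -> str:
        # Verbalise the gesture ergatively as Medium + Process + Manner.
        return f"{self.form} + {self.movement} + {self.direction}"

# The C-shaped gesture of Example (1), verbalised as in the text:
c_shape = GestureReading(form="the pool ball", movement="spins", direction="backwards")
print(c_shape.as_middle_clause())  # -> the pool ball + spins + backwards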
A C-shape gesture will not, of course, always iconically realise a pool ball or
a type of round object. We can all imagine situations in which the same gesture
has been used in a spatial sense to construe ‘thickness’ or ‘width’, or in a more
metaphorical sense in which a speaker is considering two possibilities. In the latter
sense, the C-shape might abstractly refer to the two possibilities in question, and
the twisting motion might construe the Process of ‘contemplating’ or ‘considering’.
In this sense, McNeill is surely right. We have to understand the figure that the
gesture realises, before we know what meanings the parts realise. However, in the
gesture in (1) I would argue that the parts contribute to our understanding of the
meaning of the whole gesture. If, for example, E had not applied movement to the gesture, it might have been difficult to interpret what the hand is actually signifying. Notice that E does not say “pool ball”. We have to infer this from his gesture. Through the twisting a link is made between “top English” and a spin.
Therefore, it is only through the component parts of form and movement that we
are able to ‘see’ a spinning pool ball.
By looking at gestures in terms of SFL categories such as Participant and
Process, it becomes possible to read gestures compositionally. Furthermore, it is
doubtful that specific gesture forms such as a C-shape can mean just about any-
thing. I think there is a constraint, however minimal, between gestural form and
meaning. To conclude, I would suggest that McNeill’s proposed properties need
to be relaxed. Gesture is a semiotic system that includes a content and expression
stratum. Whether or not gestures have a grammar is still open to debate.
The above analysis of the C-shaped gesture has shown how ideational meanings
such as Participant, Process and Manner may be mapped onto gestures. The next
step would be to see if gestures are also interpersonally and textually organised.
The textual metafunction is not difficult to demonstrate. Since gestures occur in a
3D gesture space consisting of six quadrants (see McNeill 1992), gestures may be
analysed for where they occur in this gesture space – left, right, top, bottom, front
or back. Some of the textual systems that may be activated are theme, informa-
tion, and reference (Halliday 1985) or even ideal-real (Kress & van Leeuwen
1996). Gestures may also realise interpersonal meanings. For example, McNeill
(1992) argues that a certain type of gesture termed beat functions to evaluate cer-
tain parts of a speaker’s message. Other interpersonal functions may include ques-
tioning the import of what someone has said (Kendon 2001), regulating speakers’
patterns of attention (Goodwin & Goodwin 1986), assessing another’s utterance
(Goodwin & Goodwin 1992), and projecting the next part of the text (Streeck &
Hartge 1992). One important question that arises in gesture research is whether
gestures may construe both ideational and interpersonal meanings. If, as in beat
gestures, the hand is not functioning as a Participant, then it is questionable as to
whether the gesture will, in addition to evaluating the message, realise ideational
meanings.
The semiotic system that has probably received the most attention in terms
of metafunctional modelling is image. In their book Reading Images Kress and
van Leeuwen (1996) demonstrate how various ideational, interpersonal and tex-
tual meanings are realised in images. Ideational meanings include Participants
(e.g. Actor, Goal, Reactor, Phenomenon, Senser, Sayer), Processes (e.g. Mental,
Verbal, Action, Reactional) and Circumstances (e.g. Setting, Means, Accompa-
niment). Interpersonal meanings include contact, social distance, attitude and
modality. Finally, textual meanings include the spatial arrangement of informa-
tion (i.e. Given/New and Ideal/Real). Furthermore, Kress and van Leeuwen argue
that images have a grammar. If grammar is associated with metafunction, then
they are surely right. In Halliday’s (1994) model clause types such as material
and verbal, Participants such as Actor and Sayer and Circumstances such as Man-
ner and Accompaniment are grammatical, not semantic functions. What is much
more difficult to identify in images (and gesture) is a visual/spatial equivalent to
a grammatical class such as noun or verb. In order to advance the discussion of
how semiotic systems are stratally organised, more precision should be given to
what is meant by content (i.e. semantics and/or grammar) and expression. If a
semiotic system has a grammar, how can we differentiate between the system’s
grammar (i.e. how constituent noun-like or verb-like parts are syntagmatically or-
ganised) and how the grammar is being expressed? For gestures the components
of expression are kinesic and include arm, head and body position and movement,
Gesture-Unit and Gesture-Phrase (see McNeill 1992: 82ff.). When these body parts
take on functional shapes, it is possible that a grammar becomes realised. One cru-
cial difference between sign language and gesture is that the functional shapes in the former semiotic system – and their possibilities for combination – are much more highly developed than in the latter.
the written medium does not allow for this kind of sharing. Since written texts
may be viewed as finished products, the recipients of these texts may not normally
partake in text construction. Also important for process sharing is the degree to
which immediate feedback is possible. In the spoken medium, since texts are being
co-constructed by interlocutors, speakers are constantly receiving verbal and non-
verbal feedback from their addressees. Written texts do not allow for the same kind
of immediate feedback.
The third aspect of mode, channel, refers primarily to visual and aural.2 Im-
portant in multimodal analyses is the relationship between channel and different
semiotic systems. Martin (1992), for instance, argues that the availability of chan-
nels, visual and aural, will influence the kinds of semiotic systems that may be
brought into the meaning-making process. This is an important point to consider
in multimodal analyses. For instance, if both visual and aural channels are open,
speakers may engage in face-to-face conversation, and may provide each other
with both visual and aural feedback. If the aural channel is not available, say in
sign language, speakers must primarily rely on visual signs. Martin also points out
that only one channel may be available to one of the interactants. Radio, for in-
stance, transmits an aural signal but does not receive one. Therefore, radio relies
on one-way aural contact and no visual contact. Silent movies, on the other hand,
would rely on one-way visual contact, and no aural contact – unless music is played
during the movie. Martin’s work highlights the important connection between
channel and semiotic systems. In multimodal analyses, consideration should be
given to what channels are open and for whom, and which semiotic systems are
being deployed.
The concept of channel might also be expanded to include other kinds of
modalities such as smell, touch and taste. In workplace settings the accurate iden-
tification of smells, the taste of various foods, or the touch of different materials
may be of central importance and could at times play a pivotal role in the construal
of experience. Furthermore, the visual channel should include more than just the
ability to see one’s interlocutor. The availability of the visual channel allows speak-
ers to deploy other meaning-making systems such as dress, gesture, gaze, colour,
and body position. Channel, therefore, should not be too closely tied to speech, but
should be sensitive to a full range of semiotic systems.
What I am proposing is that the contextual variable mode plays a pivotal role
in organising which semiotic systems become activated in social contexts. In a
way this makes sense, if mode is responsible for such categories such as channel,
medium and language role. Since mode organises field and tenor choices (Martin 1992), these contextual variables will also influence the range of semiotic systems deployed. In the following discussion, I will restrict the discussion of mode to face-to-face interaction involving two-way aural and visual communication.
Figure 3. Degree to which turn-taking systems are activated in speech (adapted from Martin 1992: 515)
Martin (1992) argues that the degree to which the visual and aural channels
are accessible is influenced by the kind of ‘medium’ used in communicating mes-
sages. For instance, a radio operates using only a one-way aural channel. Television
and movies, on the other hand, relay messages one-way both aurally and visually.
Finally, two-way access of both aural and visual channels is most characteristic
of face-to-face conversation. In regard to other semiotic systems such as gesture,
eye gaze and body position, it is important to consider how differences in mode
may influence the deployment of these systems. Following Martin (1992), some of
these differences may be characterised in terms of how tenor or field choices relate
to channel. In face-to-face conversation involving two-way aural and visual chan-
nels, turn-taking may range along a continuum from free to restricted (see Martin
1992), as shown in Figure 3.

Figure 3. Degree to which turn-taking systems are activated in speech (adapted from Martin 1992: 515)
Figure 4. Degree to which ideational gestures are activated in activity-focused genres (specific participants; +locative, +manner, +qualifier) vs. thing-focused genres (generalised participants)
So far it has been argued that, following Halliday and Matthiessen (1999), semiotic
systems other than speech or writing are stratally and metafunctionally modelled
in the image of language; that is, semiotic systems such as gesture may in different
ways ‘add’ textual, interpersonal, and ideational meanings to speech. One way of
modelling the functional interrelationships between semiotic systems is to follow
Halliday and Matthiessen’s (1999) and Martin’s (1995) suggestion in proposing
a set of general motifs that run throughout the grammar’s construal of experi-
ence. What they are claiming is that expansion and projection are transphenomenal
categories or fractal types, in that they pervade the whole semantic space of the
ideation base; that is, fractal types of expansion and projection operate not only
within sequence, but also within the semantic space of figure (i.e. clause) and el-
ement. These fractal types are also said to operate between genres and the stages
within a genre (Martin 1995). In this way fractal types operate at different levels
within a semiotic system (e.g. language), or from a multimodal perspective, be-
tween different semiotic systems, such as speech and gesture. Ideationally, fractal
types of projection include locution (”) and idea (’), whereas categories of expan-
sion include elaboration (=), extension (+) and enhancement (x). Categories of
projection and expansion are shown systemically in Figure 5.
A brief summary of each logico-semantic type is given in Figure 6, taken from
Halliday (1994: 220).
By putting together the idea of fractal types, such as projection and expan-
sion, operating between semiotic systems and the idea of social contexts activating
particular semiotic systems, a model of how speech and gesture interrelate in
context can be proposed; this model is represented in Figure 7.
Figure 5. System network of projection (locution ”, idea ’) and expansion (elaboration =, extension +, enhancement x)
Figure 6. Summary of logico-semantic types (Halliday 1994: 220)
Elaboration (‘i.e.’): one clause expands another by elaborating on it (or some portion of it): restating in other words, specifying in greater detail, commenting, or exemplifying.
Extension (‘and’, ‘or’): one clause expands another by extending beyond it: adding some new element, giving an exception to it, or offering an alternative.
Enhancement (‘so’, ‘yet’, ‘then’): one clause expands another by embellishing around it: qualifying it with some circumstantial feature of time, place, cause or condition.
Locution (‘says’): one clause is projected through another, which presents it as a locution, a construction of wording.
Idea (‘thinks’): one clause is projected through another, which presents it as an idea, a construction of meaning.
Figure 7. Genre realised in, and construing, different semiotic systems, related by the fractal types of projection (’, ”) and expansion
To conclude this chapter, I show how the model represented in Figure 7 may op-
erate for speech and gesture. In order to do this, I return to the C-shaped gesture
of Example 1. Recall that, in this example, the speakers were engaged in a recount
genre. According to Martin (1992), recounts involve a Record stage in which a se-
ries of activities involving specific participants are expressed. For these reasons one
might expect gesturing to occur. Furthermore, the field involves the game of pool
and, more specifically, pool shots. Again, the field would predict the realisation of
activities, specific Participants (e.g. pool ball, table, rack, cue, etc.) and meanings
Gixxxxxxxx........xxxxx
18 E: it’ll hit the– hit the rack,
!********!*****_________!
away rack
((extends arm and jerks C-shapes on “hit”, opens hands, brings hands
together in triangular formation on “rack” and holds))
Example 2. Recount of how cue ball proceeds towards the rack with a top English spin
of Manner and Location. Indeed, as Example 1 shows, the gesture shape construes
Manner and a Participant. In reference to channel, language role and degree of
turn-taking, we might also predict the production of gestures. The speakers are
co-present, have access to both the aural and visual channels, are in a turn-free in-
teraction, and are using semiotic systems to constitute the social activity that they
are in. But what of the relationship between what E says and what he gestures? I
would argue that the relationship is of the expansion: enhancement (x) type, in
which the twisting C-shaped gesture depicts the result of using a top English. A
paraphrase may be worded as “use a top English on the side, (x) so that the cue ball
will spin.” In this way, the gesture is able to expand the meanings of speech through
manner (twisting C-shape = spinning ball) and cause (the gesture is the result of
using top English).
Another type of functional relationship is shown in Example 2 – taken from
Muntigl (1996: 285) – appearing somewhat later in the recounting of the pool shot.
While E recounts how the cue ball proceeds towards the rack with a top English
spin (it’ll hit the– hit the rack), he also gestures by making a triangular shape with
his hands.
In Example 2, the gesture expands the meanings of speech through elaboration
(=). More specifically, the triangular-shaped gesture elaborates on the meaning of
the rack by describing its shape. A paraphrase might be: “it’ll hit the– hit the rack,
(=) which at this point has a triangular shape.”
In summary, multimodal analyses need to account for the role of mode in
meaning-making – how many channels, what is the role of each semiotic system,
etc. – and how combinations of semiotic systems functionally interrelate in differ-
ent contexts of situation. What this means is that we should broaden our analytic
lens beyond speech and writing to include other meaning-making systems.
Notes
* I gratefully acknowledge the support of the Social Sciences and Humanities Research Council
of Canada, Postdoctoral Fellowship No. 756-2001-0224 in the preparation of this manuscript.
1. For a notable exception, see Huemer (2001), who examined the functional interrelationships
between images and writing in job advertisements.
2. Hasan’s (1985: 58) definition of channel “refers to the modality through which one comes in
contact with the message – whether the message travels on sound waves or on a piece of paper”.
Hasan argues for two types of channels through which messages may travel: phonic or graphic.
Unfortunately, the term graphic privileges writing as a semiotic system, and leaves out gesture,
image, body position and eye gaze as meaningful contributors to the meaning-making process.
For this reason, I adopt visual-aural rather than phonic-graphic.
References
Hutchins, Edwin & Klausen, Tove (1996). “Distributed cognition in an airline cockpit.” In Y.
Engeström & D. Middleton (Eds.), Cognition and Communication at Work (pp. 15–34).
Cambridge: Cambridge University Press.
Kendon, Adam (1982). “The study of gesture: Some remarks on its history.” Recherches
Sémiotique / Semiotic Inquiry, 2, 45–62.
Kendon, Adam (1985). “Behavioral foundations for the process of frame attunement in face-
to-face interaction.” In G. P. Ginsberg, M. Brenner, & M. von Cranach (Eds.), Discovery
Strategies in the Psychology of Action (pp. 229–253). London: Academic Press.
Kendon, Adam (1988). “How gestures can become like words.” In F. Poyatos (Ed.), Cross-cultural
Perspectives in Non-verbal Communication (pp. 131–141). Toronto: Hogrefe.
Kendon, Adam (1990). Conducting Interaction: Patterns of Behavior in Focused Encounters.
Cambridge: Cambridge University Press.
Kendon, Adam (1997). “Gesture.” Annual Review of Anthropology, 26, 109–128.
Kress, Gunther & van Leeuwen, Theo (1996). Reading Images. London: Routledge.
Kress, Gunther & van Leeuwen, Theo (2001). Multimodal Discourse. The Modes and Media of
Contemporary Communication. London: Edward Arnold.
Lebaron, Curtis & Streeck, Jürgen (1997). “Built space and the interactional framing of
experience during a murder interrogation.” Human Studies, 20, 1–25.
Martin, James R. (1992). English Text. Amsterdam: John Benjamins.
Martin, James R. (1995). “Text and clause: Fractal resonance.” Text, 15 (1), 5–42.
Matthiessen, Christian M. I. M. (1993). “Register in the round: diversity in a unified theory of
register analysis.” In M. Ghadessy (Ed.), Register Analysis: Theory and Practice (pp. 221–
292). London: Pinter.
McNeill, David (1992). Hand and Mind: What Gestures Reveal about Thought. Chicago:
University of Chicago Press.
McNeill, David (2001). “Introduction.” In David McNeill (Ed.), Language and Gesture (pp. 1–
10). Cambridge: Cambridge University Press.
McNeill, David, Cassell, Justine, & McCullough, Karl-Erik (1994). “Communicative effects of
speech-mismatched gestures.” Research on Language and Social Interaction, 27 (3), 223–238.
Muntigl, Peter (1996). “An analysis of the interactive organisation of shot discussions during
pool games.” In Proceedings of the 1996 Annual Conference of the Canadian Linguistics
Association, Calgary Working Papers in Linguistics (pp. 281–291).
Streeck, Jürgen (1993). “Gesture as communication I: Its coordination with gaze and speech.”
Communication Monographs, 60 (4), 275–299.
Streeck, Jürgen (1994). “Gesture as communication II: The audience as co-author.” Research on
Language and Social Interaction, 27 (3), 239–267.
Streeck, Jürgen & Hartge, Ulrike (1992). “Previews: Gestures at the transition place.” In P.
Auer & A. di Luzio (Eds.), The Contextualization of Language (pp. 135–157). Amsterdam:
Benjamins.
van Leeuwen, Theo (1999). Speech, Music, Sound. London: Macmillan.
Ventola, Eija (1987). The Structure of Social Interaction: A Systemic Approach to the Semiotics of
Service Encounters. London: Pinter.
Appendix
Notation conventions
(8.0) length of pause
((breaks)) comments
(okay) transcriptionist doubt
Gi initiation of gaze
Gf end of gaze
...... gaze towards and away
xxxxxx gaze at addressee (Eddie’s gaze at Franklin)
! beginning or end of gesture
****** stroke of gesture
______ holding gesture
[ overlap
::: lengthening
@@@ laughter
Chapter 3
Problematising ‘semiotic resource’
Victor Lim Fei
This chapter investigates the nature of semiotic resources and systems from the
Systemic Functional Linguistics (SFL) perspective. A semiotic resource has
both an expression plane and a content plane, and possesses system networks on
each of these planes. A system, on the other hand, is a configuration of meaning
potentials that is articulated through a semiotic resource. This is followed by an
argument for the visual image as a semiotic resource, comparing it with the
modality of language. In addition, it is proposed that Saussure’s (1967)
claim of arbitrariness between a signifier and a signified can be further
extended in the current understanding of the nature of language and visual
images, especially since both semiotic resources share a common historical
origin. The implications arising from the association between the two modalities
are also discussed briefly. Stemming from a need to understand the semiotic
resource of visual images, this chapter also proposes icons as the vocabulary of
visual images, analogous to the role of words in language. The conceptions
presented in this chapter are preliminary and by no means final or definitive.
The chief aim of this chapter is to provoke a meaningful debate on some of the
pertinent questions in multimodal research.
Earlier work on meaning has centred on the notion of the sign. This focus only
shifted when Michael Halliday’s work on Systemic Functional Linguistic (SFL) the-
ory redefined the boundaries of semiotics from “a study of signs” to “a
study of sign systems”. Halliday’s (1978) work marks a shift of emphasis from the
sign as an entity to a system of signs operating together to make meaning. In the
SFL community signs are more commonly referred to as semiotic resources. They
include language, expressed in its written form through graphology or typogra-
phy, as well as the semiotic resources of visual images, mathematical notations
and other technical symbols.
In this age of multimedia, there is an increasing awareness that mean-
ing is rarely made with language alone, a point argued at length by Baldry (2000)
and Kress and van Leeuwen (2001).
Comparing visual images with the semiotic resource of language, visual
images can be observed to have an expression plane (the display stratum) and
a content plane (the grammar and semantics strata). Halliday (1978: 39) pro-
poses that language is a “system of meaning potential”. Each of the content
and expression planes has a network of options through which meaning
is made by paradigmatic selection. Language is an abstraction until
it is expressed through either speech or writing. When the linguistic semiotic is
expressed through sound, the display stratum is phonology. When language is re-
alised through writing, the expression plane is graphology or, in the instance of a
printed text, typography.
O’Toole (1994) and Kress and van Leeuwen (1996) argue that visual im-
ages are tools or semiotic resources just as capable as language of making
meaning. The stance that the linguistic and pictorial modalities share an equal
status is now widely adopted (for example, Baldry 2000; O’Halloran 2000;
Thibault 2000; Kress & van Leeuwen 2001). Van Leeuwen
(2000), for instance, criticises the negative comparisons between language and vi-
sual images in his refutation of Barthes’ (1977) earlier proposition that words have
‘fixed meaning’ while images are ‘polysemous’. In addition to this, van Leeuwen
(2000) confronts some misconceptions regarding the pictorial semiotic such as
the assertion that visual images cannot represent negative polarity. Van Leeuwen
(2000: 179) also argues that visual semiotics should focus, “not only on the image
as representation, but also on the image as (inter)act”.
I add to these conceptions by proposing that visual images, like language, are
conceptual abstractions, each with its potential of meaning. As shown in Figure 1,
language is an abstract system of meaning potential, realised through its grammar,
and this is expressed on the display stratum, through typography in printed texts.
In the same manner, visual images are also abstractions that are realised through a
visual grammar network. On the display stratum they are expressed through visual
systems of graphics, such as form, perspective, layout and strokes.
Figure 1. Language and pictures as systems of meaning potential, realised through (visual) grammar and expressed on the display stratum
The separation between display and grammar for the pictorial semiotic may
be an uneasy one, due to the interwoven nature of the elements on both strata in
semiosis. Nonetheless, it is useful and necessary to distinguish between the two
planes, to recognise the systems’ potential as well as to understand the meaning-
making process. The example in Figure 2 demonstrates the theoretical distinction
between the display and grammar strata.
The expression plane of the iconic face in Figure 2 involves the system of colour
and form used to make meaning. This refers to the thin black line, the two black
circles as well as the larger white circle. Each of these elements independently as
well as together as a unified whole, has meaning potential. The grammar stratum,
as extensively theorised by O’Toole (1994) and Kress and van Leeuwen (1996),
relates one disparate element to another and explains how the whole functions co-
hesively to make meaning. Just as the grammar of language concerns itself with the
chains of words to form coherent sentences, the grammar of visual images is about
the piecing of one item with another to bring across a coherent message. The rela-
tion of the parts into wholes, for instance, how the various shapes form an iconic
face, operates on the grammar stratum. This grammar is culturally dependent and
governs the way a reader ‘reads’ and construes the visual message.
Following O’Toole (1994), a hierarchy of different ranks analogous to Halli-
day’s (1978) rank scale for language is proposed to look at the meaning made on
each of the rank units, from Member through Figure and Episode to Work. This adop-
tion of a rank scale, operating within the principle of constituency, where wholes
on each rank make up larger units in a hierarchy, facilitates a more systematic anal-
ysis of the meaning made in the different units on the visual grammar stratum. In
a sense, this delicate distinction between the display and grammar stratum can
be made, with the expression plane being largely concerned with the surface fea-
tures of the text and the content plane having to do with the interactions and
negotiations between the different elements in the text.
Despite their shared historical origin and many similarities, I argue that the
difference between language and visual images lies in the degree of arbitrari-
ness in the relationship between the signifier, particularly the expression plane of
the semiotic resource, and the signified, the concept that is represented. Saussure
(1967) proposes that there is an arbitrary relationship between the signifier and
the signified in language. In contrast, Kress (1993: 173) tends towards the other
extreme arguing that “the relations of signifier to signified, in all human semi-
otic systems, is always motivated, and is never arbitrary”. He also suggests that
production factors such as the ‘interest’ of the producers, which is subject to
temporality, society and culture, play a crucial role in the organisation of the
sign. More recently, Kress and van Leeuwen (2001) also take into consideration
the strata of design, production and distribution in their discussion of multimodal
communication.
I propose that the claim of arbitrariness between the signifier and the signified
can be further extended. Leaning more towards the interpretation rather than the
articulation of the sign in this chapter, I place a greater emphasis on the meanings
that can be obtained through the reading of a sign, as opposed to the meanings the
sign was produced to convey. Sardar and van Loon (2000: 44) define reading (in the
field of media studies) as “the process of interaction when a text is analysed as well
as the final result of that process, the interpretation.” My stance is inclined towards
the post-structuralist position that meaning is found within the unregulated play
of reading the text, through the interpreting of various semiotic systems, as elab-
orated in the works of Roland Barthes (1977), Umberto Eco (1979) and Jacques
Derrida (1976). The post-modernist position also supports the understanding
that a text means independently of authorial intentions and can be analysed as
an artefact of culture.
Taking this position therefore, I hesitate to commit to Kress’ (1993: 173) claim
that all signs are “never arbitrary”. His argument that signs are ‘motivated’ from the
perspective of the producer’s interest can also be problematised. A question that
may be raised from Kress’ (1993) discussion is this: if the sign is so “opaque” or so
inaccessible to the reader that the ‘motivated’ meanings are missed, can the sign
still be considered a legitimate sign, given that it fails in its function to communicate
meaning effectively? This is seen in his example of the child’s
drawing of a car (Kress 1993: 172). As Kress (1993: 178) comments rather para-
doxically, “without that accidental presence [of an adult with the child producer]
neither interest nor motivation would be easily recoverable”. In other words, the
intended ‘motivated’ meaning of the sign may never have been understood by a
reader. In such cases then, can the drawing still be considered as a sign when there
is no reader that can access the meaning of the sign? Peirce (1958: 228) regards a
sign as “something which stands to somebody for something in some respect or
capacity.”
For the purpose of this chapter, following Peirce (1958), I interpret the sign as
a tool that facilitates communication, thus requiring both producer and reader
to share the same assumptions and thereby to understand the meanings made
through the shared semiotic modalities within a community. Hence, I prefer to
view the relations between the sign and the object it signifies on a scale of ar-
bitrariness with the conception of codification on one end and the notion of an
analogon on the other end of the scale.
The signifier of language could be expressed either through sounds in
phonemes, in the spoken form, or visually through typography or graphology,
expressed in the written form. Concerning the spoken form, that is speech, it is ir-
refutable that the relation between the signifier and the signified is arbitrary. This
is stated with the exception of onomatopoeia (sound words), where the signifier
mimics the vocalization of the signified, for instance, the ‘ringing’ of a telephone.
The claim of arbitrariness is also valid in writing systems of language, where the
signifier belongs to the syllabic and alphabetic type. The concept of a female child,
for example, could be realised by different signifiers in different languages. For
instance, in English the signifier is “Girl”, in French it is “Fille”, and in Italian it
is “Ragazza”. The lack of an obvious physical relationship between the signifier
‘girl’ and the signified concept of a ‘female child’ indicates that their connection
is arbitrary. However, in certain writing systems such as the logographic type,
where the signifiers are derived from icons of the objects represented, this claim of
arbitrariness may perhaps need to be modified. Certain types of writing systems
for language, although having each symbol representing a morpheme, may have
their signifiers originating from pictograms, evolving into a standardized writing
system over time. Some prominent examples are Chinese Characters and Egyptian
Hieroglyphics. Tracing the history of such logographic writing systems can illus-
trate the standardisation and codification of pictograms into a writing system of
language over time. Some instances of this are seen in Figure 3.
With this, it is perhaps appropriate to propose differing degrees of arbitrari-
ness between the signifiers and the signifieds in language. As opposed to language,
visual images have a lower degree of arbitrariness, thus implying a higher degree
of iconicity. Visual images, in other words, are primarily iconic; that is, they resemble
the subjects they represent. Barthes (1977) proposes the term the perfect analo-
gon to describe the highest possible level of iconicity or mimesis with the object,
such as the image that a photograph produces. In visual images, where there is a
higher level of iconicity, the signified and the signifier are related through mimesis
or resemblance. At the opposite end of the scale from the analogon is the ab-
straction. The analogon has a lower degree of arbitrariness whereas the abstraction
has a higher degree of arbitrariness. The typography/graphology of a language is
usually an abstraction. Scientific and mathematical notations also lean towards
the abstraction end of the scale. Expressionist paintings, such as the works
of Picasso, fall about midway on the scale between abstraction
and iconicity.
Since abstractions are characterised through a lower level of iconicity but a
higher degree of arbitrariness, the relationship between the signifier and the sig-
nified is reinforced through codification. In other words, codification links a
signifier and a signified that share a high degree of arbitrariness between them.
Codification or ‘grammaticalisation’ can only take place through effective sociali-
sation into the semiotic community. The term semiotic community follows from
Labov’s (1972) ‘speech community’, and describes the people in the same culture,
sharing the same assumptions, and selecting choices within the common semiotic
resources to make meaning. For instance, in mathematical notations, there is a
higher degree of arbitrariness between the signifier and the signified and therefore,
stronger codification is required, thus necessitating a deeper initiation of members
into the particular community. Notations such as π and Σ can be baffling for the
non-members, and their dense meanings are only accessible to members of the
particular semiotic community.
Just as the building blocks of meaning in language are lexical items or words, I
propose that the building blocks of visual images are icons. In addition, the lexico-
grammar of a certain language is culturally specific. For instance, a speaker of
Chinese deploys a different lexicon than a speaker of English. Likewise, icons are
contextually and culturally specific as well. Different semiotic communities would
have different styles of representing the same objects and ideas.
However, the question of where to delineate the boundaries of an icon may
arise. For instance, with reference to Figure 2, when is a dot recognised as merely
a dot, and when is it functioning as an iconic eye? Icons are the pictorial repre-
sentations of objects identifiable in the culture. Thus, the recognition of an icon
as resembling an item is crucial in deciding what constitutes an icon. The arrange-
ment of lines and dots in a certain manner or ‘visual-grammatical’ placement, for
instance in Figure 2, may bring about the recognition of an iconic face. This iden-
tification of the icon is dependent on its relationship with the surrounding
co-text, in this example, the lines and the dots. The identification of the icon, in
turn, also allows us to recognise the iconicity of these co-texts. For example, after
recognising the iconic face, the significance of its co-text becomes apparent, for in-
stance, it is clear that the dots represent the eyes and the line stands for the mouth.
This is similar to how certain ambiguous words in language are disambiguated
when construed in relation to their surrounding co-text, i.e. the other words sur-
rounding them. For instance, the word “bank” can either mean the sides of a river
or a financial institution. When used in “The robbers broke into the bank”, the
polysemy of the word is resolved. It must be clarified that the internal
arrangements of lines and dots to constitute an icon are part of its visual grammar,
just as the relationship between part and whole is the grammar of the semiotic re-
source. In other words, the icon itself lies on the expression plane of the modality,
although the composition of an icon and the relationship between iconic elements
belong to the grammar stratum.
Conclusion

Due to constraints of time and space, it is not possible to undertake here a detailed
investigation of the different implications of the proposal of icons as the vocabulary
of visual images. Nonetheless, my proposal will hopefully initiate further work
in this direction, which can contribute to a better understanding of the nature of
visual images as semiotic resources.
Note
* I would like to thank Kay O’Halloran for her insightful comments on an earlier draft of this
chapter.
References
Barthes, Roland (1977). “The rhetoric of the image.” In S. Heath (Ed.), Image, Music, Text
(pp. 32–51). London: Fontana Press.
Baldry, Anthony P. (2000). Multimodality and Multimediality in the Distance Learning Age.
Campobasso: Palladino Editore.
Callaghan, Jean & McDonald, Edward (2002). “Expression, content and meaning in language
and music: An integrated semiotic analysis.” In P. Mckevitt, S. Ó’Nualláin, & C. Mulvihill
(Eds.), Language, Vision and Music (pp. 221–230). Amsterdam and Philadelphia: John
Benjamins.
Derrida, Jacques (1976). Of Grammatology. Baltimore: Johns Hopkins University Press.
Diringer, David (1986). The Alphabet: A Key to the History of Mankind. New York: Funk &
Wagnalls.
Eco, Umberto (1979). The Role of the Reader: Explorations in the Semiotics of Texts. Bloomington:
Indiana University Press.
Eggins, Suzanne (1994). An Introduction to Systemic Functional Linguistics. London and New
York: Pinter Publishers.
Eisner, Will (1990). Comics and Sequential Art. Princeton, WI: Kitchen Sink Press.
Halliday, M. A. K. (1978). Language as A Social Semiotic. London: Edward Arnold.
de Joia, Alex & Stenton, Adrian (1980). Terms in Systemic Linguistics: A Guide to Halliday.
London: Batsford Academic and Educational Ltd.
Keightley, David N. (1989). “The origins of writing in China: scripts and cultural contexts.”
In W. M. Senner (Ed.), The Origins of Writings (pp. 160–170). Lincoln, NE: University of
Nebraska Press.
Kress, Gunther (1993). “Against arbitrariness: the social production of the sign as a foundational
issue in critical discourse analysis.” Discourse and Society, 4 (2), 169–191.
Kress, Gunther & van Leeuwen, Theo (1996). Reading Images: The Grammar of Visual Design.
London: Routledge.
Kress, Gunther & van Leeuwen, Theo (2001). Multimodal Discourse. The Modes and Media of
Contemporary Communication. London: Arnold.
Labov, William (1972). Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
Lemke, Jay L. (1998). “Multiplying meaning: visual and verbal semiotics in scientific text.” In J.
R. Martin & R. Veel (Eds.), Reading Science: Critical and Functional Perspectives on Discourse
and Science (pp. 87–113). London: Routledge.
Lim, Fei V. (2004). “Developing an integrative multisemiotic model.” In K. L. O’Halloran (Ed.),
Multimodal Discourse Analysis: Systemic Functional Perspectives. London: Continuum.
O’Halloran, Kay L. (1999). “Interdependence, interaction and metaphor in multisemiotic texts.”
Social Semiotics, 9 (3), 317–354.
O’Halloran, Kay L. (2000). “Classroom discourse in mathematics: A multisemiotic analysis.”
Linguistics and Education, 10 (3), 359–388.
O’Halloran, Kay L. (2003). “Intersemiosis in mathematics and science: grammatical metaphor
and semiotic metaphor.” In A.-M. Simon-Vandenbergen, L. Ravelli, & M. Taverniers
(Eds.), Grammatical Metaphor: Views from Systemic Functional Linguistics (pp. 337–365).
Amsterdam and Philadelphia: Benjamins.
O’Halloran, Kay L. (2004). Multimodal Discourse Analysis: Systemic Functional Perspectives.
London: Continuum.
O’Toole, Michael (1994). The Language of Displayed Art. London: Leicester University Press.
Peirce, Charles S. (1958). The Collected Papers of C. S. Peirce, Vol. 2. C. Hartshorne & P. Weiss
(Eds.). Cambridge, MA: Harvard University Press.
Royce, Terry (1998). “Synergy on the page: exploring intersemiotic complementarity in page-
based multimodal text.” JASFL Occasional Papers, 1 (1).
Saussure, Ferdinand de (1967). Cours de linguistique générale. R. Engler (Ed.). Wiesbaden: Otto
Harrassowitz.
Sardar, Ziauddin & van Loon, Borin (2002). Introducing Media Studies. Cambridge: Icon Books.
Senner, Wayne M. (1989). “Theories and myths on the origins of writing: a historical overview.”
In W. M. Senner (Ed.), The Origins of Writings (pp. 1–26). Lincoln, NE: University of
Nebraska Press.
Thibault, Paul (2000). “The multimodal transcription of a television advertisement: theory and
practice.” In A. Baldry (Ed.), Multimodality and Multimediality in The Distance Learning
Age (pp. 311–384). Campobasso: Palladino Editore.
van Leeuwen, Theo (2000). “Some notes on visual semiosis.” Semiotica, 129 (1/4), 179–195.
van Leeuwen, Theo (2002). “Multimodality and typography.” Paper presented at the 1st
International Symposium on Multimodal Discourse. Salzburg, Austria, 2002.
Chapter 4
Multimodality and empiricism: Preparing for a corpus-based approach to the study of multimodal meaning-making
John Bateman, Judy Delin and Renate Henschel

1. Introduction
Following the so-called ‘visual turn’ in many areas of communication, it has be-
come increasingly usual for investigators both to consider explicitly the presenta-
tion of information in forms such as photographs, diagrams, graphics, icons and
so on and to place such information in combination with linguistically presented
information. One corollary of this broadening of the area of concern is that
we are forced to deal with systems which are manifestly meaning-making (e.g.
photographs, diagrams) but for which we lack the rich battery of investigative
tools that we now have for linguistic entities. Whereas the application of a lin-
guistic mode of analytic discourse is already showing significant benefits (cf. Kress
& van Leeuwen 2001), the strong coupling between data and theory-construction
that forms a tenet of much of modern linguistics is not yet a strong feature of
‘multimodal linguistics’.
In this chapter we address this concern. We give an example where informal,
interpretative claims have been made about aspects of multimodal discourse and
argue that the claims demand a much more rigorous empirical basis to be taken
further. We then briefly introduce our own attempt to place multimodal study on
a firmer empirical basis.
Kress and van Leeuwen (1996) suggest that illustrated documents of a variety of
kinds can meaningfully be analysed in terms of several ‘signifying systems’ that
structure the information on the page. Of particular relevance here is their dis-
cussion of information value in which they propose that the placement of elements
in particular ‘zones’ in the visual space endows them with particular meanings.
Each zone “accords specific values to the elements placed within it” (Kress & van
Leeuwen 1998: 188).
While suggestive, the notion of information value as used by Kress and van
Leeuwen is still in need of further clarification. Kress and van Leeuwen use it to
describe oppositions between elements placed on the left of a page or image, and
those placed on the right. Those on the left are considered to be ‘Given’; those on
the right ‘New’:
Given Presented as material the reader already knows; “common sense and self-
evident. . .presented as established” (Kress & van Leeuwen 1998: 189);
New Presented as material as yet unknown to the reader; “the crucial point of the
message. . .problematic, contestable, the information at issue” (Kress & van
Leeuwen 1998: 189).
The analysis is appealing in that it provides a ready vocabulary for reading more
out of page design than would otherwise be possible. Just as the analysis of En-
glish clauses into a Theme/Rheme structure, in which the element(s) placed at
the beginning of the clause have been shown to participate with high regularity in
larger text-structuring patterns (cf. Fries 1995), has proved analytically productive,
the Given/New patterning appears to offer a similar analytic win for the page.
But to what extent is the claim supported? Indeed, how would it be sup-
ported? The use of Given/New here is very much more abstract than that generally
found in clause (or intonational unit) analyses; for Kress and van Leeuwen the
Given/New in the page revolves around problematised breaks in the social norms
expected. The analytic procedures for establishing to what extent this could be a
reliable property of layout rather than an occasionally plausible account are un-
clear. Nevertheless, following on the initial presentation of the analytic scheme
in van Leeuwen and Kress (1995), it has been presented again in Kress and van
Leeuwen (1996, 1998) and is now itself being adopted as unproblematic, or ‘Given’,
in some systemically-based research on multimodality (see, for example, Royce
1998; Martin 2002). Unfortunately, we have not so far found it to be supported by
the documents we have examined.
We therefore need to ask more precisely the questions concerning the semiotic
values and their realisation in layout that have been proposed by Kress and van Leeuwen.
Is the entire scheme to be dismissed as a suggestive idea that did not work?
Or, does the scheme apply to certain kinds of documents and not to others? Or
to certain kinds of page layouts and not to others? All of these issues need to be
addressed and answered as multimodal document analysis moves away from the
suggestive and towards the analytic. Methods need to be adopted and documented
whereby suggestive frames of analysis can be expressed as predictive and falsifi-
able claims about document design and meaning-making. To do this, we need to
subject analyses to more detailed and systematic investigation, varying types of
documents, types of consumers, types of presentation medium, and purposes so
that we can get a finer grip on the meaning-making possibilities of the various
semiotics in play. And to do this, we need to turn to multimodal corpora specially
designed for supporting the investigation of multimodal meaning.
Linguistic corpora containing collections of several million words are fast becom-
ing the norm (the British National Corpus, for example, contains 100 million
words). With this mass of available ‘data’, it is increasingly important that the data
be organised so as to support, rather than hinder, scientific inquiry.
One simple illustration of the problem here involves the phenomenon of vari-
ant linguistic forms that do not play a role in an inquiry being pursued but which
make the posing of questions to a corpus more complex. If, for example, we are
seeking all occurrences of the verb ‘buy’ in order to see what complementation
patterns it occurs in, or which collocations it supports, we cannot just ask a text
collection to print out all occurrences of the string of characters ‘b-u-y’. We can-
not even ask it to print out all occurrences of the word “buy” – because in both
cases we then do not get forms such as ‘bought’ and in the second case we miss
forms such as “buys”, “buying”, etc. While relatively straightforward to avoid, such
minor problems recur with every inquiry that one wishes to make of a corpus
and easily lead to error or incomplete results.
A further illustration, a little more complex, is how to deal with a linguistic
inquiry concerning uses of the modal ‘can’. We can ask to retrieve all instances of
the word ‘can’ from a corpus – but then how do we avoid all the (for this par-
ticular question irrelevant) instances of the noun ‘can’? Again, we can do this by
hand, ruling out the irrelevant cases, but this work reduces the effectiveness of
using a corpus and represents a considerable overhead. More sophisticated still,
if we wish to investigate the contexts in which some grammatical construction is
used rather than another, then we need to be able to search for such constructions
rather than particular words or sequences of words and this can be quite a difficult
undertaking.
In all these cases, modern corpora provide direct support for investigation
by annotating their contained data to include additional information that may be
employed when formulating questions. That is, not only will a corpus contain the
bare textual information, it will also contain information about the root form of
the words used (thus enabling a single question about all occurrences of the word
‘buy’ in any of its forms), their word classes (thus enabling a question exclusively
about modal ‘can’), and possibly some grammatical structures or other informa-
tion in addition. The provision of corpora viewed as collections of texts has largely
given way to annotated corpora, which contain additional information for the ask-
ing of more exact linguistic questions; standard introductions to corpus linguistics
describing this development in detail include Biber et al. (1998) and McEnery and
Wilson (2001).
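To make this concrete, the sketch below (in Python, using an invented three-field annotation of surface form, lemma and word class rather than the tag set of any actual corpus) shows why a plain string search is insufficient and how such annotations support the two inquiries just described:

# A toy annotated corpus: each token records its surface form,
# its lemma (root form) and a word-class tag.
corpus = [
    {"word": "She",    "lemma": "she",  "pos": "PRON"},
    {"word": "bought", "lemma": "buy",  "pos": "VERB"},
    {"word": "a",      "lemma": "a",    "pos": "DET"},
    {"word": "can",    "lemma": "can",  "pos": "NOUN"},
    {"word": "of",     "lemma": "of",   "pos": "PREP"},
    {"word": "beans",  "lemma": "bean", "pos": "NOUN"},
    {"word": "so",     "lemma": "so",   "pos": "CONJ"},
    {"word": "you",    "lemma": "you",  "pos": "PRON"},
    {"word": "can",    "lemma": "can",  "pos": "MODAL"},
    {"word": "buy",    "lemma": "buy",  "pos": "VERB"},
    {"word": "yours",  "lemma": "yours", "pos": "PRON"},
]

# Searching for the surface string misses 'bought', 'buys', 'buying', ...
surface_hits = [t["word"] for t in corpus if t["word"] == "buy"]

# Searching by lemma retrieves every inflected form of 'buy'.
lemma_hits = [t["word"] for t in corpus if t["lemma"] == "buy"]

# The word-class tag keeps modal 'can' while excluding the noun 'can'.
modal_can = [t["word"] for t in corpus if t["lemma"] == "can"
             and t["pos"] == "MODAL"]

print(surface_hits)   # ['buy']
print(lemma_hits)     # ['bought', 'buy']
print(modal_can)      # ['can']  (the noun occurrence is filtered out)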
In modern annotated corpora, it is usual to employ some kind of explicit
markup language in order to capture the extra information they contain. That is,
the basic textual information is ‘marked up’ with the additional information to
be represented drawing on standardised formats. This separates very clearly data
from information about that data – which makes the information as a whole con-
siderably easier to process and manipulate. The currently most accepted and well
developed standardised formats are based on the ‘Standard Generalized Markup
Language’ (SGML, see Bryan 1988), developed in the publishing industry, and,
most recently, its particular instantiation for wide-scale electronic information
representation, the ‘eXtensible Markup Language’ (XML). Standards for cor-
pus annotation adopting these frameworks are also now available (cf. XCES, see
CES 2000).
Both SGML and XML recommend the definition of Document Type Descrip-
tions (DTDs), which specify precisely the structures that are possible in documents
and the kinds of entities that can fill slots in those structures. One of the reasons
for managing things in this way is that it allows documents to be automatically
checked for conformity with their intended structure. This process is called docu-
ment validation. It is by no means straightforward to guarantee that any sizeable
collection of information is structurally correct and consistent; this is the kind of
service that a DTD provides. Widely available DTD-parsers check documents for
conformity with their specified DTD, so that at least formal errors may be avoided.
The precise attributes that are allowed, and the kinds of values that they may take,
are specified formally in the Document Type Description, and this allows this infor-
mation to be formally validated for misspellings, missing brackets, wrong values of
attributes, etc. Validation is already a significant reason for providing information
in this structured form; we will see below, however, that many more advantages
accrue from the adoption of XML.
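As a small illustration of the kind of service a DTD provides, the following sketch (in Python, using the widely available lxml library; the element names are invented for the example and do not belong to any particular annotation standard) declares a minimal DTD and validates two small documents against it:

from io import StringIO
from lxml import etree

# A minimal DTD: a <doc> consists of one or more sentences,
# and every sentence must carry a unique 'id' attribute.
dtd = etree.DTD(StringIO("""
<!ELEMENT doc (S+)>
<!ELEMENT S (#PCDATA)>
<!ATTLIST S id ID #REQUIRED>
"""))

good = etree.fromstring('<doc><S id="s1">Have you, miss?</S></doc>')
bad = etree.fromstring('<doc><S>Well,</S></doc>')   # 'id' attribute missing

print(dtd.validate(good))                 # True
print(dtd.validate(bad))                  # False
print(dtd.error_log.filter_from_errors())  # reports the missing attribute

A formally invalid document is thus detected automatically, which is exactly the kind of structural checking that becomes indispensable once annotation grows beyond a handful of pages.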
When attempting more sophisticated linguistic annotation, the most signifi-
cant problem is that of intersecting hierarchies. A good example of this from the
area of annotation for literary editions is given by Durusau and O’Donnell (sub-
mitted).2 One simple XCES-conformant markup of the linguistic content might
break a document down into a number of identified sentences; this would use a se-
quence of <S> and matching closing </S> tags. Another simple XML-conformant
markup might want to indicate the division into pages that an edition employed –
here we would use a sequence of <page> . . . </page> tags. Now consider an an-
notation for a machine-readable version of the literary work that wants to capture
the page breaks and the linguistic divisions simultaneously. This is not straight-
forward simply because the linguistic division into sentences and the division into
pages have no necessary relationship to one another: there is no reason why the
structures imposed by the two kinds of division should embed one within the
other. Thus the simplest way of capturing this information might appear
to be something like the following:
<page> . . . <S> This is a sentence </page> <page> that goes over two pages.
</S> <S> Then there are more sentences on the page . . . </page>
This, however, is not ‘legal’ XML: the structures defined by the <S>-tags and the <page>-
tags do not ‘properly nest’. The first sentence tag is not ‘closed’ before its enclosing
page tag is closed. Allowing such non-nesting structures would vastly complicate
the machinery necessary for checking a document’s conformance with its DTD.
A solution for this problem that has now established itself is that of standoff
annotation (Thompson & McKelvie 1997). Standoff annotation recognises the in-
dependence of differing layers of annotation and separates these both from the
original data and from each other. Thus, instead of having a single marked-up
document where the annotations are buried within the data, the annotation in-
formation is separated off into independent annotation layers – hence the phrase
‘standoff’. Each individual layer is a well-formed XML document. Contact is
made with the original data indirectly by referring to particular elements. This
solves the problem of intersecting hierarchies because within any single XML doc-
ument there is no intersecting hierarchy – there is only the single hierarchy of the
particular annotation layer that the document represents.
The additional technical complexity involved is that we need to be able to ac-
cess the individual elements of the data in order to bind them into a variety of
annotation structures. This can be achieved most simply within XML by giving
each element a unique identifying label and employing cross-references. This is
shown in a simplified example in Figure 1, where we have two annotation lay-
ers that show how a single text document is divided according to sentences and
according to pages. This accepts the fact that the linguistic division into sentences
and the print division into pages have no natural relationship with one another,
making it inappropriate to insist that such mark-up nest properly into well-formed
recursive structures simply to fulfil the SGML/XML formal restrictions. The situ-
ation illustrated takes a text where there is a page break immediately following the
text: “. . . Have you, miss? Well,”.
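To make the mechanics of this more tangible, here is a minimal sketch in Python of the kind of arrangement Figure 1 illustrates. The element and attribute names are invented for the illustration (they are not the XCES or GeM tags): a base layer assigns identifiers to the basic units, and the sentence layer and the page layer are separate, well-formed documents that simply point back at those identifiers, so neither hierarchy has to nest inside the other.

import xml.etree.ElementTree as ET

# Base layer: the raw units, each with a unique identifier.
base = ET.fromstring("""
<base>
  <unit id="u-1">Have you, miss?</unit>
  <unit id="u-2">Well,</unit>
  <unit id="u-3">I suppose so.</unit>
</base>""")

# Sentence layer: one hierarchy over the base units.
sentences = ET.fromstring("""
<sentences>
  <S id="s-1" xref="u-1"/>
  <S id="s-2" xref="u-2 u-3"/>
</sentences>""")

# Page layer: an independent hierarchy over the same base units;
# the page break falls inside sentence s-2.
pages = ET.fromstring("""
<pages>
  <page id="p-1" xref="u-1 u-2"/>
  <page id="p-2" xref="u-3"/>
</pages>""")

text_of = {u.get("id"): u.text for u in base.iter("unit")}

def resolve(layer, tag):
    # Reconstruct the content of each annotation element from the base layer.
    return {el.get("id"): " ".join(text_of[ref] for ref in el.get("xref").split())
            for el in layer.iter(tag)}

print(resolve(sentences, "S"))
print(resolve(pages, "page"))

Because each layer refers to the base units only by identifier, further layers (for rhetorical, layout or navigation structure, for example) can later be added without disturbing the existing ones.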
Significantly, many of the main producers and consumers of such
structured data are not linguists, but standard commercial providers of infor-
mation that would previously have been maintained in databases, such as sales
catalogues of online companies, stock-lists, personnel data, and so on. Because of
this very practical and economic demand, methods for using such data are already
finding their way into the standardly available web-browsers – this virtually guar-
antees that it will soon be possible for XML-annotated corpora to be navigated
and manipulated using widely available and familiar tools rather than complex,
corpus-specific schemes and software.
In this section, we set out how we are approaching the design of multi-
modal corpora drawing on the state of the art for annotated linguistic cor-
pora described in the previous section. We have been pursuing these aims in
the context of a research project, the ‘Genre and Multimodality’ project GeM
(http://www.purl.org/net/gem).3 The basic aim of GeM is to investigate the appro-
priateness of a multimodal view of ‘genre’: that is, we are seeking to establish em-
pirically the extent to which there is a systematic and regular relationship between
different document genres and their potential realisational forms in combinations
of text, layout, graphics, pictures and diagrams. More detailed introductions to the
GeM model and its motivation can be found in Delin et al. (2002/3) and Delin and
Bateman (2002).
Our starting point for considering genre draws primarily on linguistic uses, such
as are evident, for example, in Biber (1989) or Swales (1990). We also emphasise and
build on the social ‘embeddedness’ of genres: texts look different because they are
to function in different social contexts (cf. Halliday 1978; Martin 1992). Moreover,
as a final step, we then reconnect this notion to the practical contexts of produc-
tion and consumption of the discussed genres; that is, genres are also partially
defined by their ‘rituals of use’ and the application of various technologies in the
construction of their members (cf. Kress & van Leeuwen 2001).
The first attempt that we are aware of that provided a detailed model of
multimodal genre was that of Waller (1987). It took into consideration the vi-
tal contributions of language, document content, and visual appearance as well
as practical conditions of production and consumption. Our own work draws
upon and extends this framework by examining the interdependencies between
possible characterisations of genre on the one hand and of the various functional
constraints on the other. The basic levels of analysis that the project has defined
are then as follows:
1. Content structure: the ‘raw’ data out of which documents are constructed;
2. Rhetorical structure: the rhetorical relationships between content elements;
how the content is ‘argued’;
3. Layout structure: the nature, appearance and position of communicative ele-
ments on the page;
4. Navigation structure: the ways in which the intended mode(s) of consumption
of the document is/are supported; and
5. Linguistic structure: the structure of the language used to realise the layout
elements.
We have already seen the basic technological requirements sufficient for construct-
ing a multimodal corpus. When we adopt the GeM layers of analysis, it is possible
to consider each one as a single layer of standoff annotation just as was illustrated
for the simple page and sentence example of Figure 1. This has now been done with
Document Type Descriptions specified in XML-form for each layer. As usual with
formalisation, the demand for complete specification has resulted in a consider-
able number of refinements to the original model we have just sketched. These are
set out in full in the technical documentation for the corpus design (cf. Henschel
2002). Here we focus on just one layer of annotation, the layout structure, which
has been developed within the GeM project. For the purposes of this chapter we
will also concentrate on the addition of pages involving multimodal content rather
than go into the details of considering entire documents.
As we have seen, a precondition for standoff annotation is to establish a single
document containing the marked-up ‘basic units’ of any document being added
to the corpus. With GeM, these base level units range over textual, graphical and
layout elements and give a comprehensive account of the material on the page,
i.e. they comprise everything which can be seen on the page/pages of the docu-
ment. The base units we define for GeM include: orthographic sentences, sentence
fragments initiating a list, headings, photos, drawings, figures (without caption),
captions of photos, text in pictures, icons, table cells, list headers, page numbers,
footnotes (without footnote label), footnote labels, and so on. Each such element is
marked as a base unit and receives a unique base unit identifier. The base units pro-
vide the basic vocabulary of the page – the units out of which all meaning-carrying
configurations on the page must be constructed.
Details concerning the form and content of each base unit are not represented
at this level. All such information is expressed in terms of pointers to the rel-
evant units of the base level from the other layers of annotation. As suggested
above, this standoff approach to annotation readily supports the necessary range
of intersecting, overlapping hierarchical structures commonly found in even the
simplest documents. Single base units are commonly cross-classified to capture
their multifunctionality and can, for example, contribute to a visually realised lay-
out element as well as simultaneously functioning as a component of a rhetorical
argument. This ensures that we can maintain the logical independence of the layers
considered.
Thus, to take a relatively simple example, if we were annotating the part of a
page shown in the lower part of Figure 2, we would construct a base document
along the lines of the XML annotation shown at the top of the figure.4
Each typographically distinct element on the page is allocated to a different
base unit. The first unit (identified by the label ‘u-01’) corresponds to the headline
at the top of the page extract; we can see that the only information captured
here is the raw text “£10m top of the range sale” – typographical information,
placement on the page, rhetorical function (if any), etc., are not represented. The
second unit does the same job for the large photograph – the ‘raw picture’ is repre-
sented by a reference to the picture rather than by reproducing the image itself.

Figure 2. Page extract from a newspaper and corresponding base unit annotation

The layout structure then groups the base units into larger units according to
how they belong together visually on the page. If the photograph were moved,
for example, it is likely that its caption would be drawn with it, and less likely that the body of the text or the headline
immediately move: although there would be limits to this in the context of the
page as a whole as the individual units making up this ‘story’ would not like to be
separated. General proximity is thus to be maintained, which is itself an argument
for maintaining all the units shown as a single higher-level layout unit.
Furthermore, within this, the block in the middle of the second column of text
stating that more information (of a particular kind: i.e. ‘the Chief ’s castle’) exists
and providing navigation information about where that information is located
(‘inside’ and ‘Page 2’) can also be moved relatively freely within its enclosing text
block, arguing for its treatment as a distinct layout unit at an intermediate level in
the overall hierarchy. An example of this kind of structuring is shown in Figure 3.
In general, the hierarchical structures proposed should be conservative – that
is, when there is no strong evidence in favour of a strict hierarchical relationship,
we prefer to posit a flat structure rather than insisting on some particular hier-
archicalisation. The layout hierarchy captures dependency relationships between
visually discovered elements on the page but no longer includes information about
the precise physical location of those elements on the page. It is therefore a signif-
icant abstraction away from the source document and generalises over a set of
‘congruent’ possible realisations.
A layout hierarchy is represented as a simple nested XML structure made up of
‘layout chunks’ and ‘layout leaves’. Layout chunks can have further layout chunks
embedded within them to set up the recursivity of the structures represented. Ter-
minal elements in the structure are represented as layout leaves. Each such unit
again receives its own unique identifying label and the entire structure is placed
within a single enclosing XML tag called the ‘layout root’. The contents of each
layout unit, that is, the elements on the page that comprise them, are identified
in the way standard for standoff annotation, i.e. the layout leaves contain cross-
references to the identifiers of the corresponding base units. The layout structure
corresponding to the example in Figure 3 is then as shown in Figure 4.
The interested reader can follow through the structure and the cross-
references as identified in the base units of Figure 2 to confirm that the hierarchical
view thus created does indeed correspond to the hierarchy given in Figure 3. This
should help make it clear why proper computational tools for checking the formal
consistency (e.g. are all the identifying labels used actually defined somewhere?)
are so important.
The representation of the orthographic and typographic information is then
relatively simple. A set of XML-specifications state which layout units have which
typographical features. In this way, it is straightforward to make generalisations
over subhierarchies drawn from the layout structure. For example, all the layout
units corresponding to a block of text that is realised uniformly in terms of its
typography may be grouped as a single node in the layout structure, and it is this
node which has the corresponding typographic features associated with it. This
allows information to be expressed concisely without repetition.

Figure 3. Layout hierarchy for the newspaper page fragment: headline; photo with caption and photographer credit; article with headline and body; byline; and the ‘inside p2’ and ‘Continued p2’ navigation elements
There are already very extensive vocabularies for describing typographical fea-
tures: we adopt these for this aspect of the GeM annotation scheme rather than
develop a further, ad hoc set of terms. Concretely, we use the typographical dis-
tinctions described as part of the XML formatting objects standard. An example
of such a specification for the unit corresponding to the headline at the top of the
page is given in Figure 5.
<layout-root id="lay-01">
  <layout-leaf id="lay-02" xref="u-01"/>
  <layout-chunk id="lay-03">
    <layout-leaf id="lay-04" xref="u-03 u-04 u-05"/>
    <layout-leaf id="lay-05" xref="u-02"/>
    <layout-leaf id="lay-06" xref="u-06 u-07"/>
  </layout-chunk>
  <layout-leaf id="lay-07" xref="u-08"/>
  <layout-leaf id="lay-08" xref="u-09"/>
  <layout-chunk id="lay-09">
    <layout-leaf id="lay-10" xref="u-70 u-71 u-90"/>
    <layout-leaf id="lay-11" xref="u-10 ..."/>
    <layout-leaf id="lay-12" xref="u-99"/>
  </layout-chunk>
</layout-root>

Figure 4. Layout structure corresponding to the layout hierarchy of Figure 3
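Given a layout layer of this kind, the consistency check mentioned above (are all the identifying labels used actually defined somewhere?) can be automated with standard XML tooling. The sketch below (in Python; the base units are a small hypothetical fragment, since Figure 2 is not reproduced here, and the element and attribute names for the base layer are illustrative rather than the actual GeM DTD) flags every cross-reference that does not resolve to a defined base unit:

import xml.etree.ElementTree as ET

base = ET.fromstring("""
<base-units>
  <unit id="u-01">£10m top of the range sale</unit>
  <unit id="u-02" src="photo-01"/>
  <unit id="u-03">Picture caption</unit>
</base-units>""")

layout = ET.fromstring("""
<layout-root id="lay-01">
  <layout-leaf id="lay-02" xref="u-01"/>
  <layout-chunk id="lay-03">
    <layout-leaf id="lay-04" xref="u-02 u-03"/>
    <layout-leaf id="lay-05" xref="u-999"/>
  </layout-chunk>
</layout-root>""")

defined = {u.get("id") for u in base.iter("unit")}

# Every xref on every layout leaf must point at a defined base unit.
dangling = [(leaf.get("id"), ref)
            for leaf in layout.iter("layout-leaf")
            for ref in leaf.get("xref").split()
            if ref not in defined]

print(dangling)   # [('lay-05', 'u-999')]: a cross-reference with no base unit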
The final component of the layout annotation layer adds in the information
about precise placement within a page. We separate a general statement of the po-
tential placement strategy employed on a page from that of the hierarchical layout
structure for that page. Placement is then indicated by adding to the layout el-
ements an ‘address’ given in terms of the general positions defined possible for
their page. We have found this separation of information to be worthwhile for
a number of reasons. First, it is quite possible that minor variations in the pre-
cise placement of layout elements can be undertaken for genre-specific reasons
without altering the hierarchical relationships present. Second, the separation of
placement information makes it possible to state generalisations over the physi-
cal placement that are inconveniently expressed at the level of individual layout
elements: for example, it is common that pages use various alignments for their
material – this alignment can hold over portions of the layout structure that are not
strongly related hierarchically. Good illustrations of the consequences of varying
such alignments or non-alignments are given, for example, in Schriver (1997: 314)
for complex instructional texts.
In order to fully capture these possible dimensions of variation, we express
within-page placement in terms of an area model. Area models divide the space on
a page into a set of hierarchically nested grids or tables. Since the grid technique is
[Figure: area model for the front page, with three top-level rows; top-level row 3 contains a nested sub-area whose own row 3 is divided into three columns]
able for some particular content. This is commonly the case for advertisements
and other rhetorically distinct information such as the navigation elements in the
middle of column 2 of the sub-area of row 3 within top-level row 3.⁵
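Purely as a simplified illustration (the element and attribute names below are invented for this sketch and are not the GeM vocabulary documented in the annotation manual), an area model for a front page of this kind might record three top-level rows, the third of which is itself subdivided, with layout units then addressed into the resulting cells:

   <!-- element and attribute names are invented for this sketch -->
   <area-model id="front-page-area">
     <grid id="page-grid" rows="3" cols="1">
       <!-- top-level row 3 is itself subdivided -->
       <sub-grid id="row3-grid" located-in="row-3" rows="3" cols="3"/>
     </grid>
   </area-model>
   <!-- a layout unit can then be given an address into one of these cells -->
   <layout-leaf id="lay-11" xref="u-10 ..." area="row3-grid/row-3/col-2"/>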
Although there are many interesting further issues that arise with this layer
of annotation, space precludes their discussion here. Readers are referred to the
GeM technical documentation for a more complete account. All of the pages of
the documents being added to the GeM corpus are described in the general terms
that have been set out here.
Providing annotation layers as described in this section for all of the GeM
layers is then the main task involved in constructing a multimodal corpus of this
sort. We use XML so that we can rely on standard tools and techniques for storing
the data, checking their integrity, and for presenting various views of the data when
considering analysis. This then places multimodal corpus design for the kinds of
documents that we are considering on a firm technological foundation. We also
use XML, however, to be able to make use of the tools that are now emerging in the
structured data representation industry for presenting queries and for searching
for regularities in the data captured. And it is to this that we now turn.
Space precludes anything here but a single brief example of using the GeM-
annotated corpus for linguistic research – drawing on our example in Section
2. Although the corpus needs to be considerably extended in coverage before we
can approach the kind of statements now possible in linguistic corpus analysis,
we nevertheless believe that the approach outlined represents a sound method-
ological direction for eventually achieving this goal. Our discussion in this section
must therefore be seen as merely suggestive of the possibilities that open up when
multimodal corpora are available in the form we propose.
We have made much of the fact that we now have a method and framework
for adding multimodal pages into a corpus of multimodal documents that is richly
annotated and XML-conformant. A prime motivation for this direction is to be
able to avail ourselves of another area of the emerging XML industry: that is the
area of searching and manipulating XML documents. In essence, the only reason
to put the effort into the highly structured forms of representation necessary for
a representation such as XML is the promise of being able to get out more than
one has put in. In the case of linguistic corpora, we are seeking the ability to ask
questions of our corpus in sufficiently flexible and powerful ways as to promote
theory construction and testing.
The components of the XML standard that are relevant here are those con-
cerned with finding selected elements within a set of XML-structured data. One
large-scale effort in the World-Wide Web community that is concerned with this
task is the ‘XPath’ group. This group has formulated an approach to finding ele-
ments within an XML structure by specifying in a very general way ‘paths’ from
the root of the XML structure to the element that is being sought. The path is
similar to that used for files or folders on a computer system: as elements in XML
may be recursively structured, and each structural element is identified by its tag,
this provides a ready addressing mechanism to navigate around XML structures of
arbitrary size and complexity. As a simple example, if we wanted to locate within
a layout structure the top level layout chunks, then all we need write is an XPath
specification such as:
/layout-root/layout-chunk
and the result, when passed to a standard XPath-processor, would be the set of
layout-chunks immediately embedded within the layout-root. A variety of further
constructions make the XPath specifications into a powerful way of locating sets
of parts of XML documents that conform to given requirements – which is exactly
what is needed for corpus investigation.
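For instance, applied to the layout structure of Figure 4, a path with a predicate on the identifying attribute, such as
/layout-root/layout-chunk[@id="lay-09"]/layout-leaf
would return just the layout leaves of the chunk labelled lay-09.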
For example, the following applied to the representation for a linguistic corpus
that we suggested in Section 3.2. above would return the contents of all elements
tagged as w-elements without any annotation – i.e., just the words.
//w
More indicative of the power of the XPath mechanism is the following, which
would give us all instances from the corpus where a word has been classified as
having the part of speech designated “WGv” (by means of the value of the ‘pos’
attribute):
//w[@pos="WGv"]
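Since corpus investigation is ultimately quantitative, it is also worth noting that the same mechanism supports simple aggregation: wrapping such an expression in the standard XPath count() function, as in
count(//w[@pos="WGv"])
returns the number of matching words rather than the words themselves.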
a series of empirical and corpus-based studies for its investigation. If their frame-
work were to be established as correct, then a news story placed on the left of the
page is by virtue of that placement inherently ‘Given’ with respect to, or relative to, a
story that is placed on the right of the page. Several experimental setups can be en-
visaged for investigating this claim. We might ask readers to rate the various stories
and their pictures on a newspaper front page on a scale running from ‘expected’
to ‘exceptional’ and then see if there is any correlation with page placement. Alter-
natively, we might select articles that are on the ‘left’ of the page and those on the
‘right’ (allowing for area model and canvas perturbations) and have readers judge
these with respect to one another. Then we might ‘re-generate’ newspaper front
pages with the articles on the left and those on the right swapped to see if readers’
judgements are affected.
For all of these tasks, we can profitably employ an appropriately annotated
corpus of newspaper front pages. The selection of items on the left and those on
the right probably needs to be made with some sensitivity to the generic layout
of pages: it might be that we need to filter out the advertisements, or the table
of contents, that in some newspapers regularly occupy the leftmost
(or rightmost) column. This can be pursued by following through the rhetorical
structure annotation of the page, finding the main nuclear elements, following the
cross-references back to the involved base-units, and selecting just those that are
positioned in the layout structure to the right or to the left of the corresponding
area models. This is exactly the kind of manipulation for which the XML compo-
nent XPath is being designed. We might also need to separate out experimental
runs involving pages with very different general layout schemes – for example,
those which are predominantly vertically organised and those which show a hor-
izontal organisation; again these kinds of properties can be calculated and made
into an explicit selection criterion on the basis of the area model.
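How such a selection might be phrased can be sketched as follows, assuming purely for illustration that each layout element carries an attribute (here called 'area') recording its address in the area model; the attribute name and the address format are our own shorthand rather than the scheme's actual markup:
//layout-leaf[contains(@area, "col-1")]
The resulting set of elements can then be intersected with the nuclear elements identified through the rhetorical structure layer, as described above.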
Asking readers to judge the articles for degrees of Given/New can also be seen
as an annotation task: and this can be supported by existing annotation tools for
XML. To run our experiment, we might then define an additional ‘experimental’
layer of XML markup in which experimental subjects choose a rating for presented
parts of a page or of selected articles shown independently of their position on a
page. The selection of the articles is itself straightforward in that once we find
the set of base units that constitute an article, we simply present these as a run-
ning text, or text with pictures, ignoring the other information given in the layout
structure of the page. Our experimental layer of annotation then associates these
articles with Given/New ratings in senses hopefully including the very abstract
ones intended by Kress and van Leeuwen. We then run over the resulting anno-
tations, displaying the actual page placements of the articles with specific ratings.
If the given/new claim of Kress and van Leeuwen is correct, then we should see the
articles rated as Given tending to occupy left-hand positions on the page and those
rated as New tending to occupy right-hand positions.
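One conceivable shape for the experimental layer just described, given here only as a sketch with invented element and attribute names, follows the standoff convention of the other layers and attaches each subject's rating to a layout unit by cross-reference:

   <experiment id="exp-01" scale="given-new">
     <!-- all names and values are illustrative only -->
     <rating subject="s-01" xref="lay-03" value="given"/>
     <rating subject="s-01" xref="lay-09" value="new"/>
   </experiment>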
We have argued that it is essential that multimodal analysis that draws on linguis-
tic methods of analysis adopt a more explicit orientation to corpora of organised
data. Only in this way is there a hope of demonstrating that certain, currently
more impressionistic styles of analysis in fact hold germs of truth (or otherwise).
By presenting a first view of an analytic framework for organising multimodal
(page-based) data, we have tried to show how this can be done. The availability of
increasingly large-scale and inclusive bodies of such data should enable work on
multimodal analysis to shift its own genre – we expect that the kinds of discourse
adopted in analyses of this kind will be able to draw nearer to empirical linguistic
discourse and to go beyond styles of discourse more closely allied with literary or
cultural analysis.
While it may turn out that the kinds of meaning-making involved in mul-
timodal discourse are not amenable to analysis in this way, that the role of the
interpretative subject is too great and the constraints on meaning brought by the
products analysed too weak, we see it as at least methodologically desirable that we
pursue this path before dismissing it.
We believe the current layers of the GeM model to be the minimum neces-
sary for capturing the basic semiotic meaning-making potential of multimodal
pages. They are also, however, clearly not sufficient for all that one needs to ask –
for example, we have deliberately left out the detailed annotation of the contents
of pictorially realised elements of pages. This is one reason why the annotation
scheme has been defined in a manner which is deliberately open-ended in terms
of the information it covers. Further layers of annotation need to be considered.
One obvious candidate for such a layer is the detailed analytic scheme proposed
by Kress and van Leeuwen (1996). In addition, although we have said very little
about those levels of meaning-making which are more usually of concern to lin-
guists (i.e. the linguistic structure), we believe that the form of annotation presented
here articulates well with the kind of linguistic analysis that is capable of represent-
ing the rich connections between language forms and their underlying functions,
and that the model as a whole then forms the most sophisticated attempt available
to date to model explicitly all the layers that constitute genre.
Clearly, after setting out the motivation and methods for this approach to mul-
timodal corpora construction, the main body of work remains to be done. Only
when we have such corpora can we start putting the programmes of exploration
sketched in the previous section into action. That is a considerable and long-term
task; where it will take us in our understanding of the meaning-making potential
of multimodal documents is something that only the future will tell.
Notes
* The GeM project was funded by the British Economic and Social Research Council, whose
support we gratefully acknowledge.
1. In other genres, the area conceived of as available for layout may not be a page at all: it may
be a spread, a run of pages, or a screenful.
2. Durusau and O’Donnell’s example is actually rather more complicated. They also give an
excellent overview of possible approaches and problems.
3. ‘Genre and Multimodality: a computer model of genre in document layout’. Funded by the
British ESRC, grant no. R000238063. Project website: http://www.purl.org/net/gem
4. This page extract is selected from the front page of an edition of the Scottish daily newspaper,
The Herald. It is reproduced by permission.
5. Note that to describe what is going on in the case of the newspaper page fully, we have an
interesting interaction between several other layers of the GeM model. The fact that a newspaper
page is organised throughout in terms of columns is nowadays one of the canvas constraints that
hold for the genre: no matter how the individual articles are organised in terms of their own area
models, they must be ‘poured’ into the mould provided by the canvas, which, for newspapers,
consists of columns. In earlier times, when print technology was more restrictive, we can even
imagine the ‘column nature’ of newspapers being a production constraint – i.e. one imposed by
the technology of production and so not variable for different purposes. The GeM constraints
form a natural hierarchy; for example, canvas constraints can only be varied within the range of
possibilities that the production constraints provide for.
References
Bryan, Martin (1988). SGML: An Author’s Guide to the Standard Generalized Markup Language.
New York: Addison-Wesley Publishing Company.
CES (Corpus Encoding Standard). (2000). “Corpus encoding standard. Version 1.5.” http://
www.cs.vassar.edu/CES
Delin, Judy, Bateman, John, & Allen, Patrick (2002/3). “A model of genre in document layout.”
Information Design Journal, 11 (1), 54–66.
Delin, Judy & Bateman, John (2002). “Describing and critiquing multimodal documents.”
Document Design, 3 (2), 140–155.
Durusau, Patrick & O’Donnell, Matthew B. (submitted). “Implementing concurrent markup in
XML.” Markup Languages: Theory and Practice.
Fries, Peter H. (1995). “Themes, methods of development, and texts.” In R. Hasan & P.
Fries (Eds.), On Subject and Theme: A Discourse Functional Perspective (pp. 317–360).
Amsterdam: Benjamins.
Halliday, M. A. K. (1978). Language as Social Semiotic. London: Edward Arnold.
Henschel, Renate (2002). GeM Annotation Manual, Bremen and Stirling. University of Bremen
and University of Stirling. http://www.purl.org/net/gem
Kress, Gunther & van Leeuwen, Theo (1996). Reading Images: the Grammar of Visual Design.
London and New York: Routledge.
Kress, Gunther & van Leeuwen, Theo (1998). “Front pages: The (critical) analysis of newspaper
layout.” In A. Bell & P. Garrett (Eds.), Approaches to Media Discourse (pp. 186–219). Oxford:
Blackwell.
Kress, Gunther & van Leeuwen, Theo (2001). Multimodal Discourse: The Modes and Media of
Contemporary Communication. London: Arnold.
Lie, Hakon K. (1991). The Electronic Broadsheet: All the News That Fits the Display.
M.A. thesis. Boston: School of Architecture and Planning, MIT. http://www.bilkent.edu.
tr/pub/WWW/People/howcome/TEB/www/hwl_th_1.html
Martin, James R. (1992). English Text: System and Structure. Amsterdam: Benjamins.
Martin, James R. (2002). “Fair trade: negotiating meaning in multimodal texts.” In P. Coppock
(Ed.), The Semiotics of Writing: Transdisciplinary Perspectives on the Technology of Writing
(pp. 311–338). Turnhout: Brepols.
McEnery, Tony & Wilson, Andrew (2001). Corpus Linguistics. Edinburgh: Edinburgh University
Press.
Royce, Terry D. (1998). “Synergy on the page: exploring intersemiotic complementarity in
page-based multimodal text.” Japan Association for Systemic Functional Linguistics (JASFL)
Occasional Papers, 1, 25–49.
Schriver, Karen A. (1997). Dynamics in Document Design: Creating Texts for Readers. New York:
John Wiley and Sons.
Swales, John M. (1990). Genre Analysis: English in Academic and Research Settings. Cambridge:
Cambridge University Press.
Thompson, Henry S. & McKelvie, David (1997). “Hyperlink semantics for standoff markup of
read-only documents.” In Proceedings of SGML Europe’97.
van Leeuwen, Theo & Kress, Gunther (1995). “Critical layout analysis.” Internationale Schul-
buchforschung, 17, 25–43.
Waller, Robert (1987). The Typographical Contribution to Language: towards a Model of
Typographic Genres and Their Underlying Structures. PhD dissertation. Department of
Typography and Graphic Communication, University of Reading, Reading, UK.
On the effectiveness of mathematics
Kay L. O’Halloran
National University of Singapore
1. Introduction
The miracle of the appropriateness of the language of mathematics for the for-
mulation of the laws of physics is a wonderful gift which we neither understand
nor deserve. We should be grateful for it and hope that it will remain valid in fu-
ture research and that it will extend, for better or for worse, to our pleasure even
though perhaps also to our bafflement, to wide branches of learning.
(Wigner 1960: 306)
typically expressed through other means, for example probability statements. Log-
ical meaning is realised through linguistic and symbolic conjunctive adjuncts and
structural conjunctions aided by textual layout. In terms of textual meaning, the
statement is organised so that the y appears on the left hand side of the equation.
Symbolic mathematics is organised in very specific ways to make immediate the
experiential, logical and interpersonal meaning of the mathematical statements.
There are multiple metafunctionally based systems which constitute the dif-
ferent grammars of the symbolism, visual display and visual images. Critically,
choices from these systems function not only intra-semiotically within their own
grammars, but they can also be re-represented inter-semiotically across the three
different grammars with, in some cases, an unprecedented degree of equivalence of
meaning. For example, the linguistically realised item “distance” functions intra-
semiotically within the grammar of the English language. However, “distance” can
also be inter-semiotically re-represented in the form of a visual line segment and a
symbolic x. Thus (i) “distance”, (ii) “______ ” and (iii) “x” each function as choices
in three different grammars. However, once these inter-semiotic relations are es-
tablished across language, visual images and the symbolism, the different choices
become somewhat interchangeable. For example, the symbolic may appear in the
linguistic and visual parts of the text. The most versatile semiotic resource ap-
pears to be the symbolism where functional elements sit comfortably not only
within symbolic statements, but also within linguistic and visual forms of semio-
sis. The sophisticated inter-semiotic relations in mathematics, however, do not
always result in equivalence or congruency, and hence in an interchangeability of
functional elements. As we shall see below, inter-semiosis opens up the way for
semiotic metaphor where there is semantic change. Understanding this process
may be one important key to understanding the success of mathematics.
The use of multiple semiotic resources means that during the process of inter-
semiosis semiotic metaphors involving a semantic shift may occur as elements are
translated into another semiotic form (O’Halloran 1999a, 1999b, 2003a, 2004). For
example, a linguistically realised process such as “measuring” may become a visual
entity in the form of a line segment in a mathematics diagram. This is a semiotic
metaphor where a (linguistic) process becomes a (visual) entity in the shift from
language to visual means of representation. This means that the meaning poten-
tial of the visual realm can now be exploited in what is essentially a re-casting of
the semantic realm. The mental/behavioural/material process of measuring be-
comes a visual entity which may now be re-conceptualised in relation to other
visual entities in the mathematical diagram. The significance of such a semantic
re-organisation becomes clear in Section 4, which is concerned with the nature of
changes in visual representations in mathematics which occurred with Descartes.
Figure 2. Calculating depth and width of a Castle Well or Tower Illustration (Babington
1635). Reproduced by permission of the Syndics of Cambridge University Library
Figure 3. Calculating the Height of the Cliff Face (Tartaglia 1546). Reproduced by courtesy
of the Director and Librarian, the John Rylands University Library of Manchester
Figure 4. The Path of the Cannon Ball (Tartaglia 1546). Reproduced by courtesy of the
Director and Librarian, the John Rylands University Library of Manchester
Figure 5. The Human Eye Remains (Descartes 1998: 92). Reproduced by courtesy of Cam-
bridge University Press
nent while participants such as the men, and circumstances like cliffs, rivers, castles
and cannons increasingly disappeared. In some cases, there only remained one
part of the human body, the eye as displayed in Figure 5.
There was a shift in the nature of the depicted activity in this new type of
semiotic reconstrual. Men were no longer engaged in some physical, mental or
material activity such as measuring, but rather the human eye minus the body
became engaged in acts of perception.
Figure 6. From Context to Circles and Lines. (A) Descartes (1998: 79), (B) Descartes
(1998: 81). Reproduced by courtesy of Cambridge University Press
Figure 7A. Descartes’ Use of Algebra to Construct Curves. Descartes (1954: 53). Repro-
duced with acknowledgement and thanks to Dover Publications
In Figure 6, the removal of the physical context and the human body is clearly
demonstrated in the beginnings of modern mathematical formulations.
In Figure 6A, Descartes’ drawing of the path of a ball travelling through a
finely woven cloth (indicated by the line segments AB to BD) and water (AB to
BI) includes the man actually hitting the ball. In Figure 6B, however, the physical
context is removed as the path of the ball is depicted as a series of lines and a circle.
Figure 7B. Descartes’ Use of Algebra to Construct Curves. Descartes (1954: 109). Repro-
duced with acknowledgement and thanks to Dover Publications
participants to be the circle and line segments, the concern lies in the relations
between those entities and their parts rather than the material context of the prob-
lem. Replacing the semiotic construal of the material with the metaphorical in the
form of mathematical diagrams involving lines, triangles and arcs permitted the
solution of new types of problems. The height of the cliff and the castle wall were
at least theoretically physically measurable. But how so with the path of a cannon
ball? This became possible only through a concern with spatiality and relations
which Descartes eventually linked to algebraic description. In time the visual im-
ages assumed a secondary status in relation to the algebra (see e.g. Davis 1974).
Descartes’ construction of different curves and an increasing reliance on algebra
to describe those curves may be seen in Figures 7A and 7B.
Descartes had shifted from the construction of curves using the Greek material
compass and ruler to using an abstract compass which was semiotically grounded.
Descartes claimed that his proportional compass had the same certainty as the
ordinary compass. As Shea (1991: 45) explains: “This new instrument does not
have to be physically applied; it is enough to be able to visualise it and use it as a
computing device. In other words, pen and paper is all that is required, since the
nature of the curve is revealed in its tracing.”
The beginnings of modern mathematics are seen in Descartes where the curves
are described algebraically although, as Davis and Hersh (1986) claim, this really
amounted to algebraisation of ruler-and-compass constructions. “In its current
form, Cartesian geometry2 is due as much to Descartes’ own contemporaries and
successors as to himself ” (Davis & Hersh 1986: 5). Although later mathematicians
considered the geometrical construction to be sets of points satisfying certain
criteria defined by the algebraic equation, despite an increasing dependence on
algebra, “Descartes never defined as geometrical those curves that admit of al-
gebraic equations” . . . [nonetheless] . . . He simplified algebraic notation and set
geometry on a new course by his discovery that algebraic equations were useful
not only in classifying geometrical curves, but in actually devising the simplest
possible construction” (Shea 1991: 67).
Mathematicians such as Newton later used the algebraic equations as com-
plete descriptions of curves rather than a tool for construction. Mathematical
symbolism became the semiotic through which curves were defined and prob-
lems solved with the aid of mathematical graphs and diagrams. However, we may
ask what prompted Descartes to algebraicise his geometry, thus paving the way
for “his new world of relations that seventeenth century mathematicians entered
with pride” (Shea 1991: 67). We must turn to Descartes’ philosophy in order to un-
derstand how these mathematical construals, which had become decontextualised
and algebraicised, represented a method which he thought was the path to
knowledge. He used algebra as a tool for the construction of curves, but perhaps
from this endeavour arose his method for construing and deriving what he viewed
as true. And that method depended on algebraic descriptions which, while offering
more, admitted less.
Descartes’ project was to solve the problem of knowing what is true at the cost
of denying what appears to be obvious. Following Plato (429–347BC), Descartes
turned to reason rather than the senses as the means for achieving that aim. For
Descartes, sensory perception of the material world was unreliable in a way simi-
lar to that envisaged by Plato. While Plato used the narrative of shadows in a cave
to illustrate sensory illusion, Descartes attempted to demonstrate the unreliabil-
ity of the senses through a discussion of a ball of wax in the Second Meditation
(1952: 202–212). Descartes explains that the properties of wax perceived by the
senses, for example, flavour, smell, colour, shape and size, are unreliable because
they change as the wax is heated. Mental perception rather than sense perception
allows examination of the reliable essence of matter which he conceptualises as
motion and extension.3
What, then, was it I comprehended so distinctly in knowing the piece of wax? Cer-
tainly, it could be nothing of all that I was aware of by way of the senses, since all
the things that came by way of taste, smell, sight, touch and hearing, are changed,
and the wax none the less remains. . . As we find, what then alone remains is a
something extended, flexible and moveable. . . which cannot be [adequately] ap-
prehended save by the mind. (Descartes 1952: 208–209)
Descartes’ method is described in the twenty-one Rules for the Direction of the
Mind, which was later reduced to a method involving evidence, division, order
and exhaustion in Discourse on Method (Part II). Descartes’ method was to divide
to get the simplest essence, and then order and enumerate to understand compos-
ites. As Shea (1991: 131) explains: “(a) nothing is to be assented to unless evidently
known to be true; (b) every subject-matter is to be divided into the smallest pos-
sible parts, and each dealt with separately; (c) each part is to be considered in
the right order, the simplest first; and (d) no part is to be omitted in reviewing
the whole.”
Descartes’ method reveals his increasing dependence on his new semiotic tools
in the form of algebraic descriptions. In other words, it appears that his four
canons or rules were built on what he could achieve semiotically through his
algebraicisation of geometry. He could express relations in the simplest elements,
and then rework to understand the more complex through the use of the sym-
bolism. His dissatisfaction with the inadequacies of language is openly expressed
Descartes wanted an aid for thought for describing the essence of bodies, that
which is knowable in terms of motion and extension, and this aid should be sim-
ple and abstracted from anything superfluous. For “perfect understanding”, the
question should be rendered “as simple as possible, and resorting to enumeration,
divide[d] . . . into its minimal parts” (Descartes 1952: 76). In addition to using the
geometrical curves as an aid to thought,4 Descartes explicitly says in Rule XVI
that mathematical representations should be algebraically expressed due to the
simplicity of this type of formulation.
Thus if I write 2a³, that will be as if I should write the double of the magnitude
signified by the letter a, which contains three relations. By this device not only
do we obtain a great economy in words, but also, what is more important, we
present the terms of the difficulty so plain and unencumbered that, while omitting
nothing which is needed, there is also nothing superfluous, nothing which engages
our mental powers to no purpose. . . (Descartes 1952: 101)
For Descartes, the advantages in using algebraic descriptions for curves were nu-
merous. Algebraic formulations were representations of relations consisting of the
simplest elements. They had a direct relation to the curves in terms of the pro-
portionality which was displayed graphically, and the formulations could be used
as tools for reasoning. The result of these efforts was that mathematics became
concerned with a limited semantic field in the form of proportional relations
which could be displayed visually. The algebraic expression of those relations was
concerned with brevity and the removal of extraneous information for effective
reasoning. This resulted in the development of a symbolic grammar which con-
densed meaning most simply and efficiently in the form of the essential elements.
As I discuss in the final section of this chapter, the resulting grammar for mathe-
matical symbolism developed a new type of grammatical complexity which is not
found in language.
Descartes defines motion, size, shape and arrangement of parts as the concern
of mathematics in an attempt to remove doubt through the use of the intellect.
The instrument through which certain conclusions should be reached involved
semiotic construals of that phenomenon in the form of algebraic descriptions and
curves. This led to a certain homogenisation where material objects became essen-
tially identical and replaceable with one another. In addition, the material context
was removed from that semiotic space. The move from perception and natural
appearance is apparent in the semiotic nature of mathematical texts. The human
body and eye were also removed. The material world developed a secondary status
as it was replaced with what was mathematically describable in terms of spatiality
and relations through symbolism and visual images. Descartes’ project entailed a
re-writing of nature, as explained by Barry (1996: 55): “From the Cartesian point
of view, the only proper way to decode nature is to write it . . . in the most exact
terms possible.” But from the preceding discussion, we can see that this description
centred around particular dimensions of meaning.
The modern mathematical and scientific project did not end with Descartes.
Instead Isaac Newton developed a science which included rather than excluded
matter. Rather than depending on Cartesian notions of extension and motion,
Newton introduced physical matter in order to formulate entities such as attrac-
tion, force, absolute space and time, and gravitation. As Barry (1996: 55) explains,
Newton altered the Cartesian project of the primacy of ideas and the dichotomy of
mind and matter in relation to that which was mathematically intelligible. New-
ton’s semiotic is mathematical symbolism (see Figure 8) and he conceptualises the
invisible such as forces of attraction and gravity as mathematical laws.
In this way, Newtonian attraction is “written in a syntax inspired directly
by the mathematical forms which Descartes (and Galileo) stressed time and
time again as the most proper form of articulation for the new physics” (Barry
1996: 127). We may not be able to perceive attraction, but semiotically it can be
construed. “The decisive importance of the Newtonian mathematical vision re-
garding the issues of natural appearance and perception is dramatically revealed in
the concept of attraction: what we see with our eyes demands the existence of what
may never be seen, but must be mathematically granted” (Barry 1996: 134). The
division between the perceptually visible and invisible disappears with Newton.
Newton’s theoretical stance is accompanied by real and imaginary experiments
to support or refute his theories. In this shift from the real to the experimental,
once again what is semiotically constructed is accepted as scientific validation.
Newton believes that many of the experiments which he devised but never fully
implemented (e.g. his notebook drawings) nonetheless serve as empirical valida-
tion or invalidation of certain hypothetical stances. It is as if the diagram of a
possible experiment is virtually the same as the experiment itself.
(Barry 1996: 160)
Descartes sought to move beyond the senses and realm of appearance to the in-
tellect to describe mathematically that which could not be doubted in the form
of shape and motion. Newton enlarged this project by admitting matter and
that which could not be perceived to be construed mathematically. This project
was aided by the use of technical and laboratory apparatus where the phenom-
ena under examination were semiotic abstractions. While recent developments in
mathematics in the past two decades have fundamentally changed the Newtonian
mechanical view of the universe (for example, dynamical systems theory is con-
cerned with the nonlinearity rather than linearity of physical systems), mathemat-
ical symbolism, visual display and language remain the major resources through
which physical systems are described, although now this is largely accomplished
computationally in a dynamic virtual and often visual world through the use of
computers. Nature was re-written semiotically, and in Section 7, I briefly exam-
ine the nature of the semantic realm which was admitted in this re-construal. For
Newton and others “Mathematical precision is the only form which the perfect re-
lation between mind and matter could possibly take” (Barry 1996: 127). What was
allowed within that field of mathematical precision?
a. Language
Logical Meaning: The procedures are explained as a series of steps and the impli-
cations of the findings are formulated and contextualised through language.
Textual Meaning: The organisation of the linguistic text (and the whole mathe-
matics text) is generically defined.
Logical Meaning: The visual image allows the logical relations of the parts and
patterns to be perceived and conceptualised but not described exactly.5
Figure 9. The Diagram for the Volume of a Sphere. Reproduced from Stewart
(1999: 380). From Calculus: Combined Single and Multivariable 4th edition by Stewart.
© 1999. Reprinted with permission of Brooks/Cole, a division of Thomson Learning:
www.thomsonrights.com
Textual Meaning: The visual images organise the proportional relations between
the entities as a whole. The relative positioning indicates stability in the case of
the axes, and dynamicism in the case of the curves. The labelling of the mathematics
participants allows cohesive links to the main body of the text and the symbolism.
c. Mathematical symbolism
V = (4/3)πr³
This symbolic statement may be considered as a series of rankshifted participants
(4/3, π and r) and processes (÷ and ×) as indicated by the square brackets
[[ ]] below.
V = (4 ÷ 3) × π × r × r × r
V = [[(4 ÷ 3)]] × π × [[r × r]] × r
Logical Meaning: The symbolism is the semiotic tool through which logical rea-
soning largely takes place. In the solution to mathematics problems, long se-
quences of symbolic complex nested structures of reasoning take place.
Textual Meaning: The textual organisation of the symbolism has generic conven-
tions which allow the solution to the mathematics problems to be followed with
comparative ease. There are discursive links to the main body of the text through
linguistic selections.
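Purely by way of numerical illustration (the instantiation below is ours, not the textbook's), substituting a value for the radius shows how the rankshifted participants and processes of the statement are resolved in a single arithmetic pass:

V = \frac{4}{3}\pi r^{3}, \qquad r = 2 \;\Rightarrow\; V = \frac{4}{3}\pi(2)^{3} = \frac{32\pi}{3} \approx 33.5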
. Inter-semiosis in mathematics
Since Descartes, our understanding of the ordered physical world has changed
in that the complexity, chaos and indeterminacy underlying physical systems are
now generally accepted. Despite this, the step by step approach of modern science
has been reasonably successful in describing the behaviour of physical systems
because, as Davies (1990) explains, many systems are approximately linear in
nature and so breaking down the systems into smaller parts to understand the
nature of the whole as advised by Descartes appears to work: “By analysis, one can
chop up complicated systems into simpler components. And understanding of the
behaviour of the components then implies, ipso facto, an understanding of the
whole” (Davies 1990). However, this attempt to define mathematically the under-
lying regularity of ordered systems breaks down at some point when the behaviour
of the system becomes unpredictable:
On the other hand, they [all physical systems] turn out to be nonlinear at some
level. When nonlinearity becomes important, it is no longer possible to proceed
by analysis, because the whole is now greater than the sum of the parts. Non-
linear systems can display a rich and complex repertoire of behaviour and do
unexpected things.
Notes
* I wish to thank Emeritus Professor Philip Davis (Applied Mathematics Division, Brown Uni-
versity) for his valuable comments on an earlier draft of this chapter.
For further development of mathematics as a multisemiotic discourse see: O’Halloran, K.
(2005). Mathematical Discourse: Language, Symbolism and Visual Images. London & New York:
Continuum.
The illustrations in this chapter have been reproduced with the kind permission of the following
libraries and publishers:
Figure 1: Reproduced by courtesy of the Director and Librarian, the John Rylands University
Library of Manchester.
Figure 2: Reproduced by permission of the Syndics of Cambridge University Library.
Figure 3: Reproduced by courtesy of the Director and Librarian, the John Rylands University
Library of Manchester.
Figure 4: Reproduced by courtesy of the Director and Librarian, the John Rylands University
Library of Manchester.
Figure 5: Reproduced by permission of Cambridge University Press.
Figure 6A: Reproduced by permission of Cambridge University Press.
Figure 6B: Reproduced by permission of Cambridge University Press.
Figure 7A: Public domain material with acknowledgement and thanks to Dover Publications.
Figure 7B: Public domain material with acknowledgement and thanks to Dover Publications.
Figure 8: Reproduced by permission of Cambridge University Press.
Figure 9: Reproduced with permission from Thomson Learning Global Rights Group.
References
Anderson, Myrdene, Sáenz-Ludlow, Adalira, Zellweger, Shea, & Cifarelli, Victor (2003).
Educational Perspectives on Mathematics as Semiosis: From Thinking to Interpreting to
Knowing. Ottawa: Legas Publishing.
Babington, John (1635). A Short Treatise of Geometrie. London: Thomas Harper. Reprinted 1971.
Amsterdam and New York: Da Capo Press.
Barry, James (1996). Measures of Science: Theological and Technical Impulses in Early Modern
Thought. Illinois: Northwestern University Press.
Davies, Paul C. W. (1990). “Why is the world knowable.” In R. E. Mickens (Ed.), Mathematics
and the Language of Nature (pp. 14–54). Singapore: World Scientific.
Davis, Philip J. (1974). “Visual geometry, computer graphics and theorems of perceived type.”
Proceedings of Symposia in Applied Mathematics, 20, 113–127.
Davis, Philip J. (1998). “Mickey flies the Stealth.” SIAM News, 31 (3).
http://www.siam.org/siamnews/04-98/mickey.htm
Davis, Philip J. (2000). The Education of a Mathematician. Natick, MA: A. K. Peters Ltd.
Davis, Philip J. & Hersh, Reuben (1986). Descartes’ Dream: The World According to Mathematics.
New York: Harcourt Brace Jovanovich.
Descartes, Rene (1952). Descartes’ Philosophical Writings. N. Kemp Smith (Ed. and Transl.).
London: Macmillan & Co.
Descartes, Rene (1954). The Geometry of Rene Descartes. D. E. Smith & M. L. Latham (Transl.).
1st ed. 1637. New York: Dover.
Descartes, Rene (1998). The World and Other Writings. S. Gaukroger (Ed. and Transl.).
Cambridge: Cambridge University Press.
Eagle, Ruth M. (1995). Exploring Mathematics through History. Cambridge: Cambridge Uni-
versity Press.
Halliday, M. A. K. (1994). An Introduction to Functional Grammar. London: Arnold.
Halliday, M. A. K. & Martin, James R. (1993). Writing Science: Literacy and Discursive Power.
London: Falmer.
Hamming, Richard W. (1980). “The unreasonable effectiveness of mathematics.” American
Mathematical Monthly, 87 (2), 81–90.
Hersh, Reuben (1997). What is Mathematics, Really? London: Jonathan Cape.
Høyrup, Jens (1994). In Measure, Number, and Weight: Studies in Mathematics and Culture. New
York: State University of New York Press.
Judovitz, Dalia (2001). The Culture of the Body: Genealogies of Modernity. Ann Arbor: The
University of Michigan Press.
Lemke, Jay L. (1998). “Multiplying meaning: visual and verbal semiotics in scientific text.”
In J. R. Martin & R. Veel (Eds.), Reading Science: Critical and Functional Perspectives on
Discourses of Science (pp. 87–113). London: Routledge.
Mickens, Ronald E. (1990). Mathematics and Science. Singapore: World Scientific.
Newton, Isaac (1981). “Newton’s simplified proof of Fatio’s reduction of the condition of the
fall in least time along an arc of the brachistochrone to a curvature property of the cycloid
[1700].” In D. T. Whiteside (Ed.), The Mathematical Papers of Isaac Newton, Vol. VIII: 1697–
1722 (pp. 86–91). Cambridge, UK: Cambridge University Press.
O’Halloran, Kay L. (1999a). “Interdependence, interaction and metaphor in multisemiotic
texts.” Social Semiotics, 9 (3), 317–354.
Multimodality in language
teaching CD-ROMs*
Martin Kaltenbacher
University of Salzburg, Austria
There is currently much discussion about the impact of new media and
information technologies on learning and teaching. CD-ROM language ‘trainers’
are a relatively new type of resource which can potentially provide users with
various new multimodal opportunities for language learning. This chapter
demonstrates current strategies in such CDs for combining visualisations (sound
waves, images and video-clips), sound (speech) and written text. While the
purpose of these strategies is ostensibly to train the users’ language skills and to
help them understand the meaning choices in the target language, it will be
shown that a lack of relevant linguistic, semiotic or pedagogic expertise in the
design process too often delivers ‘multimodality’ which actually
disrupts learning.
1. Introduction
2. Premises
materials since the late 1970s (see Galloway 1993; Johnson 2001: 182ff.). An im-
mediate suspicion arises as to whether computer software can be a suitable tool
for providing such creative and interactional language.
Up to now there is (to my knowledge) no software available on the market
that can react to a learner’s mis- or non-understanding by linguistically negotiat-
ing the intended meaning, and it seems unlikely that such software will ever be
available. This is the point where non-verbal semiotic modes might help to clarify
the meaning of an otherwise opaque linguistic sign. When a learner in a tradi-
tional classroom setting does not understand the word dog, for instance, s/he can
ask the human teacher What is a dog?, and the teacher will usually come up with
the proper explanation. A computer will remain silent on this question, no mat-
ter how vigorously you shout it at the screen. What a computer program can do
instead is supply other semiotic encodings of the lexical item, either straight away
or upon request. Possible modes for negotiating the requested meaning in such a
case range from the very simple to the very sophisticated. Additional information
could be presented in the form of an extra window giving a simple translation of dog,
(e.g. Hund, chien, cane, etc.). It could be visualised in the form of an icon or a
photograph of a dog, which could again be accompanied by sound (e.g. barking).
It could even be presented in video format. The range of possibilities to promote
understanding is wide.
Such additional encodings must meet two basic requirements, a semiotic one
and an economic one. The semiotic one requires that the additional encoding iden-
tifies the intended meaning (the signified, in the Saussurean sense) as clearly and
exclusively as possible, e.g. by showing the picture of a dog and not just any an-
imal. This might appear to go without saying, yet it is not necessarily obvious to
software designers, as some examples below will show. The economic requirement
is that the intended meaning should be conveyed to the user at the lowest possi-
ble technological cost. Naturally, a language teaching CD cannot provide a video
sequence with every lexical item. Very often it cannot even offer an image with
every item. Apart from technical restrictions on a CD’s storage capacity, this would
not be communicatively functional either. To answer each comprehension check
of a learner with a flood of new informational input would infringe Grice’s (1989)
cooperative principle.
Language teaching CD-ROMs come in three different categories. General En-
glish courses make up the biggest group of products available on the market. The
second category contains CDs that concentrate on a particular skill or a particular
aspect of language learning. Typical product names here are ‘grammar trainer’, ‘vo-
cabulary trainer’, ‘communication trainer’, ‘writing assistant’. The third category
offers products that teach an LSP variety, most commonly Business English.
Many of the language teaching CDs available on the market contain some type
of pronunciation exercise, where the learner can listen to single words or com-
plete sentences produced by a model native speaker, record her/his own version
and then compare the two versions. The producers of Kommunikationstrainer En-
glisch 3.0 by Digital Publishing (1999) offer special training in the English vowels
and consonants in what they call their “pronunciation lab”. However, they seem
to confuse graphemes with phonemes, when they train users to ‘pronounce’ let-
ters of the English alphabet: “Besondere Beachtung bei der Aussprache finden die
englischen Buchstaben g, h, j, v, w, y.” (pronunciation of the English letters g, h,
j, v, w, y needs special emphasis) (Kommunikationstrainer Englisch 3.0: Pronun-
ciation, Vowels, Exercise 1). Incidentally, they also seem to confuse vowels with
consonants.
It has been shown that the drilling of pronunciation patterns is pointless if the
learner does not also receive explicit pronunciation instruction (e.g. Verhofstadt
2002: 121). Moreover, the drilling of isolated words or syllables, let alone letters,
neglects the importance of prosody and intonation for natural speech.
A visual encoding from the field of acoustic phonetics that has become a
widely used tool in language teaching CD-ROMs for checking the learners’ accu-
racy in pronunciation is EVF – electronic visual feedback (Verhofstadt 2002: 128).
EVF usually shows an oscillogram (a sound wave) of the drilled phrase as uttered
by a model speaker and by the learner. Some programs add to this a numeric scale
giving the percentage of correspondence between the two utterances. An exam-
ple of this is given in Figure 1, taken from Vokabeltrainer English (Sprachlabor,
Tricky words, Aussprache). The sound wave to the left represents the utterance
of the word except produced by the American English native speaker serving as
the software’s model speaker, the sound wave to the right is my production. The
correspondence between the two versions is 40%.
When we consider the semiotic functions of this representation, it becomes
clear that the choice of a sound wave as a tool for enhancing learning is a bad one.
First of all, the average language learner has no training in acoustic phonetics or
in interpreting sound waves, and products making use of such representations
generally do not provide such training either. Second, even if information on how to
read a sound wave was provided, it would not be useful, as there is general agree-
ment among phoneticians that a sound wave does not contain much phonetic,
let alone any articulatory, information (e.g. Ladefoged 1996: 43). Third, displaying
‘percentages of correspondence’ between the two utterances is discouraging for the
learner, as no information is provided as to where in one’s particular articulation
the lack of correspondence lies. The display offers no clue about whether the non-
correspondence is due to inaccuracy in voicing, vowel length, stress, aspiration,
place of articulation, etc. The only way for a learner to improve her/his pronun-
ciation, or more precisely, to achieve a higher degree of correspondence with the
model articulation, is through trial and error. In fact, even native speakers of En-
glish tested in this ‘pronunciation lab’ could not achieve results that were assessed
as satisfactory by the software.
Summarising, we have to conclude that such EVFs are not suitable tools for
teaching pronunciation to non-expert learners. The best they can do is to serve as
eye-catchers: they are colourful, sophisticated and impressive. The only informa-
tion a learner can get from them is that her/his utterance is not like that of the
model speaker and therefore unacceptable, even if the latter is not the case at all.
Translated into the systemic functional terms laid out by Halliday (1994: 296ff.),
each visualisation contributes to the information units in the different sections of
a CD. Sections are organised through the arrangement of already known (Given)
elements and some elements which are New and supposed to be learned. The first
type of image represents the Given (see e.g. restaurant in Figure 2 below). It re-
inforces the context in which the new words and phrases (waitress, white wine,
dinner, etc.) are presented. The second type of image depicts these New elements
(see e.g. dinner in Figure 5).
An important aspect is that in multimodal, just as in monomodal verbal text,
“discourse has to start somewhere, so there can [must] be discourse-initiating
units consisting of a New element only” (Halliday 1994: 296). This is achieved in
language teaching CDs by arranging the material into different units entitled, e.g.
in the restaurant, at the hotel reception, at the railway station. Entry ports to such
units are usually highlighted textually and visually, and sometimes by means of
short video clips and sound files.
So far so good – but a picture paints a thousand words. Depicting the isolated
meaning of such a word as dinner poses some difficulties even within a context of
in a restaurant or eating. One of the problems will be making explicit the distinc-
tion between denotative and connotative meanings of dinner in cultural context.
What dinner means for me might be unacceptable (if not revolting) for somebody
else. A further problem is that a complex picture, like Figure 5, may denote many
separate meanings (‘paint a thousand words’), which taken together may or may
not raise in the learner a connotation that resembles the denotation of the lexical
meaning of dinner, which the picture wants to convey. This problem of finding not
one but many denotative meanings within a picture could easily be reduced by us-
ing icons instead of photographs or complex images, as icons are usually designed
to denote one thing only.
All images in a CD fulfil a dual function. A photograph, drawing, cartoon
or icon always serves as an eye-catcher to make textual information more attrac-
tive. The more important function, however, is to identify to the user the mean-
ing of the verbal in a non-verbal semiotic code. As Wahlster (1996: 9.3) puts it,
“an important synergetic use of multimodality in systems generating text-picture
combinations is the disambiguation of referring expressions.” Successful disam-
biguation depends on establishing appropriate cohesion between the verbal text
and the picture. Wherever such cohesion is lacking, the text-picture combination
is not useful for language teaching purposes, as the different modes convey non-
identical meanings. Unfortunately, CD-ROMs frequently contain pictures that do
not disambiguate the meanings of the verbal text. In many cases the pictures are ei-
ther too general or they are ambiguous, so that they allow more than one semiotic
(Given) for all the words and sentences that one can access within the section. A
surprise here is that the accompanying sound does not match the meaning of the
visual. The photograph shows an empty dining room with tables nicely set with
posh china and silver, ready to accept a flood of guests. The sound, however, is
that of a full restaurant with people talking and laughing, cutlery clattering and
glasses touching one another, as toasts are proposed.
A mouse-click on the menu lying on the front table opens a sub-section of
New elements contained in a long list of different types of meals, dishes, and food,
headed by the three words breakfast, lunch and dinner. Another click on these
words generates the pictures in Figures 3, 4 and 5.
Figure 3. Frühstück/breakfast
Figure 4. Mittagessen/lunch
Figure 5. Abendessen/dinner
None of the three photographs can be called a good choice for the lexical
meanings they have been chosen to identify. Apart from the poor picture qual-
ity the image in Figure 3 is too general. We see a typical restaurant or cafeteria
situation with one waiter and two guests, but there are no contextual clues that
the couple is having breakfast rather than tea, coffee, lunch, snack, supper, ice-cream
or anything else. The food that is consumed by the guests is not identifiable, and
the picture lacks a time reference to ‘morning’, which could easily be given in form
of a clock or people wearing pyjamas. While the image in Figure 3 is too general,
the picture in Figure 4 (a cup of coffee and three half croissants) is simply inap-
propriate to both the German and the Anglo-American concepts of lunch.
An important issue is that food and eating habits vary considerably across dif-
ferent cultures and languages, and so there is a nearly endless variety of possible
connotations for individual learners. One connotation of Mittagessen for Austri-
ans and Germans, for instance, is that it is the main meal of the day.1 In Britain,
however, lunch would typically be a snack consisting of sandwiches, and dinner
would be the main meal of the day. In neither culture does lunch ever consist of
coffee and croissants.2 For the same reason Figure 5 is not well chosen to identify
the cross-culturally different meanings of dinner. The image depicts a three-course
meal consisting of soup, a main course of meat served with a side dish and dessert.
Again, it is worthwhile noting that the picture does not denote one thing but many,
the sum of which may connote ‘main meal of day’, which makes it ambiguous,
since this would typically be dinner in the U.K. but lunch in many Austrian and
German contexts. In all three pictures in Figures 3–5 a time indicator, such as a
clock, would have been an easy solution to the problem.
Figure 6. Couple in restaurant, man with female voice (eunuch, counter tenor, transsex-
ual?) shouting for a waiter/waitress
Figure 9. Lady inquiring about something to eat, low fat dishes, soup, fish, white wine
with two separate areas, one with smokers, one without, and the guests pointing
to the respective area in which they want to sit or by showing a smoking couple in
one image. Both pictures again reflect only the Given context and neglect the New
information, which is the essential one for the learner.
Figure 9 shows the same couple sitting at a table, the woman talking to the
attentive waiter. Out of the 20 sentences that are presented as ‘useful phrases when
going to a restaurant’, this image accompanies five. Each of these sentences has
a different meaning: I’d like something to eat. Do you have any low fat dishes? I’d
like some soup. I’d like fish. I’d like white wine. Listing all the New information in
the clause final positions, we have: something to eat, low fat dishes, soup, fish, white
wine. Similarly to Figures 7 and 8, none of this New information is encoded in the
image. The visual meaning is too general, and the picture is therefore superfluous.
It can only serve as an eye-catcher to attract the learner’s attention but does not
help her/him understand what these sentences mean.
In contrast, let us have a brief glance at Figure 10. It presents the couple at
the end of the meal. The man is talking to the waiter and pulling a plastic card
out of his suit, showing it to the waiter. From this it is easy to infer that the man
is enquiring about credit cards. The sentence supported by the picture is: Do you
take credit cards? This is one of the good multimodal combinations in the CD,
where the two meanings encoded in the verbal (spoken and written) modes and
the visual mode match, i.e. there is semiotic cohesion between the phrase credit
card and the yellow plastic card shown in the man’s hands. The picture at the same
time reiterates the Given context (scene in a restaurant) and portrays the New information (the credit card).
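The cohesion test applied informally here can be made explicit. What follows is a minimal sketch, not part of the CD-ROM or of the analysis above, of how one might flag phrases whose clause-final New element is not covered by hand-assigned content tags for the accompanying picture; the tag sets and the crude new_element heuristic are illustrative assumptions only.

# Minimal sketch: checking semiotic cohesion between the verbal New element
# of a taught phrase and hand-assigned content tags for the accompanying image.
# The phrases are from the restaurant section discussed above; the tag sets and
# the crude heuristic are illustrative assumptions, not data from the CD-ROM.

def new_element(phrase: str) -> str:
    """Crude stand-in for a Given/New analysis: treat the clause-final
    noun phrase (whatever follows the last marker found) as New."""
    for marker in ("like ", "have any ", "take "):
        if marker in phrase:
            return phrase.split(marker, 1)[1].rstrip("?. ").lower()
    return phrase.rstrip("?. ").lower()

# Figure 9: couple and attentive waiter at a table, nothing more specific shown
figure_9_tags = {"couple", "table", "waiter", "restaurant"}
# Figure 10: man handing a yellow credit card to the waiter
figure_10_tags = {"couple", "table", "waiter", "restaurant", "credit card"}

phrases = [
    ("I'd like something to eat.", figure_9_tags),
    ("Do you have any low fat dishes?", figure_9_tags),
    ("I'd like some soup.", figure_9_tags),
    ("Do you take credit cards?", figure_10_tags),
]

for phrase, tags in phrases:
    new = new_element(phrase)
    cohesive = any(tag in new or new in tag for tag in tags)
    print(f"{phrase!r:35} New = {new!r:20} visually encoded: {cohesive}")

Run over these four phrases, only the credit-card question comes out as visually encoded, mirroring the contrast between Figures 9 and 10 described above.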
The third type of visualisation that will be addressed here is the short video-clip
used for vocabulary and pronunciation training. It usually lasts between half a
second and two seconds and shows native speakers producing words or short
phrases in the target language. Such video-clips frequently present close-ups of the
model speaker’s upper body and/or face, and focus on gestures, hand-movement,
head-movement, eye-contact, etc. The most important feature is, however, the lip
movements.
The following are examples of this kind of visual used in the language learn-
ing CD-ROM Sag’s auf Englisch (Say it in English) by Langenscheidt Publishing
Company (1998). This CD, too, is promoted as a holiday trainer, teaching users
the basic words and phrases for tourists. Each word or phrase on the CD is accom-
panied by a very well-chosen icon supporting its meaning. These images are kept
at a very simple level, and due to their iconicity they mostly avoid the lexical am-
biguities and wide range of possible connotations that rest within more complex
Figure 11. Frame from the video-clip showing speakers at the end of uttering yes
Figure 12. Frame from the video-clip showing speakers at the end of uttering no
pictures. Two examples are the two hands with the thumbs pointing up to signify
yes and down to mean no in Figures 11 and 12. In addition to that, each word or
phrase is also accompanied by two short video-clips alternately showing a man or
a woman uttering the word/phrase. And this is where the problems start. Figures
11 and 12 each display one frame from the clips showing the two model speak-
ers uttering the complementary words yes (11) and no (12) respectively. Scrolling
through the wordlists to listen to different words and phrases, it is surprising to
find that all the model productions are supported by the same two video-clips. In
other words, the CD contains only two video sequences, one for all the man’s utter-
ances and one for all the woman’s. This is of course a ridiculous strategy – as Bellik
(1996: 9.4) points out, “in human communication, the use of speech and gestures
is completely coordinated”. Gesture is a mode with its own functions in spoken
interaction – raising an eye-brow may express doubt or curiosity, particular hand
movements lay emphasis on certain aspects of an utterance, etc. If only one gesture
accompanies all utterances (some a single word, others extended phrases or sen-
tences), it will not help learners understand diverse meanings. In these video-clips,
the speakers are shown in a relatively static posture – standing or sitting upright
without hand movement, both gazing out of the screen, establishing eye-contact
with the learner. While the man nods his head, the woman waggles her shoulders
slightly. These are the only gestures in the videos. The learner is therefore con-
fronted with the apparent message that all English words, phrases, and sentences
are accompanied by exactly the same gestures. If you are female you wriggle your
shoulders, if you are male you nod your head, whether the verbal message is yes,
no, telephone, sorry, lady’s toilet, gent’s toilet, left, right, beer, please speak more slowly,
where can I hire a car, is it safe to swim here, cheers or any other of the hundreds of
utterances accompanied by these two gestures.
But the worst effect of this strategy is that with each speech segment the learner
sees a physiologically incorrect articulation. In Figures 11 and 12 identical lip
movements are seen for yes and no. The position of the male speaker’s lips is im-
possible for producing /nəʊ/, which is characterised by lip rounding. It is common
knowledge among linguists that in spoken interaction “the optical information is
complementary to the acoustic information because many of the phones that are
said to be close to each other acoustically are very distant from each other visu-
ally” (Goldschen 1996: 9.5). Language learners can be expected to focus on the lip
movements of teachers for disambiguation and as a guide for their own articu-
lations. Consequently, this kind of digital ‘teacher’ is likely to be less help than
no visual guide at all. Benoit et al. (1996: 9.6) point out that “[. . .] synthetic faces
increase the intelligibility of synthetic speech, but under the condition that facial
gestures and speech sounds are coherent.” This also holds true for visual record-
ings of real human speakers. It is a fact that people may perceive the same syllable
or word differently if accompanied by misleading visual data, which is generally
known as the ‘McGurk effect’. In a groundbreaking experiment McGurk and Mac-
Donald (1976) showed a manipulated video to their test subjects, in which they
heard the syllable /bα/ while seeing a speaker utter the syllable /gα/. As a result of
this multimodal incohesion, their subjects all reported that they were perceiving
/dα/. We must therefore conclude that a video-clip with visual articulations that do not match the accompanying speech is likely to mislead learners rather than to support them.
Conclusion
Over the last ten years there has been a growing demand from linguists and
other experts on semiotics to grant the visual semiotic the same status as the
verbal in multimodal texts. This demand has been amply documented, e.g. in
Kress and van Leeuwen (1990, 1996, 2001), Baldry (2000), O’Halloran (2004) and
in this volume. One can easily foresee that theoretical as well as applied issues of
multimodal semiotics will continue to move into the focus of linguistics, semiotics,
communication studies and other disciplines over the coming years, and it is high time
that this be so.
The main advantage of new information technology and media is the relative
ease of use and the speed with which new multimodal texts can be designed and
distributed. Ease and speed, however, both increase the risk that authors fail to do
their research or to consult existing expertise in the design process for multimodal
products. Unfortunately, this seems to be so in many cases in the language teaching
CD-ROM industry. The majority of such materials are not based on a solid foun-
dation of semiotics, linguistics, language acquisition theory, or pedagogy. In fact,
they revive many approaches common in the 1950s and 60s, such as behaviourist
pattern drilling, which have otherwise long been discredited and dropped from
language teaching practice (for a fuller discussion, see e.g. Kaltenbacher 2003;
Ventola & Kaltenbacher 2003; Chapelle 1997; Holland et al. 1995). The under-
lying principle in many language learning CDs seems to be “We now have the
technology, so let’s have a go!” One evident design criterion is economics – the
software should be quick, easy, and cheap to programme. Another is entertain-
ment – visual media (e.g. sound waves, video-clips) often serve as ‘edutainment’
gimmicks without providing real help to learners. Producers mainly focus on tech-
nical software expertise and ignore the fact that there is more to producing a
complex multimodal text for teaching a language than simply putting words and
images together.
Notes
* I would like to thank Digital Publishing and Tandem Verlag for their explicit permission to
reproduce picture material from the CDs Vokabeltrainer 3 and Holiday Trainer English. Equally,
I would like to thank Langenscheidt KG for their permission to reproduce material under
the provisions of the Fair Use Paragraphs of international copyright law (e.g. Österreichisches
Urheberrechtsgesetz §§40, 46, US copyright law 17 USC §107).
1. This has been changing over recent years, especially in urban areas, where many people have
a snack for lunch and the main meal in the evening. Families with children, however, and most
people in the country still have the main meal of the day at noon and a light supper in the
evening.
2. The CD-ROM, designed for the German speaking market, was programmed in Kiev/Ukraine.
However, a Ukrainian informant assured me that coffee and croissants would never pass for
lunch in the Ukraine either.
References
Baldry, Anthony (2000). Multimodality and Multimediality in the Distant Learning Age.
Campobasso: Palladino Editore.
Bellik, Yacine (1996). “Modality integration: speech and gesture.” In R. A. Cole, J. Mariani,
H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the State of the Art in Human
Language Technology, Ch. 9.4. Portland, OR: Center for Spoken Language Understanding.
http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html
Benoit, Christian, Massaro, Dominic W., & Cohen, Michael M. (1996). “Modality integration:
Facial movement & speech synthesis.” In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen,
& V. Zue (Eds.), Survey of the State of the Art in Human Language Technology, Ch. 9.6.
Portland, OR: Center for Spoken Language Understanding.
http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html
Chapelle, Carol A. (1997). “Call in the year 2000: Still in search of research paradigms?”
Language Learning & Technology, 1 (1), 19–43.
Galloway, Ann (1993). “Communicative language teaching: an introduction and sample
activities.” Washington, DC: ERIC Clearinghouse on Languages and Linguistics.
http://www.ed.gov/databases/ERIC_Digests/ed357642.html
Goldschen, Alan J. (1996). “Modality integration: facial movement & speech recognition.” In
R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the State of
the Art in Human Language Technology, Ch. 9.5. Portland, OR: Center for Spoken Language
Understanding. http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html
Grice, Paul (1989). Studies in the Way of Words. Harvard: Harvard University Press.
Halliday, M. A. K. (1994). An Introduction to Functional Grammar. London: Edward Arnold.
Holland, Melissa V., Kaplan, Jonathan D., & Sams, Michelle R. (1995). Intelligent Language
Tutors: Theory Shaping Technology. Mahwah, NJ: Lawrence Erlbaum Associates.
Johnson, Keith (2001). An Introduction to Foreign Language Learning and Teaching. Harlow et
al.: Longman.
Kaltenbacher, Martin (2003). “Language learning via CD-Rom – old wine in new bottles.” In D.
Newby (Ed.), Mediating Between Theory and Practice in the Context of Different Learning
Cultures and Languages (pp. 171–175). Strasbourg: Council of Europe Publishing.
Kastovsky, Dieter (1982). Wortbildung und Semantik. Düsseldorf: Schwann-Bagel; Bern,
München: Francke.
Kress, Gunther & van Leeuwen, Theo (1990). Reading Images. Victoria: Deakin University Press.
Kress, Gunther & van Leeuwen, Theo (1996). Reading Images. The Grammar of Visual Design.
London and New York: Routledge.
Kress, Gunther & van Leeuwen, Theo (2001). Multimodal Discourse. The Modes and Media of
Contemporary Communication. London: Edward Arnold.
Ladefoged, Peter (1996). Elements of Acoustic Phonetics. Chicago: University of Chicago Press.
Long, Michael. H. (1996). “The role of the linguistic environment in second language
acquisition.” In W. C. Ritchie & T. K. Bhatia (Eds.), Handbook of Second Language
Acquisition (pp. 413–468). San Diego: Academic Press.
Lyons, John (1995). Linguistic Semantics. An Introduction. Cambridge: Cambridge University
Press.
McGurk, Harry & MacDonald, John (1976). “Hearing lips and seeing voices.” Nature, 264, 746–
748.
O’Halloran, Kay L. (2004). Multimodal Discourse Analysis: Systemic Functional Perspectives.
London: Continuum.
Ventola, Eija & Kaltenbacher, Martin (2003). “Lexicogrammar and language teaching materials –
a social semiotic and discourse perspective.” In J. E. Joyce (Ed.), Grammar in the Language
Classroom (pp. 158–201). Singapore: SEAMEO Regional Language Centre.
Verhofstadt, Katrien (2002). A Critical Analysis of Computer-Assisted Pronunciation Materials.
University of Ghent, unpublished dissertation.
Wahlster, Wolfgang (1996). “Text and images.” In R. A. Cole, J. Mariani, H. Uszkoreit, A.
Zaenen, & V. Zue (Eds.), Survey of the State of the Art in Human Language Technology,
Ch. 9.3. Portland, OR: Center for Spoken Language Understanding.
http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html
Desktop Systems Ltd., Kiev, Ukraine (1995). Holiday Language Trainer – Englisch; Grosser CD-
ROM Sprachtrainer Englisch. Königswinter: Tandem Verlag Multimedia Line.
Digital Publishing (1999). English: Kommunikationstrainer 3+. München: Digital Publishing.
Digital Publishing (1999). English: Vokabeltrainer 3. München: Digital Publishing.
Langenscheidt KG and EuroTalk Ltd. (1998). Sag’s auf Englisch. Berlin and München: Langen-
scheidt.
Chapter 7
The multiple modes of Dirty Dancing
Markus Rheindorf
University of Vienna, Austria
Introduction
selves with the interplay between the visual and the auditory – although, of course,
this rudimentary distinction of sensory ‘modes’ needs to be replaced with more
subtle distinctions. Second, critics and fans alike have usually and conveniently la-
belled Dirty Dancing a ‘dance film’, and as such it does indeed depend on the use
of dance as a social practice and a form of signification. Third, it is a film that is
implicated in a host of cultural forces and thus bears witness to them. Both in the
1960’s that the film purports to portray and in the 1980’s that produced it, the
issue of class in American society was perhaps the most important of these. Thus,
despite or maybe even because of its cliché-ridden story, Dirty Dancing is a surpris-
ingly dense text recontextualizing complex social issues such as class and gender
in a remarkable way: its cinematography employs characteristic distributions of
modes in order to produce particular kinds of meaning.
Having said all that, it is perhaps necessary to point out the institutional con-
text of the work this chapter grew out of and draws upon. It is primarily the
representations of class in contemporary American culture – and of the social
practices oriented around them – that are the interest of an on-going project con-
ducted by a fluctuating group of graduate students and staff at the department of
American Studies at the University of Vienna. Although this work for the most
part involves a textual analysis of some kind, it should be pointed out that this is
being done within the framework of a cultural studies approach.1 This, then, was
the immediate point of departure for engaging with Dirty Dancing, but it soon
became evident that an analysis of the distribution of modes in this film (or, in-
deed, any other film) could well become crucial to the task. In order to test certain
assumptions, other instances of dance films, such as Flashdance (1983), Saturday
Night Fever (1977), West Side Story (1961), and Grease (1978), were later also in-
cluded in the project. This broadening of the scope of the study, at least in part, also
served to accommodate a growing interest in the notion of genre, various concep-
tualisations of which have haunted film studies for nearly a century. This problem
remains largely unresolved to date. The idea was – and still is – to substantiate the
vague concept of genre in film studies – the ‘dance film’ is just one example – with
the help of multimodal discourse analysis.
In the course of the analyses, the following questions concerning multimodal-
ity have been paramount: (1) Is the distribution of semiotic modes in Dirty Danc-
ing significant for the realisation of a generic structure – in this case, the ‘dance
film’?, (2) If so, how do various modes combine to realise this structure?, and (3)
Can the ‘dance film’ be topologically related to other genres? In order to provide
an interpretative framework for answering these questions, a number of methods
have been tentatively combined to cover both film theory and semiotics in the
wider sense. These include, among others, Metz’s (1974) structuralist distinction
of codes, Kress and van Leeuwen’s (1996) framework for an analysis of visual el-
ements, van Leeuwen’s (1999) account of sound and music, and Kress and van
Leeuwen’s (2001) more recent – but also more cautious – work on multimodality.
As these authors have repeatedly emphasised, meaning is always social.2 It
would seem that the meaning of dance in film is therefore equally dependent on
both the cultural context and the background knowledge of the spectator. Despite
this call for careful contextualism, it may, however, ultimately be possible to con-
tribute meaningfully to the concept of genre as such by studying the workings and
distributions of modes in genre films. With these issues on a larger scale on the
theoretical horizon, this chapter focuses on a discussion of the findings with re-
gard to Dirty Dancing and refers to issues of genre only where the analysis is likely
to have more general implications.
Constructions of class
this use of dance as a resource for the making of social meaning, as well as on
the stigmatisation of the dancing bodies of the lower classes as being somehow
uncivilised and dirty – both in the literal and a figurative sense.
Constructions of gender
easily have him fired – as one of them very nearly does – should he defy them or
refuse their advances.
Having said all that, it needs to be added that Johnny is only stripped of his
masculinity in terms of gender performance when he finds himself outside the
social context of his kind of people, that is to say the entertainment staff at Keller-
man’s. With them, he is something of a leader, obviously the best dancer, and
commands considerable respect and social esteem. This is visualised or embodied
most effectively in the many dance scenes of the film, which – whether one con-
siders dance a spatial practice in the terms of de Certeau’s (1985, 1988) semiotics
of the everyday or as a mode of expression in the terms of Kress and van Leeuwen’s
(1996, 2001) theory of multimodality – show very clearly that he is given (or rather,
is enabled by his status to take) the space that belongs to him.
Through dance, more than any other mode, Dirty Dancing establishes that
Johnny’s status is contingent on the social context in which he moves, acts, or
dances. By using dance as a mode of signification, the makers of the film were thus
able to translate this metaphor smoothly into spatial terms, visually illustrating
how Johnny and his partner Penny successfully use their dancing bodies to stake
out their own territory, briefly, even among members of the upper class. Equally
potent, however, are the images of their demotion when the true patriarch of
Kellerman’s – with a single flick of the hand – denies them the space they have
filled with their dancing bodies, and from which they have drawn all their power
and charm.
Following an initial reading of the film intended to reveal the film’s constructions
of class as well as gender, a much more specific analysis of the multiple modes of
the film was conducted, approaching speech, gesture, camera movement, image
composition, soundtrack and dancing as semiotic resources.4 The first questions,
then, with regard to multimodality, were: (1) Which modes are employed in the
film? (2) What is the relation between these distinct modes? (3) How do they
converge to produce specific meanings?
As mentioned before, the approach taken here draws, among other methods of
visual analysis, on the one developed by Kress and van Leeuwen (1996). However,
any analysis conducted strictly within the confines of this method faces consider-
able difficulties when confronted with the sequential, time-based texts of moving
pictures. On the other hand, methods developed specifically for the transcription
of film – such as Thibault’s (2000) frame-based multimodal transcription – tend
to be unwieldy and all but impossible to implement with an entire feature film.
(Figure: peak-and-trough graph plotting salience over time, with voice-over (non-diegetic) and music (non-diegetic and diegetic) moving between ‘Figure’ and ‘Ground’.)
What is needed instead are more systematic ways of accounting for the deploy-
ment and unfolding of semiotic resources in the filmic medium. The following
therefore draws on an array of other methods from the field of both semiotics and
film studies – such as peak-and-trough graphs for the purpose of highlighting the
changing deployment of specific resources or combinations of resources – in an
attempt to construct such an alternative approach.
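To give a concrete, if schematic, impression of such a peak-and-trough graph, the sketch below (assuming matplotlib is available) plots hand-assigned salience ratings for two resources over an invented sixty-second stretch; the time points and values are placeholders for illustration, not a coding of any scene from Dirty Dancing.

# Sketch of a peak-and-trough graph: hand-assigned salience ratings (0-1) for
# two semiotic resources plotted over time. All values below are invented
# placeholders, not an actual coding of Dirty Dancing.
import matplotlib.pyplot as plt

time_s = [0, 10, 20, 30, 40, 50, 60]                        # seconds into the scene
music_salience = [0.9, 0.8, 0.5, 0.3, 0.3, 0.4, 0.6]        # non-diegetic music
voice_over_salience = [0.0, 0.2, 0.7, 0.9, 0.8, 0.6, 0.3]   # voice-over narration

plt.plot(time_s, music_salience, label="music (non-diegetic)")
plt.plot(time_s, voice_over_salience, label="voice-over (non-diegetic)")
plt.xlabel("time (s)")
plt.ylabel("salience (coded 0-1)")
plt.title("Peak-and-trough graph of two resources (illustrative values)")
plt.legend()
plt.show()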
It is, however, also instrumental to be able to depict schematically the com-
binations themselves in order to relate them to the notion of media specificity as
well as the concepts of diegetic and non-diegetic space. The first step, as it were, is a
rudimentary and idealised map of the semiotic resources of film from the perspec-
tive of a multimodal discourse analysis, given in Figure 1. This map distinguishes
between (1) the two major sensory modes of representation in film, that is visual
and auditory semiotic resources, and (2) between resources that are either specific
to film and have been conventionalised in and through their use and those which
are non-specific to the medium and whose use has not been conventionalised in
the history of film. In the context of this chapter, ‘medium’ is understood primarily
as the material and technological properties of a given form of communication, as
well as its technical possibilities and restrictions of production, distribution, and
reception (Kress & van Leeuwen 2001: 22) – and thus distinct from the modes that
have become associated with it and which Kress and van Leeuwen (2001: 22) see as
conventionalised uses of a medium.
Certain difficulties, however, arise in relation to Kress and van Leeuwen’s
(2001) wording in their assertion that media can “become” modes through repeti-
tive and conventionalised use. This would seem to imply that the medium through
this ‘abstraction’ ceases to exist, or even that it cannot exist separately. Far from
denying the interdependence of media and modes, this chapter assumes that
modes are indeed strongly associated with the media that have given rise to them.
All the same, modes can be – and are – realised in more than one specific medium.5
In addition, one can assume the existence of something like a semantic of a given
medium, the field of all its possible uses, many of which may never be actualised,
but are nevertheless possible. It is therefore neither practicable nor tenable to
equate mode with medium, nor can they be seen as developmental stages of each
other. What is at stake here is an opportunity to avoid the decade old trap of film
studies’ unhappy dependence on the notion of the ‘essence’ of film as a ‘medium’,
while actually talking about its conventional uses. By separating the bare medium
from its modes, one can speak of the properties of the medium while also saying
something about its modes as historically and culturally determined uses.
Moreover, the distinction between specific and non-specific modes proposed
here is often not a case of ‘either – or’. In fact, they are often hard to separate and
depend on other resources – systems of meaning, but not full-fledged modes –
such as editing and camera movement. The boundaries around modes, in other
words, show considerable permeability.6 Furthermore, both specific and non-
specific modes need not be stable and may change in the history of a medium,
crossing boundaries, and may also vary from culture to culture.
The map given in Figure 1 is by no means meant to be an exhaustive descrip-
tion. Also, one should think of it specifically as a map, as it does not provide an
explanation of anything. Maps do not explain things. At their best, they can pro-
vide an overview, an idealised picture of things that may help one to get one’s
bearings. The grid formed by these initial distinctions can be seen in Figure 2, and
located within its limits are the various modes associated with film.
The category of linguistic resources subsumes all those forms of expression that
make use of language – and as uses of them can be found in all four areas formed
by the two basic oppositions, they need to be placed so as to have a share of all
four areas. It should also be noted that rather than use four isolated circles – for it
is, of course, possible to consider spoken and written language separate modes –
the single circle is intended to convey the relative continuity of the linguistic system as
a resource shared by several modes. Furthermore, with regard to other semiotic
modes, it was to be anticipated that the results of the analysis would sometimes
indicate an overlapping or blurring of boundaries, and this can thus be represented
as a move within a more general category rather than as a discontinuous jump
between specifics. To illustrate briefly the uses of linguistic modes in film, there
is: (1) the possibility to film non-specific written language, such as you would see,
for example, on a warning sign, (2) the specific use of titles or subtitles, (3) the
non-specific mode of spoken language as in dialogue (although certain forms of
dialogue may arguably be seen as specific) and (4) the film-specific use of spoken
language, such as in narrative voice-over.
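One way of making the two cross-cutting distinctions of the map explicit is to record, for each use of a resource, its sensory channel and whether that use is film-specific. The toy lookup table below does this for the four linguistic uses just listed; it simply restates the classification given above and is not part of the chapter’s analysis.

# The two distinctions of the map (sensory channel; film-specific vs.
# non-specific use) recorded as a small lookup table for the four linguistic
# uses listed above. Purely illustrative.
linguistic_uses = {
    "filmed written language (e.g. a warning sign)": {"channel": "visual", "film_specific": False},
    "titles / subtitles": {"channel": "visual", "film_specific": True},
    "spoken dialogue": {"channel": "auditory", "film_specific": False},
    "narrative voice-over": {"channel": "auditory", "film_specific": True},
}

for use, features in linguistic_uses.items():
    specificity = "specific" if features["film_specific"] else "non-specific"
    print(f"{use:47} {features['channel']:9} {specificity}")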
Drawing on van Leeuwen’s (1999) work on auditory modes, the model pro-
posed here also distinguishes between speech, sound and music. The latter category,
too, can include both specific and non-specific uses. In the field of non-specific
uses, for instance, one would find source music, while the most common film-
specific example is the use of surround sound as mood music. Looking at actual
instances of music in Dirty Dancing, however, one is faced with the problem of
whether or not to include lyrics as a distinct mode, and if so, whether to place
them as part of the linguistic or the musical mode. As there are strong arguments
for and against either choice, the domain of lyrics is represented as an overlap be-
tween linguistic and musical modes.7 A specific use of lyrics may occur, as it does
in Dirty Dancing, when the lyrics of a particular song are used not only to support
the action or plot (in which case they function on the same level as mood music),
but to foreshadow or replace it. There is, of course, the distinct possibility that this
may occur only in certain genres such as the dance film or the musical.
Although sound obviously exists as a non-specific mode outside of the filmic
medium, what is usually termed sound effects is available as a mode only to film.
Distinct forms that have become established include, for instance, the whooshing
sound that guns or other weapons in film apparently make when they are drawn
and the sound of space ships passing close by the point of view of the camera
(even though sound cannot travel – and therefore in a sense does not exist – in
outer space).
The focus of the analysis, as far as the visual modes were concerned, was on the
meaningful uses of dance. These, it would appear, can be considered film-specific
when choreographed for a particular camera perspective and/or movement, such
as a top-down perspective, or when choreographies have developed for or through
a particular genre, such as the musical. It is important to note that the simple fact
that dancing is of course edited to look ‘better’ than it would in real life (that is,
more fluent and energetic than it would from one relatively fixed point of view),
can also add to the meaning of the dancing, but would rely on the film-specific
resources of editing and camera movement.8 The dancing of dance films like Dirty
Dancing is, at least in part, specific to film or maybe even to the genre. Consider,
for instance, the dramatic movement in unison (and thus of solidarity and unity)
within a group that is represented visually and in a conventionalised way in dance
films (and many musicals) by a particular kind of dance choreography similar to
the one at the end of Dirty Dancing. In addition, though this is not undertaken
here, the relation of dancing to such modes as movement, gaze, and gesture will in
the future need to be explored as well.
For a brief illustration of how the map proposed here can be used in conjunc-
tion with a multimodal analysis of individual scenes, consider the rather conven-
tional opening scene of Dirty Dancing in its use of music and voice-over narration.
As the car moves from left to right (from the given into the new, in terms of Kress
& van Leeuwen 1996), it first opens up the diegetic space of the filmic narrative.
At the same time the music, which initially began as non-diegetic ‘Figure’ and the
focus of interest, is pushed back to the position of ‘Ground’ (that is, setting or
context), as Baby’s voice-over narration takes over as ‘Figure’. The transitions are
smooth and effective in their division of labour, masking the fractured space of
the shots and creating a continuous space-time through the process known in film
theory as suture. The changing distribution of specific and non-specific modes in
this scene is characteristic of filmic openings, and all modal conventions have been
adhered to.
As in the first sequence of the film, the makers of Dirty Dancing succeed at
blurring the boundaries between specific and non-specific uses of music in several
ways. For one, the distinction between source and surround soundtrack, or else
between diegetic and non-diegetic sound, is constantly crossed. In fact, the film
has been said to be almost as much ‘about’ music as it is about dancing.9 Indeed,
its makers are careful never to employ music simply as non-diegetic mood mu-
sic, so that even when there is surround music, the film’s mise-en-scène always
provides a record player or radio as the diegetic source and thus anchors or con-
tains the soundtrack in the diegetic space of the film’s narrative world. Repeatedly
throughout the film, a particular track begins to play as diegetic sound (marked
as such by its quality and distance) – and, in terms of van Leeuwen’s (1999) ap-
proach, as ‘Ground’ – and then rises in volume and quality to become more than
simply ‘Figure’ or the focus of interest: it in fact replaces all other sound. In other
words, in becoming ‘Figure’, music in Dirty Dancing always also becomes ‘sur-
round’ and thus non-diegetic. As van Leeuwen (1999: 170–181) put it in his work
on the uses of soundtrack, it is one of the characteristics of sound that it is dynamic
and can move the listener towards or away from a certain position. In the case of
these tracks in Dirty Dancing, the music gradually opens up into and envelops
the non-diegetic space of the world of the spectator. In one instance, as Baby and
Johnny dance and sing along to the non-diegetic music track, the lyrics even come
to function as a substitute for dialogue.
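The ‘opening up’ described here can also be logged schematically as a sequence of states for a given music cue and checked for the characteristic trajectory from diegetic ‘Ground’ to non-diegetic ‘Figure’. The sketch below does this for an invented cue; the time points and states are assumptions made for illustration, not a transcription of any particular scene.

# Schematic log of a music cue as a sequence of (diegetic?, Figure/Ground)
# states, plus a check for the 'opening up' pattern discussed above: a track
# that starts as diegetic 'Ground' and ends as non-diegetic 'Figure'.
# The cue below is invented for illustration.
cue = [
    {"t": 0,  "diegetic": True,  "plane": "Ground"},   # record player in the room
    {"t": 8,  "diegetic": True,  "plane": "Figure"},   # volume and quality rise
    {"t": 15, "diegetic": False, "plane": "Figure"},   # becomes 'surround'
]

def opens_up(states) -> bool:
    first, last = states[0], states[-1]
    return (first["diegetic"] and first["plane"] == "Ground"
            and not last["diegetic"] and last["plane"] == "Figure")

print("cue opens up into non-diegetic space:", opens_up(cue))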
As has already been suggested, the boundaries between gesture and dance are
prone to a different kind of blurring, so that in Dirty Dancing a single unbro-
ken movement can frequently be seen as crossing the boundaries between the two
modes. Like the gradual change of music from diegetic to non-diegetic, this has a
similar effect in anchoring the beginning of sometimes choreographed and clearly
stylised dancing in the diegetic space of the film (coinciding with a move from
non-specific to specific modes), while also opening up into the non-diegetic as the
scene unfolds.
In light of the distinct patterns that emerged during the analysis of individual
scenes, it seems possible to extend Sinclair’s (1991) assumption that language con-
tains many prefabricated items such as set phrases and collocations to other modes
besides language. Using multimodal discourse analysis, these ‘filmic phrases’ or
sequences can be shown to share common patterns and distributions of modes,10
among them the absence of linguistic information in certain contexts and in the
case of certain content. Thus, it is possible to identify scenes with voice-over nar-
ration serving to launch the narrative of a film according to their characteristic
configuration of modes as well as according to their content and function.
Moreover, film conventionally and strategically employs changing multimodal
configurations in order to increase the salience of certain scenes. These techniques
are central to a film’s ability to endow the ordinary incidents of everyday life
with symbolic meanings. In fact, it is well-known in film studies that this is com-
monly achieved through the use of soundtrack, lighting, and camera movement –
yet, a multimodal perspective can offer new insights into the phenomenon of
salience as well as providing innovative ways of representing the choices made
in such instances of meaning-making. A particularly common example of such
a characteristic (and possibly generic) distribution of modes is what is usually
known as the ‘training or preparation sequence’. Such a sequence involves the de-
piction of a relatively long period of time in which the characters undergo some
form of development, often involving physical training or other forms of prepa-
ration for a conflict that lies ahead.11 In terms of multimodality, these scenes are
conventionally realised through the use of (1) short cuts as far as the system of
editing is concerned and (2) the use of non-diegetic music that is both ‘Field’
and ‘surround’. The significant element in this case is not only the distribution
of modes as a structure, but its change in the course of the sequence, that is the
temporal unfolding of that structure or distribution. It seems that a combination
of schematic maps and peak-and-trough graphs is the best method for visualising
the structure and its unfolding respectively. Herein lies the crucial opportunity for
multimodal discourse analysis to contribute to studies of genre, not only in film,
but in every medium that employs more than one mode.
Apart from being remarkable for its cultural implications, the music in Dirty
Dancing in some cases is even more remarkable for being typical music of the
1980’s. This phenomenon coincides with the uses of music that involve the kind
of move from diegetic to non-diegetic sound that has already been discussed. The
movie, though ostensibly set in the 1960’s, was of course designed and produced in
the 1980’s – as the design and colour of the title readily testify. This historical and
cultural distance between setting and spectator is bridged by a variety of means
in Dirty Dancing, including Baby’s narrative voice-over. On a different level, the
design of the opening titles serves the same function, and so do the language and
clothes of the eighties that pop up incongruously now and then in the film.
Even more prominently, however, the music of the 1980’s sometimes intrudes
upon the otherwise authentic 1960’s soundtrack of the film.12 Thus, the music
tracks opening up into non-diegetic space were in nearly all cases also recognisably
made in the 1980’s, even if the remainder of the songs featured in the film perfectly
fits the year 1963.13 Nevertheless, even in these cases of incongruence, the film
locates the source of the music as it begins to play in the diegetic space of the
film, and only then allows it to open up into the non-diegetic. It would appear,
then, that it is the move, the process of opening up as such, that is important for
the emotional appeal of such scenes, rather than the simple occupation of non-
diegetic space. In all these cases, however, this technique leads to the paradoxical
situation of 1960’s characters actually playing records that were obviously made in
the eighties.
In general, the findings regarding the use of sound in Dirty Dancing seem to
support Kress and van Leeuwen’s tentative characterisation of different modes as
having different metafunctional configurations. Vision, they posit, creates a sharp
distinction between its subject and its object, whereas sound is more immersive
(Kress & van Leeuwen 2001: 189–190). It may even be possible to extend this claim
by saying that, as a likely result of different metafunctional configurations, dif-
ferent modes are used for certain functions to varying degrees. However, these
configurations and their “affordances”, as Kress and van Leeuwen (2001) call their
metafunctional properties, are also historically and culturally contingent. Thus,
sound in Western culture could be described as less ‘suited’ to the ideational
than to the interpersonal metafunction of communication. In other words, its
ideational meaning potential would be less extensive than that of its interpersonal
metafunction (Kress & van Leeuwen 2001: 189–193).
Returning briefly to the aim of providing a characterisation of the kinds of
meaning expressed through specific modes or modal configurations, one can sum-
marise the findings by saying that it was possible to identify clear patterns con-
cerning the elements encoding class membership, as well as class conflict and its
resolution in Dirty Dancing. Of the modes available to the medium, the makers
of Dirty Dancing have used a large number, at least in the visual field, to represent
class membership – remarkably, however, dialect and register do not
feature as markers of class in Dirty Dancing. In terms of both salience and narrative
development, it is the dirty dancing of the lower classes that is by far the most sig-
nificant mode of signification. In fact, as far as the diegetic world of Dirty Dancing
is concerned, their style of dancing is the principal form of showing class mem-
bership, the principal way, that is, of being of a particular class. Closely related to
this, of course, is the music that accompanies the dirty dancing, which differs from
the music accompanying the dancing of upper class characters in several respects –
perhaps most importantly in that it is exclusively African-American in the case of
the lower class, whereas it never is with upper class dancing.
A number of modes are furthermore combined to charge the body of the lower
classes with sexual meanings. Through the movements of the dirty dancers, their
dress code, as well as the music accompanying their dancing, the act of their danc-
ing is highly sexualised. This is why Baby’s first encounter with the dirty dancing
of the lower classes is often perceived as a “terpsichorean deflowering”,14 why it
prefigures her actual first sexual encounter with Johnny, and why this encounter is
fittingly initiated through dancing as a form of erotic foreplay.
Conclusion
In its final scenes and through complex multimodal configurations, Dirty Dancing
achieves a sense of closure by resolving (read dissolving) through dancing what it
has previously constructed through dancing as a semiotic mode. In order to find
a similar sense of closure alongside the film, let us highlight a few aspects of the
final dancing scene along the lines of a multimodal analysis as suggested in this
chapter. Since the makers of the film have staged all conflicts over boundaries of
class, as well as the boundaries themselves, as a matter of dancing styles through-
out, it becomes possible for the characters to ultimately ‘transcend’ these without
ever having had to address them explicitly.
The final song of the film, a reprise of The Time of My Life by Jennifer Warnes
and Bill Medley, takes the form of a duet. In van Leeuwen’s (1999) terms, its struc-
ture is that of initiator – reactor (male – female), and contains a movement from
segregation to unison. The song is furthermore one of the 1980’s tracks of the
film and also makes the move from diegetic to non-diegetic sound. In short, it has
nearly all the features that van Leeuwen (1999) identifies as creating emotional ap-
peal, as well as the movement characteristic of Dirty Dancing which opens up the
diegetic space to involve the time and space of the spectator.15
The dancing in the final sequence serves a function similar to that of its soundtrack.
As opposed to the various training sequences in the film, the final dance is perfectly
choreographed and highly stylised. It is furthermore edited to create the impres-
sion of a flowing perfection and effortless grace, something that Baby and Johnny
have achieved in the course of the film. In the words of producer and scriptwriter
Eleanor Bergstein, the final scenes were choreographed and edited in this way so
that the audience could “be in those steps”, because “it’s that time of your life
when you do something so incredibly better than you have ever done it ever before
[. . .] what we want to say is you will forever do it” (Dirty Dancing: special DVD
edition 2001).
As regards the dissolution of class boundaries, it is important to bear in mind
that all of this takes place in a space that has opened up to include the non-diegetic.
It is Johnny’s jump from the elevated stage and into the (diegetic) audience space
that initiates the breaching and ultimately the dissolution of class boundaries.
Johnny leaves the separate space of the stage – the space which he has always
been allowed to occupy while dancing – in order to violate the classed space of
the upper-class audience. He then invites the other dance people to join him, and
when together they manage to induce everyone in the upper class audience to join
them in their dirty dancing, they facilitate, strictly in terms of dancing, a perfectly
innocuous mingling of the classes.
Not surprisingly, the actual closure of the film is only made complete with
the formation of the heterosexual couple. Having won his victory over the classed
system of dancing, Johnny is finally reunited with Baby. It is important to note that
when he joins in to The Time of My Life by silently mouthing the male voice part –
telling Baby he has “had the time of my life” and that he “owes it all to her” – this
takes place in the non-diegetic space of surround music that is part of the intended
audience’s world (of the 1980’s) and not of the characters’ world (of the 1960’s).
Indeed, there seems to be a functional relation between film-specific and non-
specific modes and the concepts of diegetic and non-diegetic space in film theory.
As they are not, however, identical, it may prove necessary but also extremely fruit-
ful to explore in detail both the theoretical and the practical relationship between
these concepts in the future.
Notes
1. Affinities to specific forms of cultural studies include, among others, the work of Stuart
Hall (1992) and Lawrence Grossberg (1997). For more detailed information on the project’s
affiliation and the web anthology associated with it, see also http://www.univie.ac.at/Anglistik/
easyrider/welcome.htm
2. See, for instance, Kress and van Leeuwen (2001: 4).
3. This, of course, is not to say that the medium film cannot or could not do otherwise. In fact,
doing work in cultural studies entails an opposition to the kind of essentialism that claims that
there is some inherent ‘essence’ to film which filmmakers should strive to be true to.
4. Although colour is no doubt a significant resource in meaning-making as well, I have omitted
it in my discussion here, as its status as mode (or system) is still controversial.
5. A good example of such transposition is the proliferation of filmic techniques in comics.
Although the two media initially developed separately, comic artists soon began to take over
certain cinematic ‘devices’ and techniques, such as the establishing shot, camera angles, or the
shot-counter-shot technique used in dialogue. For more on the influence of filmic conventions
on comics, see Roger Sabin (1993).
6. Christian Metz (1974), in his version of a map of what he called the “codes” of film, depicted
specific and non-specific codes as concentric circles, with the more specific codes closer to the
centre. While this does have certain advantages, it suggests that non-specific codes always include
specific codes, something that Metz never intended.
7. See van Leeuwen (1999: 67–70, 150–154) for a discussion of voice quality and other criteria
that distinguish language in song from speech.
8. These, it could be argued, are really systems of meaning, like colour, rather than modes. I
would posit, however, that both editing and camera movement are well on their way to being
established as modes, or already have been.
9. See the review of the movie by critic Roger Ebert, originally published by the Chicago Sun-Times and
now available at http://www.suntimes.com/ebert/ebert_reviews/1987/08/248895.html
10. At present, the terms ‘distribution’, ‘configuration’, ‘combinatoire’, or ‘multimodal ensemble’
are still used interchangeably (and somewhat arbitrarily). However, each has certain advantages
as it emphasizes a specific aspect – the process of meaning-making, the structure, or the con-
juncture of multiple modes – which is the reason for my use of more than one if not all of the
terms in circulation.
11. The training or preparation sequence is, of course, an essential ingredient in every kind of
sports film, and John Avildsen’s Rocky (1976) is a classic example.
12. These intrusions include Wipeout by The Surfaris, Hungry Eyes by Eric Carmen, Overload
by Zappacosta, Yes by Merry Clayton, She’s Like the Wind by Patrick Swayze, and the song that
won an Oscar for Dirty Dancing, The Time of my Life by Jennifer Warnes and Bill Medley.
13. Remarkably, none of the tracks used to establish the 1960’s ‘sound’ dates from later than
1963.
14. Review by Edwin Jahiel available at http://www.prairienet.org/ejahiel/dirtdanc.htm
15. See van Leeuwen (1999: 75–79) for an introduction to the terms used in his analysis of music.
See also van Leeuwen (1999: 85–89) for a discussion of the musical form of the duet.
References
Butler, Judith (1990). Gender Trouble: Feminism and the Subversion of Identity. New York:
Routledge.
Chapter 8
Multimodal text analysis and subtitling
Christopher Taylor
University of Trieste, Italy
For some years now this author has collaborated with colleagues at the
universities of Pavia, Padua and Venice in research projects concerning
multimodal text analysis and the creation of multimodal corpora, e.g. Taylor
(2000). In particular the work of Thibault (2000) and Baldry (2000) has inspired
work on the harnessing of multimodal transcriptions to the task of translating
screen texts for interlingual subtitles. Specific translation (and other) strategies
are required in subtitling film scripts and the like, and the multimodal
transcription provides a scientific basis for formulating some of those strategies
which are involved in the particularly controversial process of text condensation.
A whole series of different multimodal text types have been analysed (feature
films, soap operas, advertisements, cartoons, documentaries, etc.) and this
chapter will illustrate the underlying methodology and the results obtained from
a number of those analyses.
Introduction
The purpose of this chapter is to trace the search for useful translation strategies
for subtitlers of video material. The traditional theories relating to translation in
general are of assistance, in that subtitling is, after all, just one example of applied
translation, though the specificities attached to this genre suggest that some par-
ticular strategies or approaches might be usefully sought. The chapter will first
run briefly through the works and ideas of some of the major figures in the history
of translation studies before investigating more recent contributions with more
specific multimodal interests.
Translation studies
From the dawn of translation theory in the first century B.C., when Cicero posited
the dichotomy of ‘sense’ versus ‘word’ (ad sensum/ad verbum) (cf. George Steiner
1975), the debate basically revolved around the following question – the spirit or
the letter, the matter or the manner? The more authoritative a text was consid-
ered, e.g. the Bible, the more ‘literal’ the translation should be. This has led in the
past to self-doubt. Cicero himself declared “If I render word for word, the results
will sound uncouth, and if compelled by necessity I alter anything in the order of
wording, I shall seem to have departed from the function of a translator” (in Nida
1964: 13).
Centuries later, Dryden, who translated Horace, expressed his scepticism about
the feasibility of translating literally and well at the same time with:
“Tis much like dancing on ropes with fetter’d legs!” (in Nida 1964: 18). Even in
the twentieth century there has been no let-up in the sense/word debate. Radically
different positions have been established ranging from Walter Benjamin’s (1968)
“The sentence is a wall blocking out the language of the original, while word for
word translation is the arcade” to Koller (1972) who propounded the idea of the
equivalent-effect principle of translation, challenging the authority of the word or
structure.
Accepting that taking a polarised position was counter-productive, theorists
such as Eugene Nida (1964, 1996) made important distinctions between source-
language oriented and target-language oriented translations and produced the
concepts of formal and dynamic equivalence. His dynamic translation worked at
the surface level in making a target text understandable to new audiences. Juliane
House (1977) maintained the distinction between ‘faithful’ and ‘free’ by using the
terms ‘overt’ and ‘covert’ in the sense that a more faithful version of a source text
would be (overtly) recognisable as a translation, while a freer version would seem
more like an original text in the target language (thus covert), and pointing out
that it is the translator’s responsibility to choose where to pitch the translation
along the overt-covert axis. Peter Newmark (1982) gave us semantic and com-
municative translation, continuing to emphasise that different text types required
different approaches, again harking back to the ‘authority’ of the text. It would
seem that an authoritative text is one whose status or importance is such that “the
manner is as important as the matter”, requiring the translator “to empathise with
the writer” (Newmark 1991: 109), while texts of lesser authority can be dealt with
more freely.
Towards the end of the 19th century, when it was decided in some quarters
that language was the product of culture, an enormous leap in logic concluded
that translation was actually impossible. We have already seen how scholars such
as Walter Benjamin held the view that translation should at least be as literal as
possible. So, who knows what Benjamin would have made of the following transla-
tion (Example 1) from the film ‘Kramer versus Kramer’ (Benton 1979)? A divorced
father, played by Dustin Hoffman, is explaining to his young son how lucky he is
these days to have so many things that his father’s generation did not possess:
(1) We didn’t have the Mets, but we had the Brooklyn Dodgers, we had Polo
Grounds and Ebbets Field.
The ‘translation’ of this line in the dubbed version of the film is as in (2):
(2) Non avevamo i motorini, ma avevamo i monopattini, le pistole a schizzo e
non a laser.
We didn’t have mopeds, but we had push scooters, and water pistols, not laser guns.
The justification for this departure from the original is that a literal translation
would mean little or nothing to the target audience, and the meaning, a constant
and universal parental lament, is maintained. The translation in this case is dy-
namic, covert and communicative, and the question now arises as to what extent
this approach is valid within the film medium in general, if that medium is to be
conceived of as in any way creative in the same light as literary works.
Creative texts
In terms of the traditional debate, this translation is inadequate because the sense
is translated but the words are not. But this does not mean that a literal transla-
tion of the words is necessarily called for (la disgrazia di una periferia distesa? – ‘the misfortune of a sprawling periphery’?),
rather that the translator has not recognised ‘the distance’. What is required, Parks
would argue, is some form of ‘equivalent effect’ declaring a distance in Italian. As
far as screen translation is concerned, the concepts of both ‘declared distance’ and
‘equivalent effect’ are of interest. But the various semiotic modalities at play en-
able these effects to be implemented by more than just words. The manipulative
use of the camera, and the inventive use of light, colour, sound, etc. can help a film
declare its distance. Think of the power relations established by the imaginative
filming techniques in Citizen Kane (Welles 1941) already in the 1940s, Stanley
Kubrick’s innovatory use of the ‘Steadicam’ in The Shining (Kubrick 1980),
and all the special effects now adopted by the industry. These manipulations are
then understood, ignored, accepted, praised, criticised, or rejected by the viewer.
The visual is a place where meanings are created and contested (Mirzoeff 1999: 6),
and the viewer is a decisive element in the process.
In any film of Women in Love, the scene of the entry into London would be
in the hands of the director. The ‘translation’ could then be criticised in similar
terms to the written translation – does it just show a train running through the
dismal outskirts or does it pan across a desolate urban landscape, in black and
white, accompanied by dreary music, etc.? The viewer would then construe these
‘meanings’ in the way the director intended, or in some other way. However, these
considerations are at a stage prior to the intervention of the subtitler, who is gener-
ally required to work on a video text that has already been made, and whose initial
interpretative framework is already in place.
The question of ‘equivalent effect’ deserves more attention, though it too ap-
plies to all translation genres. Many modern translation scholars have made it clear
they are dealing with ‘text’ (and ‘text in context’) rather than with ‘words’ or au-
thors, and simultaneously there has been a move from the abstract to the specific:
What are texts doing on the surface? How can their ‘effect’ be transferred? For
example, Newmark’s (1982: 39) communicative translations are those that create
entire texts that “produce an effect as close as possible to that obtained in the reader
of the original”, a concept that pre-dated Newmark (Cauer 1896; Koller 1972) and
has been oft repeated in different words by many other writers on the subject. The
achieving of equivalent effect, be it to amuse, to frighten, to persuade, or what-
ever, is not always easy but examples abound to show what can be done. Sergio
Jacquier’s ingenious solution (see Galassi 1994: 62) to a Groucho Marx line will
serve to show how all the components of a multimodal text can be used to create
‘equivalent effect’: a scene in Horse Feathers (McLeod 1932) has Groucho desperate
to conclude a deal and seal a document. “Get me a seal!” he yells at Harpo: Harpo
obliges by bringing him an exemplar of the large sea creature known as a seal. The
audience laugh as they appreciate the play on words and at the same time see the
ridiculous result of the misunderstanding. They are also reminded of Harpo’s stu-
pidity, underlined as usual by his dumb expression and clown-like garb. Jacquier’s
Groucho makes no mention of a seal/sigillo but, consonant with his movements
and gestures, he says “Focalizziamo” (Let’s focus on this). Harpo, of course, still
arrives with the seal, the word for which in Italian is ‘foca’.
. Subtitling
Subtitling indeed constitutes a relatively new text type, one specifically addressed by hardly any of the above theories, schools, etc. It is an example of intersemiotic translation. The term is Jakobson's (1966) and refers to the fact that meaning created in one modality (e.g. the visual) may be translated into another modality (e.g. written language), or even vice versa in this digitally manipulative age; but it also simply means that the source and target texts consist of a number of
interacting semiotic modalities. As a sub-group of screen translation texts, it has
two essential purposes, entertainment and didactics, while containing a number
of sub-sub-groups within these two broad areas. These two areas involve largely
different audiences and approaches, as explained later, and these audiences them-
selves can be usefully categorised in terms of such factors as age, geographical
provenance (e.g. subtitling or dubbing cultures, which in Europe conform to a
basically North-South divide, though the principal reasons are financial rather
than geographical – the countries with larger populations can afford to dub), and
most importantly the level of knowledge of the source language. Furthermore, the
subject material of screen texts varies as much as in non-film genres and has a ma-
jor influence on strategies adopted – documentaries require different treatment to
cartoon films for children.
But the very specificity of subtitling as a sub-genre of translation is important, and it is neatly summed up by Gottlieb in the following way:
In the context of translation, and expressed in general and rather technical terms,
subtitling consists in the rendering in a different language of verbal messages in
filmic media in the shape of one or more lines of written text presented on the
screen in sync with the original verbal message. (Gottlieb 2000: 14)
grammatical organisation as the target text can bear, and the various stages of
text shortening which come under the headings ‘Condensation’, ‘Decimation’ and
‘Deletion’. When a text is ‘condensed’, the semantic content remains intact while
the lexico-grammatical structures for expressing that content are reduced, for example: Would you mind stepping this way? rendered as Entri. The formal exhortation to enter (Entri) uses the 'Lei' form of the imperative in Italian, and equates with the polite and elaborate form of the English invitation. If, on the other hand, a text is
‘decimated’, then a part of the semantic content is sacrificed, as in (4):
(4) You will have heard on the news that all the passengers were killed.
Lo sai che tutti i passeggeri sono morti.
‘You know all the passengers are dead.’
‘Deletion’ implies a total loss of information. It would seem that these last three
strategies would be those most used by subtitlers. The other semiotic modalities
providing meaning on the screen or through the soundtrack (music, sounds, etc.)
should, it would be supposed, allow not only for some creative condensation, but
also a reasonable amount of selective decimation, and even deletion where the
meaning is carried by visual or other verbal vehicles.
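Purely by way of illustration, and assuming nothing beyond the definitions just given, Gottlieb's three reducing strategies can be restated as a simple labelling of source line and subtitle pairs; the small Python sketch below uses the examples discussed in this chapter (the labels and helper names are, of course, not part of Gottlieb's own presentation).

    # A minimal sketch, assuming only the definitions of Gottlieb's reducing
    # strategies given above; the example pairs are those discussed in the text.
    from enum import Enum

    class Reduction(Enum):
        CONDENSATION = "semantic content intact, lexico-grammar reduced"
        DECIMATION = "part of the semantic content sacrificed"
        DELETION = "total loss of (verbal) information"

    examples = [
        ("Would you mind stepping this way?", "Entri.", Reduction.CONDENSATION),
        ("You will have heard on the news that all the passengers were killed.",
         "Lo sai che tutti i passeggeri sono morti.", Reduction.DECIMATION),
        ("Ella, with her ragged ears, ...", "(Ella) il nuovo capogruppo ...",
         Reduction.CONDENSATION),  # 'ragged ears' deleted, the clause condensed
    ]

    for source, subtitle, strategy in examples:
        print(f"{strategy.name}: {source!r} -> {subtitle!r}")
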
Indeed, these processes can be seen at work in a BBC documentary concerning
a herd of female elephants suffering in the heat of an African drought with a crip-
pled calf to look after (Echo of the Elephants, directed by M. Colbeck 1972). The
soundtrack to one scene, out in the savannah, begins with: But feeding and drink-
ing meant abandoning Echo and the new calf. Decimating one semantic element
as being superfluous (drinking is subsumed in the generic mangiare), the Italian
subtitle becomes: Ma per mangiare devono abbandonare Eco e il piccolo. The next
line is much longer: Ella, with her ragged ears, who had taken over the leadership of
the group, turned back to Echo and Enid. Here, the subtitler has first opted for the
deletion of the description of the ‘ragged ears’ on the grounds that they have al-
ready been mentioned before during the documentary and the ears are there to be
seen, in close-up, on the screen. Then, with an exquisite piece of condensation, the
finite subordinate clause is condensed to a noun group: (Ella) il nuovo capogruppo . . . While these examples tend to indicate that subtitlers consciously or subcon-
sciously search for reducing strategies, it is also true that the few statistics available
(e.g. those of Gottlieb (1994, 1997) and those resulting from investigations carried
out in Trieste) show that a large percentage of titles are examples of pure transfer.
This point will be returned to after a discussion of some of the other theoretical
contributions in the field.
Sylfest Lomheim's (1999) rather simpler set of parameters in Figure 1 also concentrates on the reduction/expansion axis, but interposes the interesting and usefully labelled item of 'translation'. Lomheim defines this category as
“a linguistic transfer that meets the normal requirements for equivalent transla-
tion”. This is a useful distinction in that it calls into play previous definitions of
‘condensation’, ‘expansion’, etc. For example, the Italian carta geografica is not an
expansion but simply the standard equivalent of the English map, and map is not
a reduction of carta geografica. However, la rappresentazione cartografica dell’Italia
nella Galleria Belvedere, translated as the map in the Belvedere is a case of reduction.
Lomheim also introduces the idea of ‘generalisation’, ‘specification’ and ‘neu-
tralisation’, which also usually involve some degree of reduction or expansion, but
the justification is for more than just spatial reasons. Generalising is necessary
when culture-bound terms are used, as for instance in ‘All she thinks about is going
to Eastbourne’ – ‘Pensa solo al mare’. Specification is required where there are, for
example, lexical gaps in the source language, such as in the case of the potential
ambiguity in Italian between a grand-daughter and the daughter of a brother or
sister: Vado da mia nipote – I’m going to my niece’s. Neutralisation comes into play
when the translator wishes to or needs to attenuate (censorship? good taste? pub-
lisher’s instructions?) or soften a source language utterance, e.g. ‘What’s the bloody
matter! – ‘Cosa c’è?’.
out by O’Toole (1994), Kress and van Leeuwen (1996) and others. The multimodal
transcription as devised by Thibault (2000) and Baldry (2000) provides an ideal
tool for analysing the multimodal text in its entirety and drawing the relevant
conclusions in terms of how meaning can be conveyed successfully by the vari-
ous semiotic modalities in operation, and thus how dispensable or indispensable
the verbal element is in different sets of circumstances. The multimodal transcrip-
tion consists in the dividing of a video text into individual frames of, say, one
second’s duration. The frames are then arranged in sequence, vertically or hori-
zontally, flanked by boxes arranged in columns or rows that systematically describe
the various semiotic modalities of which the film is composed. In the pioneering
article by Thibault (2000), a television advertising text for an Australian bank was
transcribed and described in this way, providing a meticulous description in suc-
cessive columns showing: (1) the time in seconds, (2) the visual frames themselves,
(3) a description of the components of the visual image portrayed in terms of the
camera position CP, the horizontal HP or vertical VP perspective, the visual focus
VF, the virtual distance of the shot D, the visually salient items VS, the colours
used CR, the coding orientation from natural to surreal CO, etc., (4) the kinesic
action of the participants, (5) the complete soundtrack (dialogue, music, sounds,
etc.) and (6) a metafunctional interpretation of how the film creates meaning as it
unfolds over time, and a breakdown of the action into phases and sub-phases following Gregory's (2002) model, within which various semiotic modalities are seen to
function together as a set before giving way to a new set of modalities, following
an identifiable ‘transition’ between the two phases. Figure 2 shows just the first
two seconds of such a multimodal transcription of the afore-mentioned nature
documentary Echo of the Elephants.
Observation of this kind of interpretation of how a multimodal text ‘makes
meaning’ led the current author to consider using it as a basis for the judicious
selection of the verbal element that needs to be maintained in utterances when
subtitling a film. If the meaning, or a part of the meaning, of a section of multi-
modal film text is carried by semiotic modalities other than the verbal (visual clues,
gesture, facial expression, dramatic music, surreal lighting effects, etc.), then a
paring down of the verbal component can be justified, facilitating the various pro-
cesses of condensation, decimation and deletion outlined above. Thus Thibault’s
multimodal transcription has been adapted for this purpose, on occasion by also
fusing the visual image and kinesic action columns into one; the amount of de-
tail provided by Thibault exceeds what is necessary for subtitling purposes. Then
another column is added containing the subtitled version of the original sound-
track. Figure 3 shows the same two frames from the ‘elephants’ text, transcribed
in this way.
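To make the structure of the adapted transcription concrete, the sketch below models a single one-second row as a simple data record in Python. It is an illustrative sketch only: the field names are hypothetical, the visual description is invented for the purpose, and only the soundtrack and subtitle are taken from the Echo of the Elephants example discussed above.

    # A minimal sketch of one row of the adapted multimodal transcription,
    # assuming one row per second of film; field names and the sample visual
    # description are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class TranscriptionRow:
        time_s: int        # (1) time in seconds
        visual_image: str  # (2)-(3) visual frame and image description (CP, HP/VP, D,
                           #         VS, CR, CO, VF), here fused with the kinesic action column
        soundtrack: str    # the complete soundtrack (dialogue, music, sounds)
        subtitle: str      # the added column: the subtitled version of the soundtrack

    row = TranscriptionRow(
        time_s=1,
        visual_image="CP: stationary; HP: frontal; D: long shot; VS: elephant herd on the savannah",
        soundtrack="But feeding and drinking meant abandoning Echo and the new calf.",
        subtitle="Ma per mangiare devono abbandonare Eco e il piccolo.",
    )
    print(row.subtitle)
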
The example that will, however, be discussed here in a little more detail is
taken from an episode of the BBC television comedy series ‘Blackadder’. Humour
is notoriously difficult to translate, British humour especially so, it would seem, as it involves a large number of interweaving factors – word play, register shifts, timing, characterisation – to name but a few. The difficulty, however, should not be exaggerated. What people of various cultures have in common is far greater
than what separates them (see Nida 1996). There are often more radical extremes
within cultures than between them. In this case, the same audience type in differ-
ent cultures would appreciate this text, whereas there may be social or generational
gaps within cultures which would determine very different reactions to it.
However, leaving aside the Buster Keaton era and the naughty slapstick of the
likes of Benny Hill, much humour does rely heavily on the verbal, the delivery
of the funny line. Consequently, in subtitling this episode of 'Blackadder', great care had to be taken to place the least possible burden on the spectator, who expects to be entertained, even to the point of paroxysms of laughter, while at the same time not missing any of the essential, and very effective, wording.
The episode in question, from the Elizabethan era series, features a highly im-
plausible plot involving Lord Blackadder and his scatter-brained assistants, the
dreadful Baldrick and Lord Percy, who have inadvertently executed the wrong man
while temporarily in charge of the royal prison. The wife of the unfortunate vic-
tim, Lady Farrow, insists on seeing her husband, who she believes is still awaiting
trial in the prison. Blackadder’s scheme to extricate himself from this situation is
to impersonate Lord Farrow at the meeting with his wife by wearing a bag over his
head. Lord Percy has the job of explaining this to the unsuspecting lady.
In the first two 1-second frames (see Figure 4), Blackadder can be seen among
his henchmen preparing to put the bag on his head. Lord Percy is nervously getting
ready to meet Lady Farrow who is waiting outside. Blending an interpersonal in-
terpretation onto this ideational description, the viewer sees the participants from
the same conspiratorial level. Indeed, the viewer has a better perspective than any
of the characters in that he/she has an unhindered view of all of them as they
are arrayed on the screen. Blackadder is recognised as the boss – he has central
position and the others occupy the margin, to use Kress and van Leeuwen's terminology.

[Figure 4 (extract), frame 3, Shot 2: silence. CP: stationary; HP: frontal; VP: median; D: MCS; VC: outside the jail, Percy by the door, Lady Farrow before him; VS: Lady Farrow's proud posture; CO: artificial set; VF: distance close, Percy's gaze towards the floor (off-screen), Lady Farrow staring at him. Kinesic action: Percy walking past the door, Lady Farrow standing proudly before him; Tempo: M]
The to-ing and fro-ing between competing solutions reflects the thought processes of the translator as various options are considered, a process well illustrated by Krings' (1987)
‘thinking aloud protocols’, where translators were invited to record their thoughts
on tape while they performed their task. But the problem still remains of what ac-
tual words to use. A literal translation into Italian would provide something like:
Avanti! Avanti! But if the interpersonal elements are to be integrated, namely the
contempt and the irritation, then a version incorporating a fairly colloquial verb
plus the second person singular intimate pronoun (expressing the superior to infe-
rior relationship), might be preferred: Sbrigati! Sbrigati!. The time taken to discuss
this first minimum utterance is an indication of how much thought is required
to translate a film for subtitles, but also shows how the multimodal transcription
enables the translator to focus his/her efforts.
The next shot, consisting of nineteen frames (beginning at no. 3 in Figure 4),
contains a little more dialogue. Percy exits from the prison cell to confront the
waiting Lady Farrow. The slight differences in light and shadow from the darker
dungeon to the relatively lit corridor are the cameraman’s contribution. His pho-
tography is literally ‘writing light’. This ability to ‘write light’ means that photogra-
phy can create meaning not only by recording reality but by manipulating reality
in the many ways adopted in filming. The ‘film noir’ was a good example of cre-
ating a jaundiced view of humanity. Intervention on the individual pixels in this
digitalised age offers infinite possibilities of meaning-making. “The cameraman’s
point of view becomes ours” (Mirzoeff 1999: 103).
To return to the text, the ideational element consists in the fraught conver-
sation in which Percy attempts to prepare the lady for the ‘changed’ appearance
of her husband. The action is slow and the speech hesitant (hence the long shot)
as Percy tries to gain time to allow Blackadder to hatch the plot. The viewer is
again on the same level as the participants, enjoying the incongruous situation.
The almost too perfect costumes maintain the pretence, and the constantly kept
positions, suitably polarised on the left and the right, enable the viewer to eas-
ily interpret the respective stances: the proud yet distraught demeanour of Lady
Farrow, who maintains her gaze steadfastly on the hapless Percy, and the embar-
rassment and totally inappropriate manner of the latter, who first attempts to avoid
any eye contact.
The position of the two characters, repeated identically in phase 4, is a cohe-
sive element, as is Percy’s constant holding or fiddling with the door handle, the
conduit between all the various scenes. The lady remains in the same position as
a Given element, in that she represents sanity and the way things should be, while
Percy’s extravagant and fanciful inventions are the New element, and what provide
the humour. Percy begins the dialogue slowly and hesitatingly, but is wrong-footed
by Lady Farrow, who shows an innocent belief that all will be well. Percy begins the
difficult cover-up operation, see Example 5:
(5) Percy: Erm, sorry about the delay madam, er, as you know, you are about to
meet your husband, whom you’ll recognise on account of the fact that he has
a bag over his head.
Lady Farrow: Why, I would know my darling anywhere.
Percy: Well, yes, there are a couple of things.
Percy’s first speech would definitely seem to require some trimming, though
whether this should consist of condensation, deletion, or decimation, in Gottlieb’s
terms, is the debatable point. Many of the elements are essential, both from an
ideational and interpersonal, even textual point of view. The initial hesitant ‘erm’
is a fundamental feature of spoken language, so often ignored by scriptwriters even
though they should be striving to create texts that are ‘written to be spoken as if
not written’. Here it is used to express Percy’s embarrassment as he begins his im-
probable explanation. On the one hand, it is not particularly language specific and
can be heard quite clearly. But, on the other hand, it does function as theme for
the first clause, albeit minor theme, and the speech is slow enough for the first
element ‘sorry about the delay, madam’ to appear alone on screen. However, it is
known that the brain processes two lines appearing together quicker than two sin-
gle lines appearing separately because of the onset of perception phenomenon. So,
in the final analysis, after the usual to-ing and fro-ing, it might be better to split the
whole speech into two double liners and make the ‘erm’ the first sacrifice. The first
line of the first two-liner will of course be ‘sorry about the delay, madam’, which
should be kept as it has a framing role and also because it contains one element
that is in keeping with the period in question, namely the appellative madam. Such
examples of period talk appear from time to time during every episode, precisely
to provide the background against which the incongruous modern parlance can
take full effect. The choice of translation for 'madam' lay between a modern 'signora', 'madama', and the perhaps more appropriate 'milady', an expression known and used in Italy in certain, often amusing, circumstances. Hence 'Scusate il ritardo, milady'.
There is then more hesitation and the discourse marker ‘as you know’, which
can be replaced in the subtitles with the punctuation device of three dots, which
also serve the purpose of underlining the deliberate pause. The information ‘you
are about to meet your husband’ is of course central to the plot, but can survive
a little decimation to ‘ora vedrete vostro marito’, maintaining the second person
plural form of address denoting the respect required and the old fashioned style.
The second two-liner begins with a fairly straight ‘transfer’ – ‘lo riconoscerete’/
‘whom you’ll recognise’, though transforming the subordinate to a new main clause.
What follows is a key line in the humour ‘on account of the fact that he has a bag
over his head’. This line is delivered with a nicely measured pause, made more ef-
fective by the procrastinating conjunctive ‘on account of the fact that’. This time
the procrastination can again be replaced by punctuation and by the insertion of
a simple preposition, leading to the ludicrously incongruous ‘dal sacco in testa’. So
Percy’s speech would now be subtitled as: ‘Scusate il ritardo, milady. . . ora vedrete
vostro marito' and 'Lo riconoscerete. . . dal sacco in testa'. Lady Farrow, seemingly
unamazed by this news, declares ‘Why, I would know my darling anywhere’. The
initial ‘Why’ is a melodramatic theatrical addition designed to augment the hu-
mour, but the stoic pose of the lady is probably sufficient to render this aspect for
the foreign audience, and the depth of feeling present in her use of ‘my darling’
comes out in the rather breathless delivery. The proud retort is thus condensed to
‘Lo riconoscerei comunque’.
The humour continues with Percy’s next line ‘Well, yes, there are a couple of
other things’. This line has to accompany Lady Farrow’s, with the obligatory hy-
phens introducing the two parts of the dialogue. Although the two lines are not
strictly connected as an adjacency pair, they should appear simultaneously as they
form part of the same shot with the two characters on screen at the same time. In
this case, the discourse markers and minor themes ‘Well’ and ‘yes’ are instrumen-
tal in Percy’s strategy. The rest is not made explicit but the audience is party to
the plot and enjoys the collusion. Thus the subtitles could also do a little conver-
sation management: 'Sì, però', to be followed by the more peremptory but equally
(non) revealing ‘C’è dell’altro’, a classic case of condensation, in that no mean-
ing has been deleted or decimated. Some indicative punctuation is also required
here to contrast Percy’s embarrassment with Lady Farrow’s firm intent, although
it is evident in the speech patterns. So these two lines would be: ‘– Lo riconoscerei
comunque.’ and ‘– Sì . . . però . . . C’è dell’altro.’.
The next shot is again short, consisting of only four frames in which we see
a close-up of Lady Farrow, as she bows her head and lowers her eyes in order to
deliver the carefully loaded line: ‘I am prepared for the fact that he may have lost
some weight’. This is the first of six phases in which Lady Farrow is alone on screen
and marks the beginning of an escalation in her emotive behaviour. From the quiet
determination of this scene, she becomes ever more frustrated with the ridiculous
Percy, and finally loses control in a hysterical crying fit. The scene therefore has
an important thematic role and the rapport established with the audience at this
stage is important. In spite of the fact that Lady Farrow avoids eye contact with the
audience, they are still led to empathise with her predicament. The pathetic tone
of voice and the uncalled for reasonableness of her attitude set the scene for the
absurdity of Percy’s response. As the subtitle will stand alone in the four seconds
in which we see only Lady Farrow, there is no drastic need to curtail the line, and
thus the interpersonal and textual elements can be incorporated. Beginning with
‘I am prepared for’, this particular locution demonstrates the composure that the
lady manages to maintain in the circumstances and requires an equivalent effect,
perhaps ‘Sarei orientata a credere che’. The second part of the utterance needs a
conditional to match the modal may and the slight paraphrase from weight to
unit of weight, thus: ‘Sarei orientata a credere . . . che potrebbe aver perso qualche
chilo’. The fact that the translation of this line, like so many others, could lead
to heated debate on the appropriateness of one approach or another is indicative
of how volatile the translation process is. Target texts are living things that can
change form and function, style and meaning. We need all the help we can get to
‘pin it down’.
The following three frames show Lord Percy in close-up as he delivers, with
exquisite timing, one of the funniest lines in the whole series, given also the
fact that Lord Farrow has been beheaded: ‘Yes, and some height’. Percy delivers
this line while again fiddling with the door handle to his left (see above). He
still needs to keep the lady waiting and provides this markedly tactless rejoin-
der. The phonological and semantic equivalences at play in the ‘weight/height’
duo can be partially rendered in Italian by converting to units of measurement,
‘chilo/centimetro’, which are also more acceptable stylistically. However, the key to
creating the same effect on the target audience lies very much in the timing. The
multimodal transcription can be very helpful here in ensuring that the ‘spotting’,
that is the inserting of the title at the right moment, gains maximum effect. Again,
suspension dots are useful. The whole sequence would now be subtitled as in (6):
(6) Sbrigati! Sbrigati!
Lo riconoscerete. . .
. . . dal sacco in testa.
– Lo riconoscerei comunque.
– Sì . . . però . . . C’è dell’altro.
Sì . . . e qualche centimetro.
The episode continues in this vein and the translator is called upon to use every
device available to create the equivalent effect. As there is some nuance, verbal or
. Conclusion
In conclusion, it is clear that all the vast store of knowledge that is now known as
‘translation studies’ may be brought to bear on every aspect of translation, includ-
ing subtitling. The semantic/communicative distinction is important in assessing
approaches to different types of multimodal text, as is the tendency towards either
foreignisation or domestication. The concept of equivalence in its various guises,
from Nida (1964), Wilss (1982), House (1977/1981), etc. through the variations on
the theme of ‘equivalent effect’, to recent writers on screen translation (Gambier
1997, 1998; Galassi 1994), of course remains fundamental, but a major acknowl-
edgement is due to all those who have worked in the field of multimodal texts
and have devised various approaches to the analysis of those texts. In particular,
the ‘multimodal transcription’ has provided screen translators with an extremely
useful tool, one that has been exploited at length in Trieste in our work on sub-
titling different types of video text. In future, the advances demonstrated e.g. by
Baldry (2000) and Thibault (2000) in the field of multimodal corpora and mul-
timodal concordancing will be brought to bear on translation work. It is hoped that we can soon move into the field of subtitling film for didactic purposes in the promotion of minority languages, as well as hone our skills in the 'entertainment/information' sector, in the constant search for failsafe strategies to make these texts instantly accessible to foreign audiences with the minimum of effort expended.
References
Baldry, Anthony (2000). “English in a visual society: comparative and historical dimensions in
multimodality and multimediality.” In A. Baldry (Ed.), Multimodality and Multimediality
in the distance learning age (pp. 41–90). Campobasso: Palladino Editore.
Bassnett, Susan (1980). Translation Studies. London: Routledge.
Benjamin, Walter (1968). “The task of the translator.” Delos, 2, 76–99.
Cauer, Paul (1896). Die Kunst des Übersetzens. Berlin: Weidmann.
Galassi, Gianni G. (1994). “La norma traviata.” In R. Baccolini, R. M. Bollettieri Bosinelli, &
L. Gavioli (Eds.), Il doppiaggio: trasposizioni linguistiche e culturali (pp. 61–70). Bologna:
Clueb.
Gambier, Yves (1997). Language Transfer and Audiovisual Communication: A Bibliography. Man-
chester: St. Jerome.
Gambier, Yves (1998). Translating for the Media. Manchester: St. Jerome.
Chapter 9
Multimodality in the translation of humour in comics
Klaus Kaindl
University of Vienna, Austria
. Introduction
The comic strip is a hybrid genre, whose analysis cannot be clearly assigned to any
one academic discipline. This is due, among other things, to the techniques in-
volved in designing comics, ranging from various linguistic elements such as text
in speech bubbles, narrative texts, onomatopoeia and captions, to typographic el-
ements such as typeface and type size, pictographic elements such as speedlines,
ideograms such as stars, flowers etc., and pictorial representations of persons, ob-
jects and situations.1 All these techniques play a part in conveying the meaning.
The use of various modes in one genre makes it difficult for ‘monomodal’ dis-
ciplines to deal with such texts. Instead of analysing texts as a whole, they are
segmented and become objects of study for various disciplines. In the case of comics, linguistics deals with the language, graphic arts with the pictures, and communication studies with the forms of publication and distribution.
When speaking of multimodal texts in this chapter, I mean texts in which var-
ious semiotic vehicles, e.g. language, image, sound, music etc., are used to convey
meaning and to create a message: “Modes are semiotic resources which allow the
While studies of the general area of humour are mainly concerned with jokes, the
study of comics tends to foreground other comic techniques. Jokes can generally
be described as autonomous textual entities with a similar sequential organisa-
tion (i.e. introduction, text, reaction), which are structured so as to lead up to a
punch line and are not necessarily dependent on contextual factors (cf. Attardo
1994: 296–311). Comics, in contrast, are narrative texts (whose plot need not be
comical) which contain humorous elements but whose comic effect results from
the overall narrative context. Rather than jokes, comics often work with techniques
such as verbal and non-verbal puns, parody, allusion and intertextual reference.
Their comic effect comes not from the punch line, which Kotthoff defines as a
“clash between two perspectives”, but from the “dual perspectivisation” of different
contexts (Kotthoff 1996: 250).
In the existing literature on the translation of comics, only a limited number of
the possible ways of creating comic effects have been discussed as being relevant to
translation. Most of the studies deal with word-play in comics and focus primarily
Table 1. Translation techniques A to H

Technique   Source text:          Target text:          Humour technique       Language-picture relation
            semiotic type         semiotic type         (source vs. target)    (source vs. target)
A           Monomodal humour      Monomodal humour      similar / changed      similar / changed
B           Monomodal humour      Multimodal humour     similar / changed      changed
C           Monomodal humour      No humour             –                      –
D           No humour             Monomodal humour      changed                changed
E           Multimodal humour     Multimodal humour     similar / changed      similar / changed
F           Multimodal humour     Monomodal humour      similar / changed      changed
G           Multimodal humour     No humour             –                      –
H           No humour             Multimodal humour     changed                changed
on the verbal element, ignoring, for the most part, the multimodal implications.
The same applies to the pictorial depiction of situations, events, and persons. Al-
though the polysemiotic constitution of comics is mentioned time and again, it is
extremely rare to find it taken seriously in concrete analyses. If the comic element
of this genre is to be analysed in all its variety and complexity, however, the narrow
orientation of existing approaches must undoubtedly be overcome, and a broader
framework be developed.
Based on the five picture-related categories, which the translator of humour in
comics can be confronted with, and the respective degree of semiotic complexity,
we can posit various types of translation solutions. These techniques (A to H) are
presented in Table 1.
Monomodal humour refers not only to the linguistic dimension but also in-
cludes comic effects based exclusively on pictorial elements. The translation can
have an impact on the semiotic composition of the sign-play, on the type of hu-
mour employed, and on the relation between text and picture underlying the
comic effect. The choice of translation strategy is dependent on semiotic func-
tional factors as well as pragmatic factors such as the cost of making changes
to pictures, editorial policies and the intended readership. We may assume that
particular categories of sign-play are associated with certain types of translation
technique, but this hypothesis could be tested only in a more comprehensive study.
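Read purely as a classification, Table 1 amounts to a mapping from the semiotic status of the humour in the source and target texts to a technique label. The short sketch below simply restates the table in that form; it is an illustrative rendering, not an instrument proposed in the chapter.

    # A minimal sketch restating Table 1 as a lookup: the pair (semiotic type of
    # the humour in the source text, semiotic type in the target text) determines
    # the translation technique label A-H.
    TECHNIQUES = {
        ("monomodal humour", "monomodal humour"): "A",
        ("monomodal humour", "multimodal humour"): "B",
        ("monomodal humour", "no humour"): "C",        # humour lost in translation
        ("no humour", "monomodal humour"): "D",        # humour added in translation
        ("multimodal humour", "multimodal humour"): "E",
        ("multimodal humour", "monomodal humour"): "F",
        ("multimodal humour", "no humour"): "G",
        ("no humour", "multimodal humour"): "H",
    }

    # e.g. the 'hairy idioms' example from Asterix: multimodal humour in both versions
    print(TECHNIQUES[("multimodal humour", "multimodal humour")])   # -> E
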
The present chapter therefore focuses on the systematic description and exem-
plification of the various categories of multimodal sign-play and techniques of
visual humour.
Non-verbal elements in multimodal texts not only perform the function of illus-
trating the linguistic part of the text, but also play an integral role in the constitu-
tion of the meaning, whether through interaction with the linguistic elements or
as an independent semiotic system.
If the visual elements in comics can also be put to use in creating word-play
(or sign-play) effects, there is a need for a closer analysis of the possible relations
between language and pictures. The semiotic complexity of a play on signs de-
pends on the degree of integration of the signs involved and on the roles played
by language and picture in the creation of the comic effect. An analysis reveals five
distinct types of play on words and/or signs:
1. plays on words consisting basically of linguistic signs
2. plays on words reinforced by non-verbal signs
3. plays on signs that depend on a multimodal combination
4. non-verbal plays on signs reinforced by verbal signs
5. plays on signs consisting only of non-verbal elements
As general studies on visual puns have shown, non-verbal plays on signs function
basically on the same principles as verbal puns; i.e. they are based on dichotomous
relationships, such as that between form and content, and employ the polysemy
of visual signs (cf. Lessard 1991). In order to recreate multimodal plays on signs
in translation, it is essential to recognise the relations between the verbal and non-
verbal signs and to analyse the function of the non-verbal elements. What factors
need to be considered in translating a play on signs will depend to a great ex-
tent on what roles the respective pictorial and linguistic elements play in creating
the effect.
If the picture plays only a supporting role and the play on words is still understand-
able without relating it to the picture, then this relationship can be of subordinate
importance in the translation as well. In that case, translation of humour in comics
is not very different from translation of humour in non-pictorial texts (for the
description of translation techniques for verbal puns, see Delabastita 1993: 33–39).
Plays on words reinforced by non-verbal signs are those in which the comic strip picture has a bearing on the linguistic play on words, reinforcing its effect without being integral to it. An example of this type of word-play is a sequence from
Astérix le Gaulois given in Figures 1 and 2.
The Romans have taken the Druid and Astérix captive and are determined
to discover the secret of the magic potion. But instead of brewing the desired
strength-giving mixture, the Druid concocts a hair restorer. Furious, the Roman
commander demands an antidote. Astérix reacts to this with a number of idiomatic expressions which include lexemes carrying the semantic feature 'hair'. He
comments on the Druid’s unwillingness to brew an antidote with the French id-
ioms “il a un poil dans la main” (literally: “he has a hair in his hand”, figuratively:
“he does not feel like working”) and “il a un cheveu sur la langue aussi” (literally:
“he also has a hair on his tongue”, figuratively: “he also lisps”). These remarks are
accompanied by gestures which illustrate the literal meaning of the idioms: in the
first panel Astérix is pointing to his hand and in the second, to his tongue. In this
case, the comic effect derives from the situational context; the visual elements, i.e.
the gestures, reinforce this effect, but are not integral to the constitution of the
play on words. The German translation also employs a ‘hairy’ lexeme. This oc-
curs in the left panel and prepares the reader, to a certain extent, for the play on
words that then appears in the right panel, when Astérix says that the Druid has
“Haare auf den Zähnen” (literally: “hair on his teeth”, an idiom meaning: “he’s a
tough customer”) and adds “und manchmal auf der Zunge” (“and sometimes on
his tongue”). The latter is not a German idiom. In this way, one type of humour
(two meaningful puns with a sarcastic, ridiculing effect) is replaced by another
type of humour (one pun and one absurdity) but with a similar humorous ef-
fect of ridiculing. The German "und manchmal auf der Zunge" does not have the double pun meaning and is therefore absurd, flouting Grice's Cooperative Principle, which Astérix is supposed to respect vis-à-vis the Romans; they are thus not taken seriously but ridiculed (see Raskin 1985 and Attardo 1994
on the role of the four conversational maxims of Grice’s Cooperative Principle
in humour). The German version therefore preserves the polysemiotic nature of
the picture-supported word-play (translation technique E) while at the same time
changing the type of humour.
In plays on signs that depend on a semiotic combination for their effect, interplay
between verbal and non-verbal elements is integral to the comic effect. This type
of interplay is illustrated by Figures 3 and 4 from Tintin–Le trésor de Rackham
le Rouge.
Here the comic effect is dependent on the relation between the inscription in
the picture and graphic signs. Captain Haddock has bumped his head against an
advertisement pillar, an injury which is indicated graphically by stars. This cre-
ates a relation of identity with the literal – as opposed to the actual – meaning of
the linguistic text on the pillar. The text is an advertisement for a daily newspaper
and reads: “des informations qui frappent”, an idiom which actually means “news
which attracts attention”, but the literal sense of which is “news which strikes”. In
the German translation, the content of the linguistic text on the pillar has been
changed; it proclaims that advertising in the newspaper Morgenpost can be ex-
pected to bring “durchschlagenden Erfolg”. This idiom means “total success”, but
the literal sense of “durchschlagen” is “to knock through” (from “schlagen” mean-
ing “to strike”) so that, although the inscription is different, the play on signs
and the underlying relation of identity between the graphic and linguistic ele-
ments have been retained. Once again, translation technique E was used, without
changing the relationship between text and picture.
In principle, however, the underlying relationships between verbal and non-
verbal elements may be changed in translation as well. An example of this can be
found in the story Le Bouclier d'Arverne, when Astérix and Obélix are forced to
hide from the Romans overnight in a heap of coal, given in Figures 5 and 6.
In the French original, the play on signs is created by the contrast between
the literal meaning of “passer une nuit blanche” (literally: “spend a white night”,
figuratively: “spend a sleepless night”) and the picture showing the black heap of
coal. In the German translation, the play on signs is realised through a relationship
of identity between the literal meaning of the idiom “Ich sehe da in jeder Richtung
The opposite case, i.e. no play on signs in the original but multimodal word-
play in the translation (translation technique H), can be found in Figures 9 and 10
from the second translation of Astérix et les Goths, dating from 1971.
The French original merely mentions that “an ambush was laid” during a fam-
ily dinner (“il tend un guet-apens”), whereas the translation creates an identity
relation between the picture and the literal meaning of the German idiom “in die
Pfanne hauen” (literally: “to throw somebody into the pan”, figuratively: “to inflict
a crushing defeat”).
. Comic pictures
the ground. In the American version, the action of holding the ball makes perfect
sense, since the oval ball used in American football needs to be set up in this way
for the kick. In the German version with a round ball used for soccer, whose rules
do not permit players to touch the ball, the act of holding the ball appears unmo-
tivated or to serve no other purpose than pulling it away, thus rendering the gag
much less effective. Though it is true that the monomodal (pictorial) comic effect
is maintained in German (translation technique A), we find a change in the type
of humour. Whereas the original uses slapstick-like humour, with many repeated
instances of a violation of expectable behaviour, Lucy’s behaviour in the German
version is unusual from the start; the act of holding the ball has no counterpart in
reality, which renders the comic effect somewhat absurd.
The creation of comic effects by visual means also relies heavily on intertextual
relations, which, according to Genette (1982), include quotations and plagia-
risms as well as textual allusions and, in particular, parodies. Following Kotthoff
(1996: 264), a parody is defined here as the imitation and functional transfor-
mation of source-text structures. The understanding of parody is culture-bound.
Apart from verbal parodies, comics often use pictorial parodies, mainly of pop-
ular images or famous paintings. For instance, in Astérix en Corse, in Figure 14,
Goscinny and Uderzo make a travesty of historical paintings of the Battle of
Austerlitz in 1805, such as those by Vernet, Gérard, and de Roehn, and of the
so-called images d’Epinal fashioned after these models.
Central to these is the rising sun, which gave Napoleon’s Grande Armée a clear
view of the enemy and thus helped it win the battle. The pictorial parody, which
transplants the events to Corsica, is complemented by a verbal allusion as well
as a phonological pun, both of which give linguistic expression to the link with
Austerlitz: The leader of the Corsican rebels is called “Osterlix”, which has practi-
cally the same pronunciation in French as Austerlitz. In addition, the guide in the
bottom right panel speaks of the sommeil d’Osterlix, which is a playful transforma-
Figure 15. Multimodal humour based on pictorial allusion and pun in French
Figure 17. Loss of humour due to pictorial allusion to particular political figure
Figure 18. Loss of humour due to culture-specific, different allusions to Little Red Riding
Hood
Chirac would rather have been President. In the German translation, the lack
of this background knowledge blocks the comic effect intended by this allusion
(translation technique C).
Even where the allusion is to phenomena which are shared across cultures,
like certain fairy-tales for example, translation problems can arise where there are
differences in the visual images used. This is the case in Figure 18 from the story
Coke en stock by Hergé.
When Tintin and Haddock return home, Tintin’s dog Milou greets them with
a pitiful howl. He has been dressed up in a strange costume – a red hat and a
pink cloak – by Abdallah, the son of an emir, who is friends with Tintin and
Haddock and is currently staying at their house. This costume, which Milou evi-
dently considers a disgrace, alludes to Charles Perrault’s Le Petit Chaperon Rouge
(Little Red Riding Hood), whose main character is traditionally depicted in such
garb in French books of fairy-tales. The comic effect arises from the contrast be-
tween the naïve character of Little Red Riding Hood and the clever detective dog
Milou, to whom Hergé attributes typically human behaviour (as in the present
case, where the dog is embarrassed to tears by the ridiculous costume). In the
German-speaking area, Little Red Riding Hood is usually depicted differently, i.e.
with a red cap and a coat, which makes it difficult for the reader to appreciate
the humorous contrast between the fairy-tale and the comic character (translation
technique C). This example confirms Eco’s (1987) claim that pictures of objects,
in this case clothes, will be interpreted or ‘read’ only within a functional context,
which may in turn be culture-specific.
The visual repertoire used in comics includes not only iconic signs, such as images
of characters and objects, but also graphic and typographic elements which help
create a comic effect. It is particularly in comics that typography has developed
into an autonomous medium capable of carrying various types of information. In
Astérix, for instance, specific typefaces are used to represent different nationalities
(e.g. hieroglyphics for the Egyptians, Gothic type for the Goths, etc.). Moreover,
the size and width of the letters, the directionality of the lettering (straight, curved,
or undulating), letter spacing and vertical orientation as well as the contours of
the letters are used to represent the volume, pitch and duration of utterances and
noises. Colouring, finally, can be used to symbolise additional information, e.g.
about the emotional state of the characters.
These typographic elements are used in comics both in the utterances of the
characters and in the representation of noises, through onomatopoeia. In transla-
tion, the typographic dimension and its communicative potential have been, and
in some cases continue to be, neglected. And yet, the visualisation of acoustic as-
pects of speech or onomatopoeia may well be used for comic effect, as shown, for
example, in a sequence from Astérix le Gaulois and its translation into Croatian in
Figures 19 and 20.
Asterix teases a Roman by pulling hard on his beard as he sings a nursery
song with a line about a beard (“Je te tiens par la barbichette” – “I hold you
by your beard”). While the jerky movement of the Roman’s head is represented
graphically, the resulting trembling of his voice is represented typographically by
duplicated lettering. The latter was ignored in the Croatian translation, where standard typography suggests a normal voice quality (translation technique C).

Figure 20. Deletion of multimodal humour due to use of standard typography in Croatian
. Summary
Comics must be viewed as complex multimodal texts. All of the verbal and non-
verbal elements of this genre can, in principle, be used also to create humorous
and comic effects. The repertoire of means of expression in comics includes ver-
bal, pictorial, and typographic signs. In research on humour in general, and in
translation studies in particular, the focus has traditionally been on language, with
Note
1. Given the large variety of types of comics, it is difficult to come up with a comprehensive def-
inition. What all comics have in common is that they are narrative forms in which the story is
told in a series of at least two separate pictures. For a general discussion of the problem of defin-
ing comics, see Groensteen (1999). For a translation-relevant anatomy of comics, see Kaindl
(1999).
References
Source texts
Binet, Christian (1989). Monsieur le Ministre. Paris: Editions Audie. German Translation 1990.
Herr Minister. Nürnberg: Alpha-comics.
Brétecher, Claire (1979). Les frustrés 4. Barcelona: Printer Industria Gráfica. German Translation
(1989). Die Frustrierten 4. Reinbek bei Hamburg: Rowohlt.
Goscinny, René & Uderzo, Albert (1961). Une aventure d’Astérix le Gaulois. Paris: Dargaud.
German Translation (1968). Astérix der Gallier. Stuttgart: Ehapa. Croatian Translation
1992. Asterix Gal. Zagreb: Izvori.
Goscinny, René & Uderzo, Albert (1963). Astérix et les Goths. Paris: Dargaud. German
Translation I (1965). “Siggi und die Ostgoten.” Lupo modern, 27–37. German Translation II
(1971). Asterix und die Goten. Stuttgart: Ehapa.
Goscinny, René & Uderzo, Albert (1967). Astérix Légionnaire. Paris: Dargaud. German
Translation (1971). Asterix als Legionär. Stuttgart: Ehapa.
Goscinny, René & Uderzo, Albert (1968). Le bouclier d’Arverne. Paris: Dargaud. German
Translation (1972). Asterix und der Avernerschild. Stuttgart: Ehapa.
Goscinny, René & Uderzo, Albert (1970). Astérix en Corse. Paris: Dargaud. German Translation
(1975). Asterix auf Korsika. Stuttgart: Ehapa.
Hergé (1947). Tintin – Le Trésor de Rackham le Rouge. Paris and Tournai: Castermann. German
Translation (1971). Tim und Struppi – Der Schatz Rakham des Roten. Hamburg: Carlsen.
Hergé (1958). Tintin – Coke en Stock. Paris and Tournai: Castermann. German Translation
(1970). Tim und Struppi – Kohle an Bord. Hamburg: Carlsen.
Herriman, George (1992). Krazy Kat, Vol 2. Wien: Comicforum.
Schulz, Charles M. (1981). “Die Peanuts.” Stern, 47.
Secondary texts
Attardo, Salvatore (1994). Linguistic Theories of Humor. Berlin and New York: Mouton de
Gruyter.
Delabastita, Dirk (1993). There’s a Double Tongue. An Investigation into the Translation of
Shakespeare’s Word-play, with Special Reference to ‘Hamlet’. Amsterdam and Atlanta:
Rodopi.
Eco, Umberto (1972). Einführung in die Semiotik. München: Fink.
Eco, Umberto (1987). Semiotik. Entwurf einer Theorie der Zeichen. München: Fink.
Genette, Gérard (1982). Palimpsestes: La littérature au second degré. Paris: Seuil.
Groensteen, Thierry (1999). Système de la bande dessinée. Paris: Presses Universitaires de France.
Kaindl, Klaus (1999). “Thump, Whizz, Poom: A framework for the study of comics under
translation.” Target, 11 (2), 263–288.
Kloepfer, Rolf (1976). “Komplementarität von Sprache und Bild (Am Beispiel von Comic,
Karikatur und Reklame).” Sprache im technischen Zeitalter, 57, 42–56.
Kress, Gunther & van Leeuwen, Theo (2001). Multimodal Discourse. The Modes and Media of
Contemporary Communication. London: Edward Arnold.
Kotthoff, Helga (1996). Spass verstehen. Zur Pragmatik von konversationellem Humor. Wien:
Unpublished professorial dissertation.
Lessard, Denys (1991). “Calembours et dessins d’humour.” Semiotica, 85 (1/2), 73–89.
Rabadán, Rosa (1991). Equivalencia y traducción. Leon: Universidad de Leon.
Raskin, Victor (1985). Semantic Mechanisms of Humour. Dordrecht: D. Reidel.
Wilss, Wolfram (1989). Anspielungen. Zur Manifestation von Kreativität und Routine in der
Sprachverwendung. Tübingen: Niemeyer.
Chapter 10
Multimodality in operation
Language and picture in a museum*
Andrea Hofinger and Eija Ventola
Museums have traditionally been places where objects are exhibited so that
visitors can come and ‘view’ them. Today, however, museums are spaces where
more complex semiotic processes take place. Exhibited objects, visualisations and
verbal texts are involved in a dynamic process whereby the visitor interprets
his/her experience of the museum. Understanding such a process therefore requires multimodal description. This chapter applies multimodal analysis to examples
from the Mozart-Wohnhaus (the Mozart Residence) Museum in Salzburg,
Austria. The main focus is on the interaction between pictures and spoken
language. It will be seen that a closer understanding of how multimodal museum
texts work can provide opportunities for museum design, to encourage an
integrated interpretative experience for visitors.
. Introduction
encouraged to engage in their own interpretations, relates to how, and how well,
modes and media work together for different audiences.
This chapter discusses multimodal texts from one contemporary museum, the "Mozart-Wohnhaus" (the Mozart Residence) in Salzburg, Austria. It begins with brief introductions to systemic functional linguistic theory, recent developments in multimodal theory, and previous applications of these to semiotic
analysis of museums. The focus then turns to the Mozart Wohnhaus Museum, to
considering the global arrangement of rooms, texts, pictures, and objects, before
looking more closely at a particular multimodal integration of a Mozart family
portrait and an audio-taped recording which explains it and its context.
. Theoretical background
The systemic functional linguistic (SFL) model, based on the work of Halliday
(e.g. 1978, 1994), is particularly useful in the study of museum semiosis, due to
its orientation towards context. SFL approaches texts as (inter)actions in cultural
and situational context, by modelling them as cultural types of unfolding social
action – genre – which have features appropriate to their current social situation –
register. In other words, texts vary according to who is communicating (tenor),
about what (field) and by what means (mode). While these three register variables
may be realised by both language and other semiotic modes, the study of language
has so far been the most developed. SFL theory provides a model of how the sit-
uational variables field, tenor, and mode probabilistically activate choices in the
linguistic systems, organised by the related metafunctions of language (ideational,
interpersonal and textual), and it also provides rich and detailed tools for analysing
these choices in text (for a detailed introduction to SFL, see e.g. Halliday 1994;
Eggins 1994; Thompson 1996; Martin et al. 1997).
While there are several interpretative frameworks developed for the study of mul-
timodality which provide useful theory and tools, the approaches developed by
O’Toole (1994, 1999) and Kress and van Leeuwen (1990, 1996, 2001) will be fol-
lowed in this chapter, since they share with SFL an orientation towards context
and its link-up to semiotic metafunctions, while expanding on the SFL model to
account for other kinds of semiosis than language. O’Toole (1994, 1999) applies
Halliday’s model of the rank scale and the three metafunctions as general semi-
otic mechanisms for analysing paintings, sculpture, and architecture. He argues
that the artist, like the writer of a text, constructs meanings by choosing options
out of the systems of ‘Representation’ (ideational metafunction), ‘Modality’ (in-
terpersonal metafunction), and ‘Composition’ (textual metafunction). The prime
concern of this ‘functional semiotic model’ is to model a visual code which the
viewer of an image shares with the artist, and to reveal the functions of particular
sign choices in visual art (O’Toole 1994: 215).
Kress and van Leeuwen (1996) also build on Halliday’s work to create a de-
scriptive framework for multimodal text. They dismiss Barthes’ (1977: 37) idea
that images either extend or elaborate the verbal element of a text, or vice versa,
stressing rather that
the visual component of a text is an independently organised and structured mes-
sage – connected with the verbal text, but in no way dependent on it: and similarly
the other way around. (Kress & van Leeuwen 1996: 17)
Kress and van Leeuwen, as well as O’Toole, also regard an image not only in aes-
thetic terms but look at the dynamic interplay between a painting (or other text)
and the social situations in which it is created and used. (Kress & van Leeuwen
1996: 17; O’Toole 1994: 216). Multimodality, as defined by Kress and van Leeuwen
(2001: 20), is the use of several semiotic modes and their combination within a
socio-cultural domain which results in a semiotic product or event. This definition
works admirably for the design and interpretation of museums.
Useful work on linguistic and multimodal analysis in museum contexts has been
done by Ravelli (1996, 1997, 1998) and Purser (2000). Ravelli (1997: 6) explores
the basics of meaning-making in museums. She stresses that meaning is not to
be equated with content, but, rather, adopts Halliday’s metafunctions to look at
all museum semiosis as built from ideational (‘content’) meanings in conjunction
with interpersonal and textual ones. She sees choices from systems such as rela-
tionship, contrast, difference, and similarity as essential to meaning-making; that is,
an object is meaningful only in relation to other (sets of) objects (Ravelli 1997: 3).
At some level, everything in a museum carries meaning, through both semiotic
systems, e.g. of language or images, and intersemiotic systems, e.g. relating an im-
age to a verbal text (Ravelli 1997: 5). As Ravelli (1997: 10) puts it, the fundamental
point is “that all meaning is constructed, and that all meaning has a specific socio-
cultural location, whether one is aware of it or not.” By selecting objects and
making choices about displaying them, e.g. according to particular orderings or
groupings, a museum attaches cultural significance to them; but meaning is also
negotiable, and may depend on the previous experiences of different interactants
(Ravelli 1997: 6).
While we are sometimes unaware of the meanings made around us, Ravelli
(1997: 9) also points out that we tend to first notice those meanings with which we
do not agree. This is the case in the museum example discussed by Purser (2000),
based on an informant’s response to the representation of indigenous people, in
an exhibition in Berlin about the South Pacific. While, in her view, the museum
portrays these people in a biased way, a viewer who is not 'in the know' about the
subject matter is likely to accept the museum’s meanings as “natural, obvious and
true” (Purser 2000: 185). Purser is therefore interested in the role of museums in
the public ideological and educational arena; for example, in the representation of
one culture by another, and what kind of ‘voice’ the represented people (of the ex-
hibition) have to speak for themselves in their representation by a museum. Rather
than taking an intersemiotic approach, like Ravelli (1997), Purser investigates the
role of language as a support for visuals in the museum. Objects in a museum need
some interpretation, and verbal museum texts represent “facts the museum claims
to know”, and are also means “to frame events and people in particular ways”
(Purser 2000: 175). Like Ravelli, Purser (2000: 175) argues that choice and selec-
tion are essential for the meaning-making of a verbal museum text, which she calls
a “record of selection” from a potential of meanings. These choices imply contex-
tual motivations, consequences and ideology: “Even as we speak ‘about’ someone
else, we are representing ourselves, our values, our desires, the ways we want to
tell our story” (Purser 2000: 185). This approach to museum semiosis adopted by
Ravelli and Purser, and the social semiotic model of language and multimodality
outlined in the preceding section inform our discussion of the Mozart-Wohnhaus
Museum, in Section 3.
Salzburg is known all over the world as the birthplace of Wolfgang Amadeus
Mozart, and it is no surprise that the city both praises its famous son and cashes
in on his fame. As well as the Mozart-Wohnhaus Museum which is examined
in this chapter, Salzburg has a second Mozart museum, the Mozart-Geburtshaus
Museum, where the composer was born. In the Mozart-Wohnhaus Museum, em-
phasis is given to Mozart’s life and work during the years 1773-1780. Rather than
presenting the common image of the musician as a child prodigy and musical ge-
nius, the museum presents Mozart as a son, brother, lover, and friend, through
a set of relationships with friends and family. As well as through displays of ob-
jects and pictures of Mozart, his family and friends, this representation is realised
through recorded spoken texts. Every visitor is offered a personal audio-player
for these recordings, which are activated in particular zones of the museum by
infra-red sensors. That is, when the visitor comes to a particular display area, a
spoken text relevant to that section automatically plays. (The audio-recording also
includes examples of Mozart’s music. While we recognise the importance of the
music to the multimodal experience of the museum, an analysis of these musical
extracts will not be attempted here.)
The museum consists of seven rooms. Figure 1 displays the order of these rooms.
The first large room in the left-hand corner is called “The Dancing Master’s Hall”, which was used during Mozart’s time for parties, sitting together and playing games. The second room introduces some of Wolfgang’s employers and support-
ers, important women in his life, and his connection to the church. The third
room, known as “The Library of Leopold Mozart”, illustrates the personality of
Wolfgang’s father. The fourth room, in the right-hand corner, is dedicated to Maria
Anna Mozart, or ‘Nannerl’, Wolfgang’s sister. The fifth room aims to convey an
atmosphere of Mozart’s family life and contains furniture which dates back to
Mozart’s time. In the sixth room the visitors can trace the musician’s numerous
travels on a wall-map. Finally, the last room offers a slide show which gives an
outline of Mozart’s career and once again refers to his relationships to family and
friends. The following analysis will focus on The Dancing Master’s Hall, the centre
of the Mozart family’s social life.
Figure 1. Floor plan of the Mozart-Wohnhaus Museum: (1) Dancing Master’s Hall, (2)
Employers and supporters, (3) Library of Leopold Mozart, (4) Room of Nannerl Mozart,
(5) Furniture, (6) Travels, (7) Slide show
The Dancing Master’s Hall is a big, festive room, which was used as a kind of a
family living room and as a room for receiving guests. On one side of the room
there are four glass display cases containing silhouettes, notes written by Mozart,
and small paintings. On the other side of this room, there are several of the musical
instruments which Wolfgang Amadeus Mozart played (see Figure 2 and Figure 3).
Just above the instruments one sees a very large Mozart family portrait (the focus of the visual analyses in Section 5.2).
If the visitor has chosen to walk through the museum by listening to the audio-
recording, s/he will find that there is a spoken text for six different infra-red zones
in The Dancing Master’s Hall, corresponding to the space in front of individual
display cases, musical instruments or pictures, each telling a story or giving details
about part of Mozart’s life. Zone 1 introduces the visitor to the room and deals
with a particular game played in Mozart’s time. Zone 2 covers musical instruments
and the large family portrait which hangs above them. Zones 3 and 4 describe
antique instruments on display. Zone 5 concentrates on Mozart’s compositions,
and Zone 6 is about friends of the Mozart family. Of these, Zone 2 (the instruments
and the family portrait) is the focus of our linguistic and multimodal analyses in
the next section, and we look particularly at how the audio-text and the picture
work together.
Figure 2. Zones in The Dancing Master’s Hall: (1) Introduction to the room, (2) The fam-
ily portrait, (3) Instruments, (4) Instruments, (5) Compositions and dedications, (6) The
‘Haffner-family’
. Analyses
involving Mozart are ‘going to Munich’, ‘sitting for the painter’, and ‘playing the pi-
ano with his sister’. Nannerl’s actions involve ‘playing the piano’ and ‘sitting for the
portrait’. The father, Leopold Mozart, acts in ‘commissioning the portrait’ and in
‘holding the violin’, while the mother’s only action is ‘dying’. Finally, the painter’s
material actions are ‘painting the portrait’ and ’painting Leopold’.
Speaking of relational processes in museum texts, Purser (2000: 181) says that
they “are used to locate, define, classify and evaluate” objects, people, and activi-
ties. In Text 1, the six relational processes define the portrait and what it represents:
‘the portrait shows the family’, ‘the portrait of the mother is hanging on the wall’.
Another relational process offers the visitor an interpretation of the visual reality
of the picture: ‘the family portrait shows the best likeness of Mozart’. The sources
of information for the textual and visual reality are represented through one of the
two mental processes: the museum curators ‘know from a series of letters’ facts
about the production of the portrait. Nannerl is also made an information source,
confirming the likeness between Mozart and his representation in the portrait, as
the Sayer in a verbal process: “Nannerl wrote [. . .] that it showed the best likeness
of her brother”. An additional relational process identifies the painter of the picture
as unknown to us.
Towards the end of the audiotape text, an interesting multimodal shift is
achieved. The existential clause, “There are two key instruments in front of the
picture”, draws the visitor’s focus away from the picture to the instruments. It then
identifies one of the objects in a relational clause, “The one on the right is an Ital-
ian harpsichord”, but leaves the other unidentified. The multimodal experience
then shifts abruptly from mental ‘viewing’ to ‘hearing’: “you will now hear a
piece from the Balletto by Bernardo Storace”. The visitor’s experience moves from
listening to the audio-text to viewing the portrait and then the harpsichord and
back to listening to a piece of music.
In summary, the audio-text which accompanies the portrait mainly lists the
major figures represented, leaving the visitor to engage with the other meanings of
the painting on her/his own. The visitor’s focus is then moved out of the picture in
a rather abrupt way by introducing the two instruments in front of the picture. The
last processes link the physical harpsichord in the museum with another modal
realisation, music.
ums; e.g. suggesting what the visitors should do mentally: “Look at. . .” (cf. Enkvist
1991). However, there are other kinds of interaction in the Mozart-Wohnhaus mu-
seum between the museum as an addresser and the visitor as audience. Looking
at the audio text, we find that some uncertainty is expressed in the information
given to the visitor. The recorded speaker uses the mood adjunct probably twice,
to express a median value of probability. In this way, the museum withdraws some
of its responsibility for the truth value of the message: “The work was probably
commissioned by Leopold Mozart”; “Leopold himself was probably not painted
until his return from Munich”.
Figure 4. Mozart family portrait in The Dancing Master’s Hall (Festschrift, Picture 4, p.
59) (© Internationale Stiftung Mozarteum (ISM))
The family life of the Mozarts plays an important role in the Mozart-Wohnhaus
museum, so the family portrait in The Dancing Master’s Hall may be regarded
as an important artefact. Indeed, the picture of Wolfgang with his family (Figure
4) embodies the central concept of this particular museum – Mozart as a family
member and as a friend – while also showing him at the piano in line with his more
familiar musical identity. The following analysis of this painting will be according
to the three semiotic metafunctions, following O’Toole (1994, 1999) and Kress and
van Leeuwen (1990, 1996, 2001).
sition in the middle with the piano. Another eye-catcher is Mozart’s right hand,
which is placed over Nannerl’s left hand. This particular feature links the two sib-
lings, but also shows Wolfgang’s relative dominance in the painting. His hands also
function as vectors to the piano which is situated right in the middle of the paint-
ing, emphasising the importance of its link with him; a link which the audio text
strengthens by drawing the visitor’s attention to the musical instruments in front
of the picture.
All four represented participants engage the viewer and create affinity by gaz-
ing directly outward and smiling (Kress & van Leeuwen 1996: 129). According to
Kress and van Leeuwen (1996: 143–145), the horizontal axis of a painting reflects
involvement. The museum visitor looks at the family portrait from a frontal view-
point which gives her/him the feeling of being involved in a part of Mozart’s world.
The vertical angle, in contrast, usually expresses power (Kress & van Leeuwen
1996: 146–147). As the family is seen neither from above nor below, there is
no power difference construed between the visitor and the participants of the
painting; they are on the same level. The size of the frame indicates social dis-
tance between the viewer and the represented participants (Kress & van Leeuwen
1996: 130). The family portrait is a ‘medium shot’; that is, the participants are not
portrayed full size, and this seems to bring them socially closer, making the social
distance fairly informal, but not intimate.
In sum, whereas the audio text accompanying the family portrait does not
encourage much interaction, the interpersonal analysis of the visual portrait shows
that the painter makes choices which do create engagement with the viewer.
space of the Dancing Master’s Hall, which was used for the same actions of sitting
together and playing.
The original museum text neither guides the visitor into looking at the picture nor out of it again. In Phases I, V, and VI, all involving mental processes, the new
text, Text 2, aims to build in guidance for the visitor’s semiotic experience, from
the picture to the physical room to the music. We propose that the origin of the
object and its maker could then provide a starting point for considering the paint-
ing. Therefore, in Phase II, information on the painting and the painter is brought
together (largely in relational clauses), while in the original it was spread here and
there. Phase III then covers the material processes of the painting’s production:
who sat as a model for the painter, and when. The representational aspects of the
portrait (relational and material processes) follow in Phase IV. (These could per-
haps be extended, to include suggestions for interpreting the visuals.) In Phase V,
the rewritten version gives an explicit cue for the shift in visual focus (existen-
tial and relational processes), while Phase VI achieves the final transition from the
visual to music.
. Conclusion
Note
* The authors thank the Internationale Stiftung Mozarteum for their permission to reproduce
the figures.
References
Angermüller, Rudolph, Schlie, Reimar, & Sertl, Otto (1996). Festschrift. Die Wiedererrichtung des
Mozart-Wohnhauses. 26. Jänner 1996. Salzburg: Internationale Stiftung Mozarteum.
Barthes, Roland (1997). Image, Music, Text. London: Fontana.
Appendix 1
Ever since different modes and media started to enlarge our textual cosmos by
combining and intertwining, multimodality has become a crucial issue in
linguistic description. However, there are few attempts to methodically categorise
multimodal “blending” in text, or in hypertext. This chapter, which is grounded
in the framework of contrastive textology, focuses on the value and role of
theories of inter-semiotic layering, particularly in regard to comparative
multilingual text and corpus analysis. I begin by addressing some preliminary
issues for the intersection of multimodality and corpus linguistic research. This is
followed by a discussion of the concept of hypertext, in contrast to text, with its
multiple possibilities for blending modalities, and the implications for theories of
multimodality. The model of inter-semiotic relationships and layering proposed
by Hoek (1995) provides a useful approach, which could fruitfully be integrated
with other contemporary perspectives (such as Kress & van Leeuwen 2001). This
model is applied here to samples from the DIALAYMED-corpus, a multilingual
genre corpus consisting of medical self-counselling texts on infectious diseases,
to illustrate the different forms of multimodality found in current (hyper)texts.
Applying the theoretical approach suggested in this chapter to practical analysis
highlights several issues which need to be elaborated in future multimodality
research, particularly with regard to text-intelligibility.
. Introduction
used for analysis. [. . .] It should be made clear, however, that corpus linguistics
is not a mindless process of automatic language description. Linguists use cor-
pora to answer questions and solve problems. Some of the most revealing insights
on language and language use have come from a blend of manual and computer
analysis. (Kennedy 1998: 2f.)
This point seems particularly important when extending corpus linguistics to em-
brace multimodality in text and hypertext. A look at currently available tools for electronic analysis (see for example the software reviews by Alexa & Zuell 1999) shows that the means for integrative multimodal analysis are still very limited and extremely time-consuming to use. However, the development of standards such as SGML and XML seems to pave the way to handling text as a semiotic whole, including non-verbal elements (cf. Bateman, Delin, & Henschel in this volume). In any case, we have to
be aware that an integrative approach to text, considering the verbal and visual as
equally important, and perhaps even including a time-space-axis, entails consid-
erable change in the size of the corpora needed. It therefore seems crucial at this
point to briefly address the question of why it is important to dedicate time and
resources to multimodal aspects of (hyper)text.
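To make the idea of treating verbal and non-verbal elements as one semiotic whole slightly more concrete, the following minimal sketch shows how an XML-style annotation of a (hyper)text unit might be queried; the element and attribute names (unit, verbal, pictorial, refers-to, etc.) are purely hypothetical and do not reproduce any existing annotation scheme.

# A minimal, hypothetical sketch: an XML annotation in which verbal and
# pictorial elements of one (hyper)text unit are recorded side by side,
# together with the inter-semiotic relation between them.
import xml.etree.ElementTree as ET

SAMPLE = """
<unit id="leaflet-01" genre="medical-self-counselling" lang="es">
  <verbal id="v1">Los sintomas de la sifilis incluyen ...</verbal>
  <pictorial id="p1" src="symptoms.jpg" function="illustrative" refers-to="v1">
    <caption>Photograph of typical symptoms</caption>
  </pictorial>
</unit>
"""

root = ET.fromstring(SAMPLE)

# List which pictorial element refers to which verbal element, and how.
for pic in root.iter("pictorial"):
    print(pic.get("id"), "illustrates", pic.get("refers-to"),
          "| function:", pic.get("function"),
          "| caption:", pic.findtext("caption"))

Even such a toy representation would allow a corpus to be searched for verbal-pictorial relations rather than for wording alone, which is the kind of access an integrative multimodal corpus analysis presupposes.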
There are numerous ways in which language, verbal text in written or spoken
form, may be related to static and/or moving pictures. The pictorial has obviously
played an essential role both before and after the advent of literacy. Indeed, there
are several cultures that still give prominence to the oral and pictorial modes and
do not place the same emphasis on written modes as has been established in the
modern Western societies. More to the point, the pictorial and verbal modes have
never excluded each other but have always intertwined to form a semiotic texture
in text. This fact, of course, does not spare us from questions about the precise
nature of this semiotic intertwining. In what ways may a linguistic item be con-
tiguous to, or form part of, a pictorial representation, or vice versa, to give rise to
particular forms of semiotic interaction?
I will call these forms of semiotic interaction inter-semiotic layering or imbrica-
tion,1 as they may (but do not necessarily) give prominence to either the verbal or
pictorial, and may also include a time-space axis in hypermedia environments. It
is clear that inter-semiotic layering can include all the types of signs distinguished
by Peirce and his followers (icons, indices and symbols). Within inter-semiotic lay-
ering the functional spectrum of pictorial information, which was for a long time
rather restricted to illustration and explanation, is nowadays wider and includes,
among others, all Jakobsonian functions (referential, emotive, conative, metalin-
guistic, phatic or narrative-aesthetic). Furthermore, the blending of the verbal and
pictorial can incorporate playful elements and evoke superimposed meanings to
its referent, when it exercises a symbolic function. In fact, the non-verbal elements,
particularly in hypermedia, play a significant role in the analysis of functional hier-
archies in texts. They can detail certain semantic aspects, reveal feelings and values
conveyed by the addresser, add humour, try to influence the behaviour of the
addressee, etc. In certain cases (moving) pictures can function similarly to punc-
tuation marks in writing, i.e. to indicate the beginning or the end of a paragraph
or text, create a pause in the reception, or claim additional attention (for instance
in advertisements). It should therefore be emphasised that pictorial and verbal el-
ements in texts are bound together and never develop their meaning separately,
but do so through a process of interaction between the semiotic layers. Images
tend to perform multiple functions strictly bound to their verbal surroundings,
and vice versa. However, functions organise themselves hierarchically in relation
to a dominant function which is usually bound to the genre. A hierarchical model
of functions is, at first, built on the two highest functions of language and text, viz.
the communicative and cognitive function. Subordinate to these main functions
we find a series of functions such as those related to the different elements of the
language system, i.e. the functions of syntax and morphology as discourse-driven
elements of language which lead to complex clause structures and words and serve
textual cohesion (cf. Dressler & Eckkrammer 2001).
Interestingly enough, the issue of verbal-visual interaction has traditionally
been formulated in terms of rules and principles, based on an assumption that our
interpretation of pictures is mediated by our linguistic competence. If we reflect
on the historical and cognitive dimensions of this approach, however, the opposite
conception seems far more plausible – the production and reception of linguistic
elements appears to be strongly influenced by images. Iconicity, even if paradox-
ically refuted by prominent semioticians such as Eco, seems to be a significant
factor with regard to the naturalness of language (cf. Dressler 1999). The approach
taken here follows a text model of natural text linguistics (cf. Dressler 1989, 2000)
that includes the notion of preferences (cf. Dressler 1999), set by cognitively-based
semiotic parameters, such as iconicity, indexicality, transparency and contrast be-
tween figure and ground. These parameters affect both the text producer and the
interpreter, i.e. when choosing a way to combine the verbal and the pictorial dur-
ing text production or making meaning of a multimodal text. The described text
model draws on a cognitive, pragmatic, functional and communicative conception
of text that, as the next section will illustrate, is useful in contrastive analysis which
encompasses both text and hypertext.
different media, traditional conceptions of text are hard-pressed to account for the
full range of multi-semiotic and highly networked textual phenomena. To meet
this situation, hypertext may be usefully defined as
a concept and new means for structuring and accessing text in distance-
communication, based on software-technology which allows and suggests the
interconnection of text by means of electronic links. The elements can be indepen-
dent documents (nodes) or different sequences of one and the same document.
The links can be either internal (directed to nodes within the same hypertext) or
external (pointing to other hypertexts).
(Adapted from Storrer 2000; Engebretsen 2000)
Hypertext is not a genre, but may realise many functionally differing genres em-
bedded in numerous social practices, or discourses. Hypertext favours non-linear
structures in text and, through its computational medium, enables the use of
more than one mode and overlapping semiotic codes. Multimodality is fostered
by hypertextual environments just as much as the chunking and interconnec-
tion of texts through visible electronic relations (links). Therefore, the ‘reading’
of hypertext is not necessarily bound to a particular receptive chain, but permits
the reader to choose and create an individual path to construct a continuity of
meaning. Additionally, hypertexts may dissolve the clear-cut line between the text
producer and the reader by fostering interactive processes. The interconnectedness
and non-linearity of hypertextual constructs imply that the “comprehension and
discursive structure (. . .) is volatile to the extent that it is pragmatically, not gram-
matically, determined, and so remains outside of the normative prediction and
pattern” (Miles 2000). The reception process and its result can therefore hardly be
predicted.
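As a rough illustration of this node-and-link structure (a sketch only; the class and field names are hypothetical and not taken from any of the cited authors), a hypertext could be modelled along the following lines:

# A minimal, hypothetical sketch of hypertext as nodes connected by internal
# links (to nodes of the same hypertext) and external links (to other
# hypertexts), following the working definition adapted above.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Link:
    target: str        # a node id (internal) or a URL (external)
    internal: bool     # True if the link stays within the same hypertext
    anchor_text: str   # the visible, clickable element

@dataclass
class Node:
    node_id: str
    modes: List[str]                       # e.g. ["verbal", "pictorial"]
    links: List[Link] = field(default_factory=list)

# Two nodes of one self-counselling hypertext, plus one external reference.
intro = Node("intro", ["verbal"])
symptoms = Node("symptoms", ["verbal", "pictorial"])
intro.links.append(Link("symptoms", True, "What are the symptoms?"))
symptoms.links.append(Link("http://www.who.int/gtb/", False, "WHO information"))

# A reader's individual path is simply one of many possible traversals.
path = [intro, symptoms]
print(" -> ".join(node.node_id for node in path))

Non-linearity, in these terms, is simply the fact that many such traversals are possible and none of them is privileged by the structure itself.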
It is beyond the scope of this chapter to develop a fully elaborated typology
of hypertext and hypertextual linkage, although various criteria have been pro-
posed on which to base such a classification. An important example is sequentiality
(see for instance Storrer 2000, who distinguishes between mono-, poly- and un-
sequential hypertext). There is also general agreement that the classification of
hypertextual linkage requires an integrative approach which takes a number of
variables into account, such as function, position, visual elements, etc. While there
is a widely accepted need to expand hypertextual analysis to take into account
both different semiotic levels and cognitive processes, the underlying conception
of text – so long as it is a communicative and pragmatic one (such as that fol-
lowing de Beaugrande & Dressler 1981) – still seems to provide the foundation.
The constitutive criteria of textuality, such as coherence, cohesion, situationality,
acceptability, intentionality, informativity and intertextuality may be developed,
but appear sufficiently dynamic to adapt to hypertext. A concept of text which is
grounded in theories of communicative function, semiotics, and cognition offers
the potential to embrace both text and hypertext. If we additionally introduce the
notion of genre as a solid frame of reference for comparison, there seems to be no
overriding obstacle to comparing corpora of text and hypertext (for more on the
problem of equation, cf. Moulthrop 1991).
Furthermore, it is obvious that current hypertexts are still strongly influenced
by traditional norms for written text, in many cases lacking even basic features
of hypertextuality. This is due to issues of intertextuality: many texts written
and designed for traditional print media are (re)published on-line without ma-
jor changes (pre-existing texts which are simply digitally distributed are known
as e-texts). Further, hypertexts are currently produced by authors whose primary
socialisation is typographical, and established paradigms are naturally not auto-
matically left behind once they write for digital media. Our previous inter-medial
research on genre conventions in traditional print and virtual environments has
clearly shown that long-established textual conventions hardly disappear at once,
but persist to an important extent in virtual environments (cf. Eckkrammer &
Eder 2000).2
To come to a conclusion about whether our traditional conception of text needs modification in the light of hypertext, I assume that a wide-ranging conception of text, grounded semiotically, functionally, communicatively, and cognitively, still serves the purpose, and is particularly necessary for contrastive textology.
Nevertheless, to a growing extent, it will be necessary for linguistics to integrate ap-
proaches from other disciplines (e.g. semiotics, cybernetics, cognitive sciences, and
informatics), in order to provide adequate theoretical grounding for new forms of
textual analysis.
Apart from the fact that hypertextual environments may transform the trans-
mission of images into spatially and temporally delimited occurrences, for the
purpose of contrastive textual analysis, I will take images here as integrative ele-
ments of (hyper)texts, intended to be displayed simultaneously with verbal ele-
ments, when activated by the reader. This is not to imply that a time-space axis
will not play an important role in future analysis of hypertext. Indeed, as recently
suggested by Miles (2000), who adopts schematic categories established by the
movie semiotician Metz (1968) for hypertext syntagmas, a major challenge of hy-
pertextual analyses will be to theoretically accommodate spatial or topographic
and time-specific aspects of hypertext. However, frequently in current hypertexts,
the elements linked to the time-space axis are seen to be largely related to tech-
nical constraints. On this basis, let us now turn to the model of inter-semiotic
relationships, suggested by Hoek (1995) in the context of rhetorics.
This large variety of tasks can be achieved through a range of verbal and pictorial
means. A look at selected sample texts illustrates the wide array of possibilities and
also prompts the question of whether the intertwining of different modes furthers
or inhibits these purposes. At this point it is important to note that the obser-
vations here will be limited to description of the (hyper)texts as ‘products’, since
observing receptive chains and outcomes of the reading process requires empirical
analysis of real-world reading situations which is beyond the scope of this chapter.
Turning first to sample texts from the synchronic section of the DIALAYMED-
corpus, it is relevant to note that medical self-counselling texts have included
images since the advent of the genre, in the late 15th century. During the Mid-
dle Ages, when literacy was limited to very few people, pictorial information was
of enormous value. In these early texts, one finds inter-semiotic layering of trans-
medial, multimedial, and blending types, similar to those in modern mixed and
syncretic texts. The following figures illustrate the array of inter-semiotic layering
in very early texts from the Spanish6 section of the corpus.
Turning now to contemporary print and hypertext samples, it appears that
inter-semiotic layering is strongly bound to the macrostructure of the (hy-
per)textual construct and also restricted to a limited number of texts. The verbal
clearly dominates in both the traditional print and hypertext samples. At the same
time, it is far from constituting an obligatory element in the genre. Functionally,
the images predominantly show the relevant body parts, the ways in which the
virus or bacteria is passed on, effects of the disease, test procedures and thera-
peutic measures taken by the doctor. In the vast majority of the texts analysed, the
function of the images is purely illustrative, referring to something explicitly stated
in the verbal text. Figures 3 and 4 from our data demonstrate this juxtaposing type
of inter-semiotic layering.
What is of cross-linguistic and diachronic interest is the fact that only one
of the 45 contemporary Spanish hypertexts in the corpus includes visual images.
The making of meaning in multimodal text and hypertext is bound to many
variables, particularly the underlying discourse, genre conventions, and cultural
Analysis of the DIALAYMED corpus has shown that (hyper)texts are often still
much less multimodal than one would expect. This is particularly surprising in
the case of the medical self-counselling genre, given its highly “mingled nature”
and the “complex role” it must play in order to meet its objectives (cf. Al-Sharief
1996: 10). Reasons for this fact seem predominantly bound to costs: it is still more
expensive to produce a multi-coloured brochure with well designed illustrations
and/or other multimodal elements than a traditional black on white pamphlet
limited to printed language. This difference does not, on the face of it, apply to hy-
pertextual environments, so it may be surprising that – apart from inter-semiotic
blending with pictorial elements – the degree of multimodality is not yet higher
there than in print media. An important factor, however, is that technical devices
and software for producing, accessing and retrieving medical information from,
for example, video sequences or animated illustrations are not yet accessible to
the vast majority. Moreover, the internet itself only represents a real option for
information retrieval in the first world (cf. Global Internet Statistics 1997).
Nonetheless, multimodality research will have an important role to play in
application to (hyper)texts such as these, particularly in the development of a dy-
namic and integrative theory of text intelligibility. This is because multimodality
potentially impacts on all the dimensions which influence the comprehension and
retention of text. I will proceed here on the basis of the four text intelligibility
factors proposed by Schulz von Thun (1974) – (A) simplicity, (B) brevity, (C)
stimulation and (D) textual organisation.
The first factor, simplicity, should not be seen only in terms of terminological
and morpho-syntactic effortlessness (cf. Hohgräwe 1987). In some cases, a pic-
ture can certainly clarify a text, but may also make it more complex – for instance,
if it is itself highly visually complex, incoherent in context, or included without
cohesive co-text. In the case of brevity, a picture can definitely “paint a thousand
words” and it is therefore possible to transpose a lot of verbal information into the
pictorial mode, producing a more concise text. For example, in the one contem-
porary Spanish text which includes a pictorial element (photographs of syphilis
symptoms), transposing this into language would certainly extend and complicate
the verbal element. However, it should be remembered that the length of con-
temporary medical texts also reflects the enormous increase in medical knowledge
during the past 100 years, and the relationship between length and clarity is not
necessarily straightforward. The art of packing a lot of information into a text
without scaring off the reader needs to take into account findings on the usability
of hypertexts (cf. Nielsen 1994, 1997, 1999). In virtual environments readers lit-
erally scan through texts rather than read them as they would do with a printed
version (cf. Krebs 1991 on content retention). This might imply the need for dif-
ferent strategies of inter-semiotic layering in medical self-counselling hypertexts,
because the text needs to be even more concise and/or modularised than in tra-
ditional environments. Additionally, it is important to remember that the reader’s
expectations about what is appropriate to a genre (including its length) are highly
relevant to the receptive process and may determine whether a text is read at all or
immediately closed.
While multimodality research is needed to help illuminate the contribution
of brevity and conciseness to making (hyper)text more understandable, there is
an even clearer role for multimodality in the factor of stimulation. As observed
in the DIALAYMED corpus, some stimulation can be achieved in the verbal
mode through strategies such as direct producer-reader interaction (e.g. antici-
pated question-answer structures, cf. Gläser 1996; direct forms of instruction,
etc.). However, well-designed multimodal textual elements certainly increase the
reader’s interest and curiosity, and appropriate inter-semiotic layering with picto-
rial elements can enhance stimulation and thereby foster text comprehension.
Finally, in textual organisation, which is the fourth of Schulz von Thun’s
factors for text intelligibility, multimodal elements are potentially seen to have
important structural impact. Pictorial elements often guide the cognitive process
during text reception in a number of ways. Medical self-counselling hypertexts,
which draw on images for structure, are able to increase modularisation (see e.g.
http://www.who.int/gtb/). Verbal elements can be broken into smaller portions
with individual headings, a practice that boosts both redundancy between para-
graphs and textual dynamism, two aspects linked to the success of popularising
genres (cf. Pörksen 1986). Multimodal research into these kinds of potential could
again support (hyper)text design for a better rate of information retention.
. Conclusion
medical applications as well as more general (hyper)text design. There are also
opportunities here for developing theories of multimodality and inter-semiotic
layering to draw on research in areas such as text intelligibility (e.g. extending the
findings of Schulz von Thun 1974), discourse comprehension (such as van Dijk &
Kintsch 1983), web usability (e.g. Nielsen 1994) and more generally in the areas of
corpus linguistics and contrastive textology.
Notes
1. This term is applied by Hoek (1995) in his French text. It originally denotes the way that roofing tiles are laid and has passed into a variety of metaphorical usages. In English the term
belongs to geological terminology referring to the formation of an imbricate (layered) structure
of sediments (shingle structure). The use may also be metaphorical (e.g. in politics).
2. Sometimes this continuity even leads to hybrid and, in our view, highly unacceptable texts
in which the writer appears to be trapped between the norms of ‘paper’ genres and those of
digital/hypertext genres.
3. The term syncretic, which is traditionally applied to refer to the combination of different
forms of belief or religious practice, was adopted by contemporary semioticians to denote the
fusion of two or more sign systems within one human activity, i.e. communication.
4. In Table 1, discourse is not applied in Foucault’s sense (communicative practices which are
distinctive), but, the way I understand the French original, it means text as communicative
occurrence with a specific function.
5. The other two kinds of translation considered by Jakobson are intralinguistic translation,
within one language (e.g. between English synonyms), and interlinguistic (and thus intrasemi-
otic) translation (e.g. substituting a French word for an English one).
6. Spanish is important in this period, as the Arabs of the Iberian Peninsula fostered the spread
of ancient Greek and Roman medical knowledge (Hippocrates, Galen) and Arab advances
(Avicenna, Rhazes).
References
Alexa, Melina & Zuell, Cornelia (1999). A Review of Software for Text Analysis. Mannheim:
ZUMA (=GESIS Nachrichten Spezialband 5).
Al-Sharief, Sultan (1996). Interaction in Written Discourse. The Choices of Mood, Reference,
and Modality in Medical Leaflets. University of Liverpool, unpublished doctoral
dissertation.
Archivo Digital de Manuscritos y Textos Españoles (ADMYTE) I & II (1992). Madrid: Ministerio
de Educación y Cultura & Micronet.
Bateman, John, Delin, Judy, & Henschel, Renate (this volume). “Multimodality and empiricism:
preparing for a corpus-based approach to the study of multimodal meaning-making.”
de Beaugrande, Robert-Alain & Dressler, Wolfgang U. (1981). Einführung in die Textlinguistik.
Tübingen: Niemeyer.
van Dijk, Teun A. & Kintsch, Walter (1983). Strategies of Discourse Comprehension. Orlando:
Academic Press.
Dressler, Wolfgang U. (1989). Semiotische Parameter einer textlinguistischen Natürlichkeits-
theorie. Wien: Österreichische Akademie der Wissenschaften.
Dressler, Wolfgang U. (1999). “Semiotic preference structures in language.” In G. F. Carr, W.
Harbert, & L. Zhang (Eds.), Interdigitations, Essays for Irmengard Rauch (pp. 479–489).
New York: Peter Lang.
Dressler, Wolfgang U. (2000). “Textlinguistik und Semiotik.” In K. Brinker, G. Antos, W.
Heinemann, & S. F. Sager (Eds.), Text- und Gesprächslinguistik. Linguistics of Text and
Conversation. Ein internationales Handbuch zeitgenössischer Forschung. An International
Handbook of Contemporary Research, Vol. 1 (pp. 762–772). Berlin: de Gruyter.
Dressler, Wolfgang U. & Eckkrammer, Eva M. (2001). “Functional explanation in contrastive
textology.” Logos & Language, 2 (1), 25–43.
Eckkrammer, Eva M. & Eder, Hildegund M. (2000). (Cyber)Diskurs zwischen Konvention und
Revolution. Eine multilinguale textlinguistische Analyse von Gebrauchstextsorten im realen
und virtuellen Raum. Frankfurt am Main: Peter Lang.
Engebretsen, Martin (2000). “Hypernews and coherence.” Journal of Digital Information, 1 (7).
http://jodi.ecs.soton.ac.uk/Articles/v01/i07/Engebretsen/. [19.12.2000]
Gläser, Rosemarie (1996). “Der implizite Dialog in populärwissenschaftlichen Texten
im Deutschen und Englischen.” In G. Budin (Ed.), Multilingualism in Specialist
Communication. Proceedings of the 10th European LSP Symposium Vienna, 29 Aug.–1 Sept.,
1995, Vol. I (pp. 751–770). Wien: Termnet.
Global Internet Statistics (1997 etc.). http://www.euromktg.com/globstats
Günzburg, L. (1873). Rathgeber für Brustschwache mit tuberkulöser Anlage, beginnender
und ausgebildeter Lungenschwindsucht. Nach dem heutigen Standpunkte der Wissenschaft
gemeinfasslich dargestellt. Zweite, gänzlich umgearbeitete und vermehrte Auflage. Mit 13 in
den Text gedruckten Holzschnitten. Wien – Pest – Leipzig: U. Hartleben’s Verlag.
Herrera, Maria Teresa (1990). Johannes de Ketham, Compendio de la humana salud. Madrid: Arco
Libros.
Hoek, Leo H. (1995). “La transposition intersémiotique pour une classification pragmatique”.
In L. H. Hoek & K. Meerhoff (Eds.), Rhétorique et Image Textes en hommage à Á. Kibédi
Varga (pp. 65–80). Amsterdam: Rodopi.
Hohgräwe, Uwe (1987). Verständlichkeit von Instruktionstexten und das Informationsverhalten
von Arzneimittelverbrauchern. Wuppertal: Fachbereich Gesellschaftswissenschaften der
Bergischen Universität – Gesamthochschule Wuppertal (= Wuppertaler sozialwissenschaft-
liche Studien).
Kennedy, Graeme (1998). An Introduction to Corpus Linguistics. London: Longman.
Krebs, Marlies (1991). Lesen oder “Navigieren im Hyperraum”?: eine empirische Studie zur
Verständlichkeit von Hypertext im Vergleich zu linearem Text. University of Vienna, M.A.
thesis.
Kress, Gunter & van Leeuwen, Theo (2001). Multimodal Discourse. The Modes and Media of
Contemporary Communication. London: Arnold; New York: Oxford University Press.
Metz, Christian (1968). Essais sur la signification au cinéma /I/. Paris: Klincksieck.
Miles, Adrian (2000). “Hypertext syntagmas: cinematic narration with links.” Journal of Digital
Information, 1 (7). http://jodi.ecs.soton.ac.uk/Articles/v01/i07/Miles
Moulthrop, Stuart (1991). “Beyond the Electronic Book: A Critique of Hypertext Rhetoric.” In
The Association of Computing Machinery (Ed.), Proceedings of the Third ACM Conference
on Hypertext (pp. 291–298). San Antonio, TX: ACM.
Nielsen, Jakob (1994). Usability Engineering. Cambridge, MA: AP Professional.
Nielsen, Jakob (1997). Be Succinct! (Writing for the Web). http://www.useit.com/alertbox/
9703b.html
Nielsen, Jakob (1999). Designing Web Usability: The Practice of Simplicity. Indianapolis: New
Riders Publishing.
Pörksen, Uwe (1986). Deutsche Naturwissenschaftssprachen. Historische und kritische Studien.
Tübingen: Gunter Narr.
Schulz von Thun, Friedemann (1974). “Verständlichkeit von Informationstexten: Messung,
Verbesserung und Validierung.” Zeitschrift für Sozialpsychologie, 5, 124–132.
Storrer, Angelika (2000). “Was ist ‘hyper’ am Hypertext?” In W. Kallmeyer (Ed.), Sprache und
Neue Medien (pp. 222–249). Berlin and New York: de Gruyter.
Chapter 12
Kristin Bührig
University of Hamburg, Germany
. Interpreting in hospitals
the illustrations, without using them again during the briefing. In the following
section, an example of the first kind of diagram use will be looked at more closely.
The discussion and the analysis which follow are from a briefing for a Portuguese
patient’s informed consent to an operation to examine her bile ducts and pan-
creas. The patient is about 55 years old and has lived in Germany for many years. Although the hospital staff has assessed her German as good, all important discussions are interpreted. In the present discussion, the patient’s daughter is interpreting. She is about 17 years old and was born and brought up in Germany.
The patient was taken to hospital with a stomach-ache. She had previously
undergone an operation in another hospital to remove her gallstones. Before this
conversation with the doctor her bile ducts were x-rayed. The new examination
is necessary from the doctor’s point of view, because he suspects that a gallstone
in the patient has moved through the bile ducts to the papilla and is causing a
blockage of bile and digestive juices there, and has resulted in an inflammation of
the bile ducts and the pancreas. If this suspicion is confirmed by the examination,
the doctor plans to remove the gall-bladder. The examination, which is the subject
of the present discussion, is therefore not necessarily the last invasive operation to
be carried out on the patient in this hospital.
The doctor’s explanation of the suspected diagnosis is presented in (1).
(1) Also es geht darum, dass • • wir ja gefunden haben, dass in der Gallenblase
Steine sind. ((1second pause)) Ja? • • Und wir vermuten, dass einer von diesen
Steinen in die Gallenwege abgerutscht ist und eine akute Entzündung in den
Gallenwegen und in der Bauchspeicheldrüse gemacht hat.
So, this is about the • • fact that we have found that there are stones in the gall-
bladder. ((1 second pause)) Okay? • • And we suspect that one of these stones
has slid down into the bile ducts and • has caused an acute inflammation in
the bile ducts and in the pancreas.
In response to the patient’s query “Und wieso kommt lange Schmerz?” (And why
is there so much pain?) the doctor first gives some information about the organs
affected by the illness, and then explains the suspected diagnosis again. To begin
with, we shall look at the first step in imparting medical knowledge (i.e. introduc-
ing the organs), the use of the diagram, and the interpretation into Portuguese.
Before the doctor brings his knowledge into the discourse, he first prepares the use
of the diagram by asking the patient to look at it and justifying its use (2).
(2) Ich kann Ihnen das mal zeigen. Sie können ja hier mit gucken. Dass Sie das
gleich verstehen.
I can show it to you. You can look at this here with me. So that you understand
it straight away.
In the comments which follow, “Das ist die Leber” (That is the liver) and “Das ist
die Gallenblase” (That is the gall-bladder), the doctor orients the patient’s atten-
tion to the visually illustrated organs involved in her illness, through the deictic
“das” (that) (for further discussion of such deictics, see Ehlich 1979, 1982, 1983).
As a starting point for building knowledge, the doctor thus uses the patient’s vi-
sual perception, directing her focus to the organs which he then gives names. Both
statements, therefore, show the form “das” (deictic reference), “ist” (is) (copula),
“die Leber” (the liver) / “die Gallenblase” (the gall-bladder) (definite article +
noun). Ehlich (1994b) defines this structure, in conjunction with an image, as
an ‘ostensive definition’. With the introduction of new objects to a hearer’s con-
sciousness, ostensive definitions create a link between a word and a graphically
represented element of reality. The deixis realised by “das” is purely for the ori-
entation of the hearer to the object (ostension), to link it to an element in the
symbol field2 of language, i.e. the noun (cf. Ehlich 1991; Rehbein 1998). Such os-
tensive definitions can, in my view, be understood to initiate processes of building
knowledge, as they allow the identification of elements about which the hearer has
no prior knowledge. They form the perceptual basis of knowledge verbalised by
a speaker, and so guarantee its sensory certainty through the orientation towards
the diagram. Knowledge established in this way can then be enriched with further
knowledge elements. This is exactly what happens in further discussion between
the doctor and the patient. In a subsequent step, again using a deictic expression,
“da” (there), the doctor first indicates the gallstones, then mentions the function
of the gall-bladder and the transport of the bile into the intestines (3).
(3) Da sitzen die Steine drin, nech? Und • das is • ein Speicher für die Galle. Und
de/ • • • die Galle wird • über den Gallengang • • in den Darm abgegeben.
The stones are sitting in there, aren’t they? And • that’s • where the bile is
stored. And the/ • • • the bile is • passed via the bile ducts • • into the intestines.
In addition, with the use of the symbol field expression “Speicher” (storage place),
the prepositional phrase “über den Gallengang” (via the bile ducts), and the pas-
sive, the doctor verbalises the functional interrelation in the digestion process
between the organs shown and named (the gall-bladder and the bile ducts).
The next step is to compare these linguistic actions of the doctor with the ac-
tions of the interpreter in the target language, Portuguese. Table 1 is a transcript of
the relevant part of the discussion, organised by speaker. The doctor’s utterances
are grouped into the column labelled “A”, the utterances of the patient’s daughter,
who is interpreting, are headed “D” and the patient’s utterances are in the column
labelled “P”. An English gloss is given in italics beneath each utterance. Text in
double parentheses shows the length of pauses in seconds, e.g. “((1.2 s))”. A dot
indicates a pause of a quarter of a second, two dots a half second, e.g. “• aren’t
they?”. Bold text indicates intonationally emphasised parts of an utterance, e.g.
“Auf dem Bild”.
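Purely as an illustration of these transcription conventions (not part of the original analysis), the pause notation could be resolved mechanically as follows; the helper function and its name are hypothetical.

import re

def pause_length(token: str) -> float:
    # Return the pause length in seconds encoded by the transcript notation:
    # each bullet '•' stands for a quarter of a second, while an explicit
    # value such as '((1.2 s))' gives the length directly.
    explicit = re.match(r"\(\((\d+(?:\.\d+)?)\s*s?\)\)", token)
    if explicit:
        return float(explicit.group(1))
    return token.count("•") * 0.25

print(pause_length("•"))          # 0.25
print(pause_length("• •"))        # 0.5
print(pause_length("((1.2 s))"))  # 1.2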
The first contrast that is apparent between the doctor’s utterances and those
of the interpreter is that the daughter does not convey the request that the pa-
tient should direct her attention to the information sheet or the diagram in the
target language. She begins her interpreting in Segment 36, with the doctor’s state-
ment about the transport of the bile into the intestines (Segment 34). However, she
does not finish this statement, but instead first indicates – probably visually – the
gall-bladder to the patient: “Äh • aqui é o coisu, ja?” (Ah • here’s the thing, right?)
(Segment 37).
Let us look more closely at the inclusion of the diagram in the interpreter’s
Portuguese utterances. Like the doctor, she uses a deictic expression, “aqui” (here).
However, “aqui” refers only to location, and not, like the German “das” (that),
to an object in the perceptible space. Perhaps the interpreter uses “aqui” to direct
the patient’s attention back to the place in the diagram to which the doctor had
referred in his German explanation. That is, it is conceivable that the patient’s
gaze had already followed as the doctor pointed lower than the gall-bladder in the
diagram, to trace the transport of bile into the intestines (Segment 34). In this case,
the use of “aqui” could be traced to the ‘internally dilated speech situation’ (Bührig & Rehbein 2000), which is responsible for the lack of simultaneity in consecutive interpreting.

Table 1. (continued) (A = doctor, D = interpreting daughter, P = patient)

P (39) Bom
   Well.
A (34) Und de/ • • • die Galle wird • über den Gallengang • • in den Darm gegeben.
   And th/ • • the bile is • passed via the bile ducts • • into the intestines.
D (41) Vai.
   It goes
D (42) E depois vai para os intestinos.
   And then it goes to the intestines.
A (35) Normalerweise.
   Normally.
A further difference between source and target expression is that, in the place
of the doctor’s ostensive definitions, the interpreter gives no technical names to the
organs. Rather, in Segment 38, she initially vaguely qualifies the element shown as
the subject, with “o coisu” (the thing), and then adapts the relevant knowledge to
her mother’s prior knowledge: “Tu dizes que é o veneno dos coelhos.” (You say it’s
the rabbit poison.).
After the mother has signalled her understanding (Segment 39), the daugh-
ter again speaks about the transport of the bile into the intestines (Segments 40
and 41), without mentioning the gallstones. The interpreter does not mention the
function of the gall-bladder in the target language, nor does she convey the Por-
tuguese symbol field expressions for all the organs (liver, gall-bladder, bile ducts,
bile). Only the “intestines” (intestinos) are named.
Comparable to the build up of knowledge by the doctor, the interpreter also
tries to use a successive procedure. However, a glance at the knowledge organi-
sation in the target language shows that the daughter’s statements do not cover
several topics, but only one, namely the knowledge verbalised with “o coisu” (the
thing) or “o veneno dos coelhos” (rabbit poison), with which the prior knowledge
of the patient is linked. The interpreter continues this topic with the help of the
3rd person nominative morphology of the finite verb “vai” (go).
In the absence of a visual record, we can only guess to what extent the in-
terpreter uses the diagram. However, the intonation patterns suggest that the
interpreter makes accompanying gestures with her statement showing the route
of the bile to the intestines. In Segment 41 she verbalises the destination of the
gallstone visible on the diagram with “para os intestinos” (to the intestines). In this
following utterance, the interpreter uses the verb “vai” (it goes) once again so that
the functional interrelation verbalised by the doctor between the gall-bladder, the
gallstone, and the intestines assumes the character of an iterative procedure in the
target language, which she sub-divides temporally in Segment 41 with “e depois”
(and then).
In Segment 43, based on the verbalised knowledge about the relationship between
the gall-bladder, the gallstone, and the bile ducts, the doctor begins to speak about
a disruption of this functional interrelation, as has presumably happened with the
patient (4).
(4) Und wenn jetzt ein Stein dort in der Gallenblase sitzt, kann es eben passieren,
dass der Stein • • • auch in die Gallenwege abrutscht.
And if there is a stone sitting there in the gall-bladder now, there is even a
chance that the stone • • • could also slide down into the bile ducts.
In the subordinated clause of his statement, the doctor links the hearer’s percep-
tion, through the deictic expression “dort” (there) and the prepositional phrase “in
der Gallenblase” (in the gall-bladder), with the knowledge established previously
about the stones in the gall-bladder in Segments 31 and 32. In the main clause, us-
ing the modal verb “kann” (can), he then expresses the possibility that a gallstone
has found its way from the gall-bladder into the bile ducts.
If we look at this statement more closely, the knowledge verbalised by the doc-
tor shows traces of its generation. This expression has a hypothetical character,
which the doctor verbalises with the expression “wenn” (if ), assigning a purely lin-
guistic reality to the knowledge verbalised in the subordinated clause (cf. Redder
1990: 265). This hypothetical character is also created with the use of the modal
verb “kann” (can) (cf. Ehlich & Rehbein 1972; Redder 1984). However, at the
same time, the doctor qualifies the knowledge expressed in the main clause with
the particle “eben” (even), indicating it to be based on prior knowledge, so that
the possibility of a gallstone moving into the bile ducts assumes the quality of a
‘practical conclusion’ that is derived from specialist medical knowledge (Ehlich &
Rehbein 1979). According to Ehlich (1984), the combination of “wenn” with the
modal verb “kann” can be understood as an attempt by the doctor to set in motion
a ‘knowledge-based imagination activity’ on the part of the hearer, to guarantee
comprehension of the suspected medical diagnosis.
Table 2 provides the transcript for the subsequent part of the briefing covering the
patient’s suspected diagnosis. (Transcription notation is as for Table 1.)
In Segment 44, the interpreter makes the information expressed in the doctor’s
main clause the central point of the target language statement, using the matrix
construction “pode ser” (maybe). Like the doctor, she thus qualifies the knowl-
edge in question as a possibility, but does not transfer into the target language the
full knowledge verbalised by the doctor in the source language. Whereas the doc-
tor made an effort to use the discourse knowledge which has already been built up
for further reception (cf. Rehbein et al. 1995), the interpreter starts the speech sit-
uation anew. There is no linguistic link to the previous target-language statements,
and thus the statement is not grounded in the discourse knowledge.
In the source language, the suspected diagnosis concerns the possible sliding
of a gallstone. In the target language, however, it is concerned with the possibility
of a stone perhaps being located in a certain place. This localisation is taken over by
the interpreter with the aid of the diagram, on which she focuses the attention of
the patient by using the deictic expression “lá” (there). While the doctor, using the
local distant deixis “dort” (there), creates a perceptual basis for the reception of the
thematic, first part of his statement in the preceding subordinate clause, the interpreter
uses the perception of the patient for the rhematic, final part of the statement. (For
Table 2. (A = doctor, D = interpreting daughter, P = patient; transcription notation as for Table 1)

A (43) Und wenn jetzt ein Stein dort in der Gallenblase sitzt, kann es eben passieren, dass der Stein • • • auch in die Gallenwege abrutscht.
   And if there is a stone sitting there in the gall-bladder now, there is a chance that the stone • • • could also slide down into the bile ducts.
D (44) Pode ser que é uma pedra também tinha lá.
   Maybe there was also a stone there.
(45) Nich?
   Right?
(46) • Das macht Schmerzen.
   • That causes pain.
(47) ••• Das tut weh.
   ••• That hurts.
(48) Hmhm
. Conclusion
In her question in Segment 23, “Und wieso kommt lange Schmerz?” (And why is
there so much pain?), the patient expresses a deficit of knowledge which – and this
is clear through the initial “und” (and) – affects the link between the suspected di-
agnosis, expressed in Segment 18 by the doctor, and her pain. The doctor fills this
deficit of knowledge with certain knowledge elements, which clarify the suspected
diagnosis through the functional interrelation between the relevant digestive or-
gans. Through these utterances, the doctor realises the speech action pattern of
‘explaining’ (Ehlich & Rehbein 1986; Rehbein 1985, 1994; Bührig 1996).
In her target-language action, however, the interpreter does not represent the
functional interrelation between the organs. Instead, with the expression “veneno
dos coelhos” (rabbit poison), she uses one of the patient’s own formulations to
describe the bile. The patient’s prior knowledge, updated in this way, then becomes
the topic of the statements in Portuguese, which are based on the doctor’s German
statements about the transport of the bile to the intestines. With “vai” (it goes),
the interpreter names only one external aspect of this procedure, the movement. Her
statements can therefore best be understood as ‘describing’ (Rehbein 1984). The
subject of her description – in comparison to the source-language actions – is
reduced to the transport of the bile to the intestines. Thus, she cannot link the
suspected diagnosis to the knowledge already verbalised in the target language, but,
in contrast to the doctor, makes a new start and then uses the diagram in the
rhematic part of her expression, mentioning a gallstone with the deictic expression
“lá” (there) in the perception space. An overall comparison of the source-language
and target-language utterances thus shows that the doctor and the interpreter are
indeed using two different ‘linguistic action patterns’.
As far as the visual elements are concerned, the analysis has shown that the
medical doctor and the untrained interpreter also use different multimodal action
patterns. The doctor uses the diagram to systematically build up the patient’s
knowledge, according to his own professional medical knowledge, through the speech
action pattern of ‘explaining’. First, he uses the diagram for ostensive definitions.
Later, he uses the diagram and deictic expressions to refocus thematic elements of
knowledge. By contrast, the interpreter works in a more local way. She links the
diagram to new knowledge and the rhematic parts of her statements.
What causes the interpreter to reproduce the doctor’s statements only partially?
One factor may be that the transparent interpreting situation is at fault. Due
Notes
* An earlier version of this chapter was presented at the 32nd meeting of the Gesellschaft für
Angewandte Linguistik (GAL, Passau, 27–29th September, 2001). I would like to thank Jutta
Fienemann, Katharina Meng, Bernd Meyer and Jan ten Thije for their helpful comments on
this presentation. For help with the English version I am indebted to Ann Helin Langridge.
1. Others currently working on this project are Bernd Meyer and Erkan Özdil. The financial support of “Interpreting in Hospitals” by the Deutsche Forschungsgemeinschaft (DFG) since 1999 is gratefully acknowledged.
2. In the context of functional-pragmatic discourse analysis, following Bühler (1934), who differentiates between the ‘Zeigfeld’ (deictic field) of a language and the ‘Symbolfeld’ (symbol field), nouns as well as verb and adverbial stems are analysed as elements of the ‘symbol field’, with which the speaker directs the listener to complete an ‘appellative procedure’ (on the concept of the ‘procedure’, see e.g. Ehlich 1991, 1993).
References
Brinkschulte, Melanie & Grießhaber, Wilhelm (1999). “Gestisches Sprechen. Sprechen vor dem Computer.” Osnabrücker Beiträge zur Sprachtheorie (OBST), 60, 33–50.
Bühler, Karl (1934/1982). Sprachtheorie. Zur Darstellungsfunktion der Sprache. München: Fink.
Bührig, Kristin (1996). Reformulierende Handlungen. Zur Analyse sprachlicher Adaptierungsprozesse in institutioneller Kommunikation. Tübingen: Gunter Narr.
Halliday, M. A. K. (1989). Spoken and Written Language. Oxford: Oxford University Press.
Koch, Peter & Österreicher, Wulf (1985). “Sprache der Nähe – Sprache der Distanz. Mündlichkeit und Schriftlichkeit im Spannungsfeld von Sprachtheorie und Sprachgeschichte.” Romanistisches Jahrbuch, 36, 15–43.
Löning, Petra & Rehbein, Jochen (1993). Arzt-Patienten-Kommunikation. Analysen zu
interdisziplinären Problemen des medizinischen Diskurses. Berlin and New York: de Gruyter.
Meyer, Bernd (2000). “Medizinische Aufklärungsgespräche. Struktur und Zwecksetzung aus
diskursanalytischer Sicht.” Arbeiten zur Mehrsprachigkeit. Folge B, 8/2000.
Müller, Frank (1989). “Translation in Bilingual Conversation: Pragmatic Aspects of Translatory
Interaction.” Journal of Pragmatics, 13, 713–739.
Pöchhacker, Franz (2000). Dolmetschen. Konzeptuelle Grundlagen und deskriptive Untersuchungen. Tübingen: Stauffenburg.
Raible, Wolfgang (1996). “Orality and literacy. On their medial and conceptual aspects.” In D.
Scheunemann (Ed.), Orality, Literacy and Modern Media (pp. 17–26). Columbia: Camden
House.
Redder, Angelika (1984). Modalverben im Unterrichtsdiskurs. Pragmatik der Modalverben am
Beispiel eines institutionellen Diskurses. Tübingen: Niemeyer.
Redder, Angelika (1990). Grammatiktheorie und sprachliches Handeln: denn und da. Tübingen:
Niemeyer.
Redder, Angelika & Wiese, Ingrid (1994). Medizinische Kommunikation. Diskurspraxis,
Diskursethik, Diskursanalyse. Opladen: Westdeutscher Verlag.
Rehbein, Jochen (1984). “Beschreiben, Berichten und Erzählen.” In K. Ehlich (Ed.), Erzählen in
der Schule (pp. 67–124). Tübingen: Gunter Narr.
Rehbein, Jochen (1985). “Medizinische Beratung türkischer Eltern.” In J. Rehbein (Ed.),
Interkulturelle Kommunikation (pp. 349–419). Tübingen: Gunter Narr.
Rehbein, Jochen (1992). “Zur Wortstellung im komplexen deutschen Satz.” In L. Hoffmann
(Ed.), Deutsche Syntax. Ansichten und Aussichten (pp. 523–574). Berlin and New York: de
Gruyter.
Rehbein, Jochen (1994). “Rejective proposals. Semi-professional speech and clients’ varieties in
intercultural doctor-patient-communication.” Multilingua, 13 (1/2), 83–130.
Rehbein, Jochen (1995). “Grammatik kontrastiv – am Beispiel von Problemen mit der Stellung
finiter Elemente.” Jahrbuch Deutsch als Fremdsprache, 21, 265–292.
Rehbein, Jochen (1998). “Die Verwendung von Institutionensprache in Ämtern und Behörden.”
In L. Hoffmann, H. Kalverkämper, & E. H. Wiegand (Eds.), Fachsprachen. Handbücher
zur Sprach- und Kommunikationswissenschaft 14.1 (pp. 689–709). Berlin and New York:
de Gruyter.
Rehbein, Jochen (2000). “Prolegomena zu Untersuchungen von Diskurs, Text, Oralität und
Literalität unter dem Aspekt mehrsprachiger Kommunikation.” In B. Meyer & N. Toufexis
(Eds.), Text / Diskurs, Oralität / Literalität unter dem Aspekt mehrsprachiger Kommunikation.
Beiträge zu dem Workshop ‘Methodologie und Datenanalyse’. Arbeiten zur Mehrsprachigkeit.
Folge B, 11, 2–25.
Rehbein, Jochen (2001). “Das Konzept der Diskursanalyse.” In K. Brinker, G. Antos, W.
Heinemann, & S. F. Sager (Eds.), Text- und Gesprächslinguistik. Linguistics of Text and
Conversation. Ein internationales Handbuch zeitgenössischer Forschung. An International
Handbook of Contemporary Research. Bd.II (pp. 927–945). Berlin and New York: de Gruyter.
Rehbein, Jochen, Kameyama, Shinichi, & Maleck, Ilona (1995). Das reziproke Muster der
Terminabsprache. Zur Modularität von Dialogen und Diskursen. (Verbmobil Memo 23.)
Hamburg: Germanisches Seminar.
Todd, Dundas A. & Fisher, Sue (1993). The Social Organization of Doctor-Patient-Communication. Norwood: Ablex.
Index
image 9, 10, 12–14, 16–19, 21–28, 31, 32, 39, 40, 48, 51, 55, 59, 66, 77, 110, 115, 119, 121, 123–131, 141, 161–164, 173, 174, 195–197, 200, 203, 204, 217, 231
image composition 141
incohesion 133
indexical signs see semiosis/sign
information units 124
information value 66
instantiation 33, 55, 69
intelligibility 133, 212, 222–224
inter-modal connections see modality/modes
interface 10, 19, 22, 27
internally dilated speech situation 234
interpersonal see metafunction
interpreting see translation
intersecting hierarchies 70, 71
inter-semiotic see semiosis/inter-semiosis
intertextuality 173, 185, 215, 216

K
Kendon 31, 34, 35, 39
kinesic 33, 39, 161–164
Kress & van Leeuwen 10, 14, 15, 18, 25, 28, 32, 39, 51–58, 65–68, 73, 83–85, 123, 134, 138, 139, 141, 142, 145, 148, 150, 161, 163, 174, 194, 195, 203, 204, 211, 212, 218

L
language 9–14, 16–19, 22–28, 31–35, 39–41, 44, 45, 47, 51–60, 69, 70, 73, 74, 83, 86, 91–96, 104, 106, 107, 109, 111, 112, 114, 115, 119–125, 131, 133, 134, 143, 144, 147, 148, 151, 154, 155, 157, 158, 160, 165, 167, 173, 174, 176, 183, 190, 193–196, 200, 205, 206, 213, 214, 222, 224, 227, 229, 231, 232, 234–238
language as choice 33, 52, 94–95, 194, 196, 200–204
language continuum 33
language role 40, 41, 45, 47
language teaching 119–125, 131, 134
language-image-link 9, 19, 21, 23, 24, 27, 29
layout 9, 11, 19, 24, 55, 66–68, 73–75, 77–81, 83, 84, 86, 95
layout hierarchy 78
layout structure 74, 75, 77–81, 83, 84
linguistic action patterns 237
lip movement 131
literal 18, 23, 24, 140, 154, 155, 166, 178–180, 182
literary editions 70
locution 44, 45, 169
logical see metafunction
lyrics 144, 146

M
markup language see corpus linguistics
materiality 11
mathematics 53, 91–96, 98, 99, 103, 106, 107, 109, 111–115
  grammar see language, symbolism and visual images
  history 91, 96–108
  inter-semiosis 95, 112, 113
  intra-semiosis 95
  language 92, 109, 110
  mathematics/science relationship 91, 92–94, 114
  metafunctional organisation 94, 109–112
  modern mathematics 92, 107, 109–112, 113, 114
  multisemiotic nature of 91, 92, 94–96, 114
  semiotic metaphor 91, 95, 96, 112, 113
  symbolism
    algebraicisation of geometry 103–106
    grammar 91, 111, 112
    metafunctions
      experiential meaning 94, 111
      interpersonal meaning 94, 95, 122
      logical meaning 112
      textual meaning 112