The Evolution of Music: Theories, Definitions and The Nature of The Evidence
Chapter 5
5.1 Introduction
It is nowadays uncontroversial among scientists that there is biological continuity between
humans and other species. However, much of what humans do is not shared with other animals.
Human behaviour seems to be as much motivated by inherited biology as by acquired culture,
yet most musical scholarship and research has treated music solely from a cultural perspective.
Over the past 50 years, cognitive research has approached the perception of music as a capacity of
the individual mind, and perhaps as a fundamentally biological phenomenon. This psychology of
music has either ignored, or set aside as too tough to handle, the question of how music becomes
the cultural phenomenon it undoubtedly is. Indeed, only over the past 10 years or so has the
question of the ‘nature’ of culture received serious consideration, or have the operations of mind
necessary for cultural learning explicitly engaged the attention of many cultural researchers
(D’Andrade 1995; Shore 1996). The problem of reconciling ‘cultural’ and ‘biological’ approaches
to music, and indeed to the nature of mind itself, remains.
One way of tackling this problem is to view music from an evolutionary perspective. The idea
that music could have evolutionary origins and selective benefits was widely speculated on in the
early part of the twentieth century, in the light of increasing bodies of ethnographic research and
Darwinian theory (e.g., Wallaschek 1893). This approach fell rapidly out of favour in the years
before the Second World War, for political as much as for scientific reasons, with the repudiation
of biological and universalist ideas in anthropological and musicological fields (Plotkin 1997).
However, evolutionary thinking has again become central in a range of sciences and in recent
philosophical approaches, and music’s relationship to evolutionary processes has been
increasingly explored over the past two decades (see also Dissanayake, Chapters 2 and 24; Brandt,
Chapter 3; Merker, Chapter 4, this volume).
Music may not be essential for survival, as eating or breathing are, but, like talking, may confer
a selective benefit and express a motivating principle that has great adaptive power. Music may
have developed from functions evolved for particular life-supporting purposes as a specialization
that elaborates and strengthens those same purposes. As Huron (2001, p. 44) puts it, ‘If music
is an evolutionary adaptation, then it is likely to have a complex genesis. Any musical adaptation
is likely to be built on several other adaptations that might be described as pre-musical or
proto-musical.’
Let us consider the theories that have been proposed to explain how our capacity for music
may have evolved.
In consequence of frequent interaction with the same people, an individual’s behaviours are
likely to acquire the form of approved prosocial norms that emerge within a population.
Adherence to these norms can benefit the members of the group by giving additional rewards
for behaviours that they choose to undertake as individuals (Bowles and Gintis 1998). In
other words, optimal behaviours for the well-being of an individual can be determined
through engagement with conspecifics, as well as between each individual and their non-human
environment. In a social species, the likelihood that individuals will survive to procreate or have a
high rate of procreation depends on their ‘cultural fitness’—how they behave in relation to others
in their social group, not just their physical fitness.
Behaviours that contribute to ‘group cohesiveness’ may make other cooperative behaviours
more likely. Bowles and Gintis (1998) have demonstrated through their game theory model that
‘populations [without a centralizing control] whose interactions are structured in such a way that
coordination problems are successfully overcome will tend to grow, to absorb other populations,
and to be copied by others’ (Shennan 2002, p. 216).
It can be seen that the emergence of musical behaviours as a prosocial norm assisting
coordination within a group could lead to the growth of such groups and the spread of those
behaviours. Not only could musical behaviours become a behavioural norm in their own right,
but, because of their foundations in powerful motives for social awareness and expressive
behaviours, those individuals with well-developed capacities for musical action and perception
should also be best at identifying and engaging with other norms of social interactive behaviour.
Therefore, in theory, musical behaviours fit well with the models of selection, at both individual
and group levels, that demonstrate how the development and spread of musical behaviours
is possible.
in human cultures through its role as ‘ritual’s reward system’; music, for him, is a type of
‘modulatory system acting at the group level to convey the reinforcement value of these
activities … for survival’ (p. 257). For Brown, music’s survival value is thus not immediate and
individual, but lies in its ability to promote group cohesion.
A different position is adopted by Hagen and Bryant (2003), who suggest that rather than
causing social cohesion, music and dance signal social cohesion achieved by other means. Hagen
and Bryant’s overall thesis is that
For humans and human ancestors, musical displays may have … functioned, in part, to defend territory
(and perhaps also to signal group identity), and that these displays may have formed the evolutionary
basis for the musical behaviours of modern humans.
(2003, p. 25)
They propose that music and dance act as indicators of group stability and the ability to carry out
complex coordinated actions (as exemplified, perhaps, in the New Zealand All Blacks’ football
cry, haka). They propose that the time needed to create and practise music and dance
corresponds to the quality of the coalition performing them, indicating how much time they have
devoted to preparing their skill.
Hagen and Bryant justify their position, and reject other explanations, on the grounds that
musical behaviours cannot contribute directly to the cohesion of a group, because they are not a
good indicator of an individual’s ability to contribute to the group’s survival. However, this view
of group cohesion purely in terms of immediately perceived costs and benefits of group
membership ignores emotional bonding and the loyalty engendered by a mutual emotional experience.
Individuals may already have established their credibility within a group, in terms of their ability
to contribute to its survival, but this provides no indication of their likelihood of doing so, or to
whom they will direct their assistance. The ability of music to act as a forum for the practice of
integrated, complex, coordinated group activities resulting in a powerful sense of membership
and trust provides a coherent explanation as to why these behaviours persisted at a group level.
One of the manifestations of this role may have been ‘coalition signalling’, and this may even have
led to its perpetuation; however, this is unlikely to have been the primary selective force for
music’s development.
At a psychobiological and individual level, rather than a behavioural and social level, musical
experience has been linked with the release and action of life-sustaining regulatory hormones.
Freeman (1995) reports that the neuropeptide transmitter oxytocin aids in the formation of
strong positive emotional memories and in the supplanting of negative emotional memories,
having its strongest effects during trauma or ecstasy. Oxytocin is released into the brain in
females during lactation, and is produced by males and females following sexual orgasm. It
mediates in interpersonal bonding, both pair-bonding and mother–infant bonding. Critically,
Freeman suggests that oxytocin is likely to be released while a person is merely listening to music.
This would provide a strong neurological rationale for the role of music in the formation
of social bonds, both in intimate interactions between people and in group musical activities
such as crowd chants (Huron 2001; see Panksepp and Trevarthen, Chapter 7, and Osborne,
Chapter 25, this volume).
behaviours can indicate sexual fitness, signalling status, age, physical well-being and fertility. He
suggests that dancing reveals aerobic fitness, coordination, strength and health; voice control may
reveal self-confidence and status; rhythmic ability may indicate the ‘capacity for sequencing
complex movements reliably’, whilst virtuosic performance per se ‘may reveal motor coordination,
capacity for automating complex learned behaviours, and having the time to practise’ (Miller
2000, p. 340). The last characteristic may also, in young adults, signal sexual availability, as it
implies a lack of parenting demands. These properties of musical and dramatic displays could
lead to aesthetic preferences for particular forms of those behaviours, which leads Miller (2000)
to propose that
Any aspect of music that we find appealing might also have been appealing to our ancestors, and if it
was, that appeal would have set up sexual selection pressures in favour of musical productions that
fulfilled those preferences.
(2000, p. 342)
This logic, however, implies that any musical trait for which there is a preference will
subsequently be selected for by sexual selection. To this, an important qualifier should be applied: by
definition, selection, sexual or otherwise, for a particular trait can occur only if that trait can arise
by mutation of a gene and can be inherited. Behaviours and skills (for example, a particular
language or music) can be transmitted in other ways. In addition, if sexual selection was responsible
for the evolution of motives that cause most humans to find features of music aesthetically
appealing, then we would expect convergence in behaviours of musical expression, and in the
aspects of them that give pleasure. While musical behaviours are found in all cultures and share
dynamic features and social motivations and uses, aesthetic preferences are often culture-specific.
Miller argues that ‘If one can perceive the quality, creativity, virtuosity, emotional depth and
spiritual vision of somebody’s music, sexual selection through mate choice can notice it too’ (p. 355);
however, he admits that such rationales are speculative. While his thesis is presented as a call for
empirical testing, Miller’s hypothesis of the fitness-display properties of music does intuitively
make sense. It could provide a mechanism by which musical behaviours may have become
refined, perpetuated and spread in human evolution. His theory attempts to explain how the
forms of musical behaviour may have evolved in the species, rather than how musical forms
became appealing. It may be that the core factor in the appreciation of the quality of the musical
behaviour (and its creativity and virtuosity in artistically developed forms) is its very ‘emotional
depth’, i.e., the extent to which its perception elicits a compelling emotional response, and that
this experience of emotion might not be a product of sexual selection. (See Dissanayake, Chapter 2;
Lee and Schögler, Chapter 6, on emotional expression in movement, this volume).
science of arranging sounds in notes and rhythms to give a desired pattern or effect’ – from the
Penguin Dictionary of Music (Jacobs 1972).
For contemporary musicologists and ethnomusicologists, these definitions are unsatisfactory.
They could apply to, say, a CD recording of a Beethoven string quartet, or a live performance by a
rock band such as Coldplay. It is unclear whether the dictionary definition would embrace
the musical intentions of a contemporary composer such as Brian Ferneyhough, the sonic
surface of contemporary popular forms such as electronic dance music, or the drum and dance
music of a shamanic ritual in Borneo. To the musicologist and ethnomusicologist, these
phenomena are indubitably musical, but ‘sounds combined so as to produce beauty of form and
expression of emotion’ scarcely captures what can be considered to be musical in them. Several
of the scientific conceptions of how musical behaviours and appreciations arose in evolution (for
example that of Miller, 2000 and 2001) appear implicitly to define music according to current
Western musical practices, where music is produced by few and consumed by many.
All of these notions of music reveal themselves to be ideological constructs rooted in the
workings of broader socio-economic and political forces, which are dynamic, changing processes.
As Magrini (2000) notes, changes in the ways in which music is manifested result in the
discouragement of alternative and often older ways of engaging with music, particularly as an active
element in everyday life. An ‘inhibition’ of musical practices may occur through processes of the
reification of elements in cultural models of engagement with music; this occurs as the role of
the music consumer—as opposed to that of a participant or everyday practitioner in musical
activity—is created, then enhanced and eventually enforced, by institutionalizing or commodifying
the processes of knowledge acquisition. Music-making may thus be inhibited through the loss of
roles, contexts, situations and practices, and the impoverished models of music and its social roles
that result may all too easily be taken by music scholars to represent all possible kinds of music.
Before assessing the relationship between music and evolution, it is essential to frame the object
of study in a different way—to perceive music in all of its manifestations.
In general, it seems that practices that are recognizable as music in societies beyond
contemporary global Western culture are characterized by their use of sound and movement together. They
tend to involve collective performance: that is, they are characterized in terms of not only sound
and action, but interaction between the music makers. They are marked by (1) an apparent
‘non-efficaciousness’, in that their immediate and evident consequences are not observable
through material change in the local environment or in the subsequent behaviours of the
participants, and (2) ‘embeddedness’ in a wide range of everyday and special practices. In most,
if not all cases, they also manifest significant hedonic value (Panksepp and Trevarthen, Chapter 7,
this volume).
Accepting that something like music—even if not discretely identified as such by its
practitioners—is in all human cultures, the definitions in our dictionaries seem clearly
unsatisfactory. ‘Music’ as a universal human behaviour is marked by sound, action, interaction,
non-efficacy, and a multiplicity of social functions and emotional effects. These characteristics
will now be assessed in more detail, to arrive at an operational definition of music that might
enable its relationship (if any) to evolutionary processes to be addressed comprehensively.
the pulse, with a concomitant periodic modulation of the amount of attentional resources
devoted to tracking the temporal flow of the music, again orientated around the pulse (Drake
et al. 2000).
According to a cognitive interpretation, pulse abstraction facilitates an optimal use of
attentional resources over time. Experiments show that events occurring in temporal alignment with
the inferred pulse are detected and identified more easily than events that occur out of phase with
the pulse (Jones and Yee 1993). What is conceived as the ‘attentional load’ is modulated in time in
accordance with the pulse the subject infers. At a neurophysiological level, the experience of pulse
seems intimately related to the different ranges of timing in the coordination of gross and fine
movements (Thaut 2005). Entrainment to an external pulse may be either volitional (under
conscious control) or preconscious (Stephan et al. 2002).
We conclude that musical interaction between human participants is rooted in intuitive,
mind-generated processes of pulse abstraction/generation within the individuals. These processes
implement the optimal allocation (modulation in time) of attentional resources and may focus
experience in hierarchical temporal structures. The perceptual processes are integral to
the prospective temporal control of periodic motor behaviour. Music as an interactive social
behaviour thus affords the means for synchronizing the deployment of a participant’s experience
of moving with that of other participants, facilitating the individual and the collective
(intersubjective) focus on specific moments and sequential patterns in the temporal unfolding of
the music.
As music flows in time, it presents rhythmic and melodic patterns that may give rise to
expectations for listeners or participants as to how and when it will continue. In the rhythmic flow of
the music, those expectations may be realized or abrogated. Thus, music can generate allusion
to future possibilities of unfolding; when those future possibilities become actualities, the
significance of those earlier musical events may become clear, their sense (at least partially)
disambiguated, giving rise to what Meyer (1956) has called music’s evident meanings.
Those patterns of evident meaning, together with the music’s sonic and gestural qualities as it
unfolds, may also yield a degree of reference, this time beyond the music itself. They may result in
the elicitation of emotion or the evocation of specific conceptual–intentional complexes in the
mind: complexes of ideas with which aspects of the music have become associated through
individual experience or cultural convention, or because of biosocial predispositions (Cross 2005;
Lavy 2001; Morley 2003, pp. 150–162). But while those conceptual–intentional complexes may
themselves be complex, they are neither propositional nor decomposable in relation to definite
objects of human thought and action. Their experience is likely to vary from participant to
participant, taking form in what Meyer (1956) referred to as connotative complexes. Their sense
and reference is not bound to a specific situation or set of circumstances, but rather to a range of
situations, as a particular emotional or affective mind–brain–body state may be relevant to a range
of circumstances for any one individual (Oatley and Johnson-Laird 1998). Hence, while aspects of
music’s sense may (retrospectively) be disambiguated, its objective reference cannot.
In certain circumstances, however, music can appear to bear meanings in much the same way
as language. Results from functional brain imaging studies support this conclusion. Koelsch et al.
(2004) demonstrated that music can elicit brain responses similar to those elicited by language in
respect of ‘semantic mismatches’, although the responses following a musical context were less
consistent than those following a linguistic context. Music and language both mean; they can
both function in the conceptual–intentional domain as acts of meaning. Nevertheless, language
can express more semantically decomposable propositions; it can refer unambiguously to
complex states of affairs in the world. Music embodies and exploits an essential ambiguity, and in
this respect, language and music may be at complementary poles of a communicative continuum,
meeting somewhere near poetry (Cross 2003c). This inherent ambiguity—together with the
quality of the actions and interactions that were noted earlier as integral to music—suffices to
differentiate music from language, enabling it to be efficacious for individuals and for groups in
contexts where language would be unproductive or impotent, precisely because of the need for
language to be interpreted unambiguously (Brandt, Chapter 3, this volume).
Hence, music might be defined broadly and operationally as embodying, entraining, and
transposably intentionalizing time in sound and action (Cross 2003a), typically expressed
by voices and instruments that articulate patterns in pitch, rhythm and timbre, and involving
correlated gestural patterns of movement that may or may not be oriented towards sound
production. This definition is not intended as an alternative to conventional dictionary
definitions; such definitions effectively delimit those aspects of music that appear significant
within recent Western culture. The broad definition is intended to delineate those attributes that,
in every community, appear to distinguish music from other spheres of human activity in a way
that might enable its relationships to cultural and biological processes to be evaluated. It is not
intended to be either constitutive or essentialist.
in shared, purposeful time. The experience of the coordinated nature of the collective activity is
likely to engender a strong sense of group identity with the communication of pleasure. Music
both entrains movement and experience, and allows each participant to interpret its significances
for him or her self, independently, without the integrity of the collective musical behaviour being
undermined. Music’s ambiguity—its ‘floating intentionality’—in the self and for or with others,
may thus be highly advantageous for groups, serving as a medium for participation and
contributing to the maintenance of social flexibility.
A clue to music’s efficacy for the individual might be found in Meyer’s (1956) suggestion that
music does not merely embody metaphors, but is a ‘metaphorizing medium’ through which
seemingly disparate concepts may be experienced as related and become part of a transforming
experience of the self. Music appears to constitute a medium that facilitates access to, and the
formation of, conceptual–intentional complexes and metaphorical representations that may
apply to many individual and social circumstances. As Meyer puts it:
Music does not [for example] present the concept or image of death itself. Rather it connotes that rich
realm of experience in which death and darkness, night and cold, winter and sleep and silence are all
combined and consolidated into a single connotative complex… What music presents is not any one
of these metaphorical events but rather that which is common to all of them, that which enables them
to become metaphors for one another. Music presents a generic event, a ‘connotative complex’, which
then becomes particularized in the experience of the individual listener.
(1956, p. 265)
utterances become more fixed and unambiguous in their significance and meaning. In contrast,
protomusical and musical behaviours retain a degree of ambiguity or transposability in their
‘aboutness’, particularly in the babbling stage (Elowson et al. 1998). This ambiguity is evident in
the capacity of prelinguistic utterances to reflect or engage with the temporal dynamics of the
joint actions, physical events, experienced affective states and changes of affective state that can
be shared in social exchanges. The elements of protomusical behaviour can be associated,
for infants and children, with any or all of a wide range of types of event in their experience of
the world.
In what is still the only large-scale study of children’s music and musicality in a non-Western
context, Blacking (1967) notes that music subserves primarily social functions for the children of
the Venda society in southern Africa: ‘Most Venda children are competent musicians … and yet
they have no formal musical training. They learn music by imitating the performances of adults
and other children’ (p. 29). In a society where music is chiefly manifested as interactive behaviour
that plays an especially significant role in structuring social relations in both ritual and everyday
contexts (Blacking 1976), the musicality that emerges from enculturative processes has profound
effects on children’s socialization. Blacking’s findings relate directly to research on how children
learn all manner of knowledge and skills in different cultures, and specifically to the prevalence of
‘intent participation learning’ in the majority of societies (Rogoff et al. 2003), particularly where
there is little or no institutionalized schooling. While the Venda culture that Blacking studied
might be regarded as exceptional in the importance that it accords to music in structuring social
relations, music seems equally socially significant in many other non-Western societies, such as
those of the rural Andes (Stobart 1996), or the partially urbanized and heteroglot cultures of
north-west China (in the form of hua’er songs—Yang 1994). Music and activities exhibiting
musicality in infancy and childhood can be conceived of as providing a medium through which
social flexibility may be acquired and sustained.
Music may also aid development of the individual’s cognitive flexibility. Over the past 20 years,
cognitive psychologists have found that infants do not come into the world as blank slates
(Spelke 1999); neonates are predisposed to pick up and to process experience in quite specific
ways. Capacities for consciousness of events and objects emerge too rapidly to be explained by
the operation of a general-purpose learning mechanism, and their adaptive purpose is now
abundantly evident. Moreover, it has been shown that infants assimilate information pertaining
to the use of physical objects and events quite differently from how they acquire and manage
their intentions toward people and social events. For example, very young children may show a
highly developed capacity to reason about the social world at a level that may not be manifested
in their reasoning about physical objects (Donaldson 1992; Cummins 1998). It could be said that
infants come primed for ‘physics’ and primed for ‘psychology’, each in domain-specific ways.
Yet infants and children ultimately acquire what can be thought of as a domain-general
competence that is useful for grasping meanings in any kind of cultural context. We suggest that music,
or rather protomusical behaviour, is efficacious in the emergence of this domain-general cultural
competence by virtue of its ambiguity—its transposability or floating intentionality. Infants
not only emerge into the world primed for investigation of what a psychological scientist might
identify as physics and psychology, but predisposed to engage in music-like activities in
their interactions with caregivers, which are neither or both of these. Thus, the foci and
significances of these protomusical activities (inherent musicality) can lie equally in either domain
(Cross 1999): it seems probable that they operate at a more fundamental motivating level,
enhancing the likelihood of integration of information across physical and social experience, and
facilitating the formation of a general competence not tied to any cognitively specialized domain
(Cross 2005).
There is tentative evidence for this suggestion in the positive correlations between IQ and the
engagement in musical activities found in studies reviewed by Schellenberg (2003). His own
more rigorously conducted study (Schellenberg 2004) shows that engaging in music lessons leads
to a small but statistically significant enhancement of IQ. While this evidence suggests that music
has limited effect on the intellectual capacities of some individuals, it is also possible that, for
Schellenberg’s participants, the formal Western music lesson (which tends to take a form very
similar to a school lesson) provides a highly culture-specific learning context that minimizes
the extent to which the apparent social efficacy of music can be explored and exercised (for an
exception to this learning context, see Fröhlich, Chapter 22, this volume).
We conclude that music and language, while different parts of the human communicative
toolkit, both provide purposeful syntactic frameworks that serve human needs of joint action
and interaction. Similar capacities underlie their use, including the capacity to produce complex
and hierarchically structured sequences of events (sounds and actions) and to abstract structure
from such patterns produced by others. However, where language and music diverge is in the
ways in which the structures of those patterns are endowed with significance. In language,
considerations of reference and of relevance with regard to states of affairs in the world (Sperber
and Wilson 1986) are paramount. In music, unambiguous reference and relevance are much less
significant; the primary determinant of musical experience might well be how the perceived
sounds fit with the temporal structures experienced in a moving human body.
It appears that there is no strong evidence that musicality is not a universal human attribute.
However, very little scientific research into the possession of musical capacities has been
conducted outside the confines of contemporary Western society, and for a wider picture one
must rely on the ethnographic record. From the evidence presented in the ethnographic and
scientific literature taken together, we conclude that, as with language, all humans (with a very
few rare exceptions) have the capacity to engage in musical behaviours.
In view of the extent to which music appears entwined with other domains of human
behaviour, it seems feasible to suggest that this human capacity for music may comprise a
number of components, which may have come about under the influence of a range of different
evolutionary pressures. The integrated suite of behavioural capacities that constitutes modern
human musicality might have a variety of sources in prehistoric adaptive changes.
Pinker’s (1997) description of music as a technology with no evolutionarily adaptive value, a
view apparently predicated on the notion that music consists simply of sonic patterns, is
unacceptable to us. As we have seen, music cannot be reduced to patterns of sound, and its effects
appear more far-reaching than simple and immediate hedonic response in individuals. Miller’s
(2000) sexual selection theory, which focuses on music as display, may well describe some of
the ways in which musicality was adaptive in human evolution. However, as is evident from the
foregoing, music is more than display: it typically involves coordinated interaction in individual
performance. It seems highly likely that music plays a significant role in forming and maintaining
group cohesion among humans, as Brown (2000b) suggests, by virtue of its capacity to entrain
activity, and its floating intentionality. Despite differences, there appear to be close functional
correspondences between music and language, which support Brown’s (2000a) suggestion that
they share a common and deeply rooted evolutionary origin.
Vocal play, in the form of babbling, does not appear to be unique to humans; Elowson et al.
(1998) note that this behaviour occurs in juvenile pygmy marmosets, and that response from
a caregiving adult is more likely when the juvenile is vocalizing, and suggest that pygmy
marmoset babbling has relevance to understanding the evolutionary processes of human vocal
development. It may be that an association between vocal play and a positive caregiving response
privileges the social function of these types of play.
We suggest that in an increasingly altricial lineage, the need to accommodate to population
structures with an increasing proportion of members with access to juvenile modes of cognition,
motivation and behaviour (other factors being equal) may have favoured the emergence of some-
thing like musicality as a means of assimilating the value of those juvenile modes of exploratory
cognition into the adult behavioural repertoire, while regulating its modes of expression. Given
that play is a particular feature of the behaviour of juveniles in social mammals, and that it is
likely to have positive survival value for members of those species that engage in it, it is probable
that group behaviours that both enable and regulate it, co-opting its utility into the adult repertoire,
have some adaptive or exaptive value. Music can be interpreted as one of these
mechanisms, emerging under the selection pressures of the progressive extension and stage-
differentiation of the juvenile period in the later hominid lineage.
capacities arose at different times in the hominid lineage that leads to modern humans (Morley
2002; 2003).
The evidence suggests that our nearest primate relatives have few capacities that could be
interpreted as musical. Chimpanzees and bonobos lack the phonational capacity for the produc-
tion of complex vocal signals, partly because of their very different physiques (Morley 2002), and
there is no evidence that either species can entrain to regular patterns of visual or sonic stimuli
(however, see Fitch 2006). A recent survey of systems of animal communication (Seyfarth and
Cheney 2003) concludes that even among primates, the interpretability of vocal signals by
conspecifics is generally bound so tightly to the awareness of present circumstances that they
cannot be regarded as referential. Calls that might be conceived of as conveying disembedded
information to conspecifics are better thought of as expressing an individual’s affective state,
without reference or intention to inform others. As the authors note, ‘In sum, a variety of results
argue that, in marked contrast to humans, nonhuman primates do not produce vocalizations in
response to their perception of another individual’s ignorance or need for information’ (p. 159).
It appears that although some non-human primates, notably gibbons, can produce complex and
long sequences of sound and action, a key element of musicality—the engagement with the
intentionalities of such sequences—is absent (Merker, Chapter 4, this volume).
The likelihood of significant continuities between the lifeways of other primates and of
australopithecines (currently the oldest known ancestor genus leading to modern humans)
suggests that no significant components of a human faculty for music emerged with this latter
group of species, although it might be hypothesized that the move to bipedalism laid some of the
foundations for a capacity for entrainment in rhythmic stepping and gesturing. Recent evolu-
tionary thinking (see Wood and Collard 1999) interprets the very early humans Homo habilis
(and possibly Homo rudolfensis) (from 2 million years before the present) as manifesting a high
degree of continuity with australopithecine lifeways and capacities; however, the archaeology
associated with the species shows significant changes in the evidence for toolmaking and the
transmission of traditions of tool manufacture. While H. habilis and H. rudolfensis remains are fragmentary
and their interpretation is debated, the manufacture and use of tools suggest that these
species had more muscularly developed hands, perhaps with a longer thumb, than did their predecessor
species, and a greater degree of refinement in the control of manual movement (Wilson
1998). These capacities are likely to have allowed for the beginnings of finely controlled expres-
sive manual gesture, an intrinsic component of all modern human communicative systems.
With Homo ergaster and Homo erectus (from about 1.8 million years before the present), major
changes occurred; brain size reached around 1000 cc, and body size and configuration approxi-
mated those of modern humans. H. ergaster and H. erectus had more complex lifeways and toolkits
than their precursors, and a vast increase in geographical range. The capacity for the much-
enhanced control of phonation—conferred by a barrel-shaped chest, the enhanced articulatory
capacities of the vocal system, and the presence of an ear canal of modern proportions—suggests
that vocal sounds were increasingly significant for these species. This may indicate significant
changes in social life, perhaps marking the emergence of a rich vocal repertoire to replace other
forms of interpersonal interaction (in conformance with Dunbar’s [1992] ‘grooming-to-gossip’
model). The evidence also suggests that some foundational components of musicality were in
place, most likely expressed in the use of vocal sounds to articulate complex emotion states in the
regulation of social relations, and possibly to convey referential information.
It was not until the appearance of Homo heidelbergensis (c.700 to 500 kyr BP), however, that we
find the fully modern vocal tract, together with an auditory system that is maximally sensitive
to speech frequencies (Martinez et al. 2004). This coadaptation suggests that vocal sounds were
crucially significant for this species, more so than other environmental sounds. This can be
construed as a refinement of earlier H. ergaster capacities, which is supported by evidence for the
production and use of an expanded range of artefacts. This advance in creativity is likely to have
been manifested in the capacity to produce and perceive increasingly complex vocal sounds and
sequences, including behaviours that we might identify as singing.
Following the emergence of anatomically modern Homo sapiens, which dates to some
150 kyr BP, we ultimately find evidence for symbolic intelligence or ‘fully modern sapiens
behaviour’ (Henshilwood and Marean 2003), and unambiguous evidence of musical behaviours.
These behaviours are built on cognitive, physiological and behavioural foundations that emerged
in the preceding hominid species, as outlined above. At what point these behaviours can be
considered symbolic, in the sense of having the capacity to indicate meaning through an arbi-
trary coupling of sign and referent, is open to debate, but the capabilities probably emergent
in H. ergaster, and then developed in H. heidelbergensis, would have featured strong associations
between emotional content and vocal and physical gesture. Symbolic culture, in which signs enter
into a web of interrelationships that come to constitute a significant feature of the ecology of the
human mind (Chase 1999), emerged with modern Homo sapiens.
Thus, we suggest that the emergence and development of complex manual and vocal gesture,
under the conditions of greater social complexity associated with H. ergaster and H. erectus,
constituted the foundations of what would come to be melodic vocalization, i.e., singing. It seems
likely that the production and perception of complex sequences of sounds with the voice was
very important by the time of H. heidelbergensis, and that the social roles of such vocalizations,
including the potential to rehearse and refine social interactions, were built on subsequently, to
become a part of music and language in the fully symbolic culture that emerged in modern
humans.
5.6 Conclusions
The evolutionary story can be read as indicating that a version of Brown’s (2000a) musilanguage
may have emerged with H. ergaster, perhaps restricted to the exchange of social information, with
a further development of a capacity for more general reference with H. heidelbergensis. It seems
likely that the divergence between music and language arose first in modern humans, with
language emerging to fulfil communicative, ostensive and propositional functions with immediate
efficacy. Music, operating over longer timescales, emerged to sustain (and perhaps also to foster)
the capacity to manage social interactions, while providing a matrix for the integration of infor-
mation across domains of human experience. We propose that music and language enabled the
emergence of modern human social and individual cognitive flexibility (Cross 1999). We regard
both music and language as subcomponents of the human communicative toolkit—as two
complementary mechanisms for the achievement of productivity in human interaction though
working over different timescales and in different ways.
While the selection pressures for the emergence of language are widely regarded as self-evident
(Pinker 1994), those for music appear less well understood, perhaps because the effects of music
appear less immediate and direct, or obvious, than do those of language (Mithen 2005).
However, we suggest that a degree of adaptation to changes in the rate of individual maturation
evident in the later hominid lineage may be a factor that led to the human capacity for musicality,
distinct from, and perhaps foundational in respect of, language (Cross 2003b).
Musical capacities are built on fundamentally important social and physiological mechanisms
and, at an essential level, are processed as such. Music uses capacities crucial in situations of
social complexity; the vocal, facial and interactive foundations of these capabilities are evident in
other higher primates, and such capacities would have become increasingly important and
sophisticated as group size and complexity increased. Vocal emotional expression, interaction,
and sensitivity to others’ emotional state would have been selectively important abilities; individ-
uals in which these capabilities were more developed would have been selectively favoured.
Fundamentally integrated into the planning and control of complex sequences of vocalizations,
and related to the prosodic rhythm inherent in such sequences, is rhythmic motor coordination.
The motor system is primed in the instigation of such vocal behaviours, and corporeal gesture is
consequently incorporated into the execution of the vocal behaviour.
In terms of their potential selective advantages, developed musical behaviours could confer an
advantage on individuals through sexual selection, owing to their foundations in the
capacities to communicate emotionally and effectively, to empathize, to bond and to elicit loyalty.
Musical abilities have the potential to be a proxy for an individual’s likelihood of having strong
social networks and loyalties, and of contributing to a group. Musical behaviour also has the
potential to be a mechanism for stimulating and maintaining those networks and loyalties;
because of the stimulation of shared emotional experience as a consequence of participation
in musical activities, it can engender strong feelings of empathic association and group
membership. Musical or protomusical behaviour has the potential to make use of several cogni-
tive capacities at once, relying on the integration and control of biological, psychological, social
and physical systems; it gives the opportunity to practise and develop these integrated skills in a
context of limited risk.
Full (specialized, as opposed to proto-) musical behaviours, having emerged with foundations
in social interaction, emotional expression, and the fine control and planning of corporeal and
vocal musculature, lend themselves extremely well to integrating important cognitive skills. The
execution of musical activities could become increasingly important and beneficial on both indi-
vidual and group levels, with increasing social complexity within and between groups. Because
music production and perception are processed by the brain in ways that are complex and related
to interpersonal interaction and the formation of social bonds, they stimulate many associated
functions. It seems that musical participation, even without lyrics or symbolic associations, can
act on the brain in ways that are appealing to humans, because of its vicarious stimulation of
fundamentally important human interactive capacities.
While this model for the emergence of musicality appears to fit well with the evidence available
from ethnographic, cognitive, comparative, palaeoanatomical and archaeological sources, other
ecologically observable behaviours suggest that further facets of the evolutionary story require
consideration. The investigation of the origins, emergence and nature of musical behaviours in
humans is in its early stages, and has more to reveal. It concerns an element of human behaviour
that, in contrast with Pinker’s (1997) opinion, the vast majority of people would miss very much
if they were suddenly bereft of it. It would be impossible to do away with music without removing
many of the abilities of social cognition that are fundamental to being human.
References
Arom S (1991). African polyphony and polyrhythm. Cambridge University Press, Cambridge.
Auer P, Couper-Kuhlen E and Muller K (1999). Language in time: The rhythm and tempo of spoken
language. Oxford University Press, Oxford.
Baily J (1985). Music structure and human movement. In P Howell, I Cross & R West, eds, Musical structure
and cognition, pp. 237–258. Academic Press, London.
Bekoff M (1998). Playing with play: What can we learn about cognition, negotiation and evolution?
In DD Cummins & C Allen, eds, The evolution of mind, pp. 162–182. Oxford University Press, Oxford.
Benzon W (2001). Beethoven’s anvil: Music, mind and culture. Basic Books, New York.
Blacking J (1961). Patterns of Nsenga kalimba music. African Music, 2(4), 3–20.
Blacking J (1967). Venda children’s songs: A study in ethnomusicological analysis. Witwatersrand University
Press, Johannesburg.
Blacking J (1969). The value of music in human experience. Yearbook of the International Folk Music
Council, 1, 33–71.
Blacking J (1976). How musical is man? Faber, London.
Blacking J (1995). Music, culture and experience. University of Chicago Press, London.
Bogin B (1999). Patterns of human growth, 2nd edn. Cambridge University Press, Cambridge.
Bowles S and Gintis H (1998). The moral economy of community: structured populations and the
evolution of pro-social norms. Evolution and Human Behaviour, 19, 3–25.
Brown S (2000a). The ‘musilanguage’ model of music evolution. In N Wallin, B Merker and S Brown, eds,
The origins of music, pp. 271–300. MIT Press, Cambridge, MA.
Brown S (2000b). Evolutionary models of music: From sexual selection to group selection.
In F Tonneau & NS Thompson, eds, Perspectives in ethology 13: Behavior, evolution and culture,
pp. 231–281. Plenum Publishers, New York.
Chase P (1999). Symbolism as reference and symbolism as culture. In C Knight, R Dunbar and C Power,
eds, The evolution of culture: An interdisciplinary view, pp. 34–49. Edinburgh University Press,
Edinburgh.
Clayton M, Sager R and Will U (2004). In time with the music: The concept of entrainment and its
significance for ethnomusicology. ESEM Counterpoint, 1, 1–82.
Cross I (1999). Is music the most important thing we ever did? Music, development and evolution.
In S W Yi, ed., Music, mind and science, pp. 10–39. Seoul National University Press, Seoul.
Cross I (2003a). Music and biocultural evolution. In M Clayton, T Herbert and R Middleton, eds,
The cultural study of music: A critical introduction, pp. 19–30. Routledge, London.
Cross I (2003b). Music and evolution: causes and consequences. Contemporary Music Review, 22(3), 79–89.
Cross I (2003c). Music, cognition, culture and evolution. In I Peretz and R Zatorre, eds, The cognitive
neuroscience of music, pp. 42–56. Oxford University Press, Oxford.
Cross I (2005). Music and meaning, ambiguity and evolution. In D Miell, R MacDonald and D Hargreaves,
eds, Musical Communication, pp. 27–43. Oxford University Press, Oxford.
Cross I, Zubrow E and Cowan F (2002). Musical behaviours and the archaeological record: a preliminary
study. In J Mathieu, ed., Experimental archaeology: Replicating past objects, behaviors and processes,
pp. 25–34. British Archaeological Reports International Series 1035. Archaeopress, Oxford.
Cross I and Watson A (2006). Acoustics and the human experience of socially organised sound. In C Scarre
and G Lawson, eds, Acoustics, space and intentionality: Identifying intentionality in the ancient use of
acoustic spaces and structures, pp. 107–116. McDonald Institute for Archaeological Research, Cambridge.
Cummins DD (1998). Social norms and other minds: the evolutionary roots of higher cognition.
In DD Cummins and C Allen, eds, The evolution of mind, pp. 30–50. Oxford University Press, Oxford.
D’Andrade R (1995). The development of cognitive anthropology. Cambridge University Press, Cambridge.
D’Errico F, Henshilwood C, Lawson G, et al. (2003). Archaeological evidence for the emergence of
language, symbolism, and music – an alternative multidisciplinary perspective. Journal of World
Prehistory, 17(1), 1–70.
Darwin C (1871). The descent of man and selection in relation to sex. Murray, London.
Dissanayake E (2000). Antecedents of the temporal arts in early mother–infant interactions. In N Wallin,
B Merker and S Brown, eds, The origins of music, pp. 389–407. MIT Press, Cambridge, MA.
Donaldson M (1992). Human minds: An exploration. Allen Lane/Penguin Books, London.
Drake C, Jones MR and Baruch C (2000). The development of rhythmic attending in auditory sequences:
attunement, referent period, focal attending. Cognition, 77, 251–288.
Dunbar R (1992). Neocortex size as a constraint on group size in primates. Journal of Human Evolution,
22, 469–493.
Elowson AM, Snowdon CT and Lazaro-Perea C (1998). ‘Babbling’ and social context in infant monkeys:
parallels to human infants. Trends in Cognitive Sciences, 2, 31–37.
Fitch W Tecumseh (2006). The biology and evolution of music: a comparative perspective. Cognition,
100(1), 173–215.
Foley RA (1995). Humans before humanity. Blackwell, Oxford.
Freeman WJ (1995). Societies of brains. A study in the neurobiology of love and hate. Erlbaum, Mahwah, NJ.
Frege G (1952). On sense and reference. In P Geach and M Black, eds, Translations from the Philosophical
Writings of Gottlob Frege. Blackwell, Oxford.
Hagen EH and Bryant GA (2003). Music and dance as a coalition signaling system. Human Nature,
14(1), 21–51.
Henshilwood CS and Marean CW (2003). The origin of modern human behavior: critique of the models
and their test implications. Current Anthropology, 44(5), 627–651.
Huron D (2001). Is music an evolutionary adaptation? Annals of the New York Academy of Sciences, 930, 43–61.
Jacobs A (1972). New dictionary of music, 2nd edn. Penguin Books, Harmondsworth.
Janata P and Grafton ST (2003). Swinging in the brain: Shared neural substrates for behaviors related to
sequencing and music. Nature Neuroscience, 6(7), 682–687.
Joffe TH (1997). Social pressures have selected for an extended juvenile period in primates. Journal of
Human Evolution, 32(6), 593–605.
Jones MR and Yee W (1993). Attending to auditory events: The role of temporal organization.
In S McAdams and E Bigand, eds, Thinking in sound, pp. 69–112. Oxford University Press, Oxford.
Kalmus A and Fry DB (1980). On tune deafness (dysmelodia): Frequency, development, genetics and
musical background. Annals of Human Genetics, 43(4), 369–382.
Koelsch S, Kasper E, Sammler D, Schulze K, Gunter T and Friederici A (2004). Music, language and
meaning: brain signatures of semantic processing. Nature Neuroscience, 7(3), 302–307.
Langer S (1942). Philosophy in a new key. Harvard University Press, Cambridge, MA.
Lavy M (2001). Emotion and the experience of listening to music: A framework for empirical research,
Ph.D. thesis. University of Cambridge. Available at http://www.scribblin.gs
Magrini T (2000). From music-makers to virtual singers: New musics and puzzled scholars. In D Greer, ed.
Musicology & sister disciplines, pp. 320–330. Oxford University Press, Oxford.
Martinez I, Rosa M, Arsuaga J-L et al. (2004). Auditory capacities in Middle Pleistocene humans from the
Sierra de Atapuerca in Spain. Proceedings of the National Academy of Sciences, 101(27), 9976–9981.
Meyer LB (1956). Emotion and meaning in music. University of Chicago Press, London.
Miller G (2000). Evolution of human music through sexual selection. In N Wallin, B Merker and S Brown,
eds, The origins of music, pp. 329–360. MIT Press, Cambridge, MA.
Miller G (2001). The mating mind: How sexual choice shaped the evolution of human nature. Vintage/Ebury,
London.
Mithen S (2005). The singing Neanderthals: The origins of music, language, mind and body.
Weidenfeld & Nicolson, London.
Morley I (2002). Evolution of the physiological and neurological capacities for music. Cambridge
Archaeological Journal, 12(2), 195–216.
Morley I (2003). The evolutionary origins and archaeology of music: An investigation into the prehistory of
human musical capacities and behaviours. Ph.D. thesis. University of Cambridge, Cambridge. Darwin
College Research Reports, DCRR-002, available online at www.dar.cam.ac.uk/dcrr/
Nelson S (2002). Melodic improvisation on a twelve-bar blues model: an investigation of physical and historical
aspects, and their contribution to performance. Ph.D. thesis. City University London, Department of
Music, London.
Oatley K and Johnson-Laird PN (1998). The communicative theory of the emotions: Empirical tests,
mental models and implications for social interactions. In JM Jenkins, K Oatley and NL Stein, eds,
Human emotions: A reader, pp. 84–97. Blackwell, Oxford.
Panksepp J and Burgdorf J (2003). ‘Laughing’ rats and the evolutionary antecedents of human joy?
Physiology and Behavior, 79, 533–547.
Papousek H (1996). Musicality in infancy research: Biological and cultural origins of early musicality.
In I Deliège and JA Sloboda, eds, Musical beginnings, pp. 37–55. Oxford University Press, Oxford.
Papousek M (1996). Intuitive parenting: A hidden source of musical stimulation in infancy. In I Deliège
and JA Sloboda, eds, Musical beginnings, pp. 88–112. Oxford University Press, Oxford.
Peretz I (2003). Brain specialization for music: New evidence from congenital amusia. In I Peretz and
R Zatorre, eds, The cognitive neuroscience of music, pp. 192–203. Oxford University Press, Oxford.
Peretz I, Champod AS and Hyde K (2003). Varieties of musical disorders: The Montréal Battery of
Evaluation of Amusia. Annals of the New York Academy of Sciences: The Neurosciences and Music,
999, 58–75.
Pinker S (1994). The language instinct. Allen Lane, London.
Pinker S (1997). How the mind works. Allen Lane, London.
Plotkin H (1997). Evolution in mind. Allen Lane, London.
Roederer JG (1984). The search for a survival value of music. Music Perception, 1, 350–356.
Rogoff B, Paradise R, Arauz RM, Correa-Chávez M and Angelillo C (2003). First-hand learning through
intent participation. Annual Review of Psychology, 54, 175–203.
Schellenberg EG (2003). Does exposure to music have beneficial side effects? In I Peretz and R Zatorre, eds,
The cognitive neuroscience of music, pp. 430–448. Oxford University Press, Oxford.
Schellenberg EG (2004). Music lessons enhance IQ. Psychological Science, 15(8), 511–514.
Scherer KR and Zentner MR (2001). Emotional effects of music: Production rules. In P Juslin and JA Sloboda,
eds, Music and emotion: theory and research, pp. 361–392. Oxford University Press, Oxford.
Scherer KR (1991). Emotion expression in speech and music. In J Sundberg, L Nord and R Carlson, eds,
Music, Language, Speech and Brain, pp. 146–156. MacMillan Press, Basingstoke.
Scothern PMT (1992). The music-archaeology of the palaeolithic within its cultural setting. Ph.D. thesis.
University of Cambridge, Cambridge.
Seyfarth RM and Cheney DL (2003). Signalers and receivers in animal communication. Annual Review of
Psychology, 54, 145–173.
Shennan S (2002). Genes, memes and human history. Thames and Hudson, London.
Shore B (1996). Culture in mind: Cognition, culture, and the problem of meaning. Oxford University Press,
Oxford.
Sloboda JA (1985). The musical mind. Oxford University Press, Oxford.
Spelke E (1999). Infant cognition. In RA Wilson and FC Keil, eds, The MIT encyclopedia of cognitive sciences,
pp. 402–404. MIT Press, Cambridge, MA.
Sperber D and Wilson D (1986). Relevance: Communication and cognition. Blackwell, Oxford.
Stephan KM, Thaut MH, Wunderlich G et al. (2002). Conscious and subconscious sensorimotor
synchronization – prefrontal cortex and the influence of awareness. NeuroImage, 15, 345–352.
Stobart HF (1996). Tara and Q’iwa: Worlds of sound and meaning. In MP Baumann, ed., Cosmología y
música en los Andes (Music and cosmology in the Andes), pp. 67–81. Biblioteca Iberoamericana and
Vervuert Verlag, Madrid and Frankfurt.
Stobart HF and Cross I (2000). The Andean anacrusis? Rhythmic structure and perception in Easter songs
of Northern Potosí, Bolivia. British Journal of Ethnomusicology, 9(2), 63–94.
Sykes JB (1983). Concise Oxford dictionary, 7th edn. Oxford University Press, Oxford.
Thaut MH (2005). Rhythm, human temporality, and brain function. In D Miell, R MacDonald and
D Hargreaves, eds, Musical Communication, pp. 171–191. Oxford University Press, Oxford.
Trevarthen C (1979). Communication and cooperation in early infancy. A description of primary
intersubjectivity. In M Bullowa, ed., Before speech: The beginning of human communication,
pp. 321–347. Cambridge University Press, London.