Fighting Repetition in Game Audio
JEAN-FREDERIC VACHON
Audio Technical Director, Artificial Mind & Movement (A2M), Montreal, Canada
jean-frederic.vachon@a2m.com
Historically, one of the biggest problems facing game audio has been the endless repetition of sounds. From sound bites that play constantly, to repeated sound effects, to a limited music selection that loops endlessly, players have had every reason to be annoyed at game audio. Despite increased memory budgets on modern consoles, this problem is still relevant. This paper will examine the pros and cons of various approaches used in game audio, as well as the various technologies and research that might eventually be applied to the field.
1 INTRODUCTION
Since its beginnings, game audio has been viewed by many as repetitive and limited in content. [1] Gamers have been much more forgiving of repeated visuals than of repeated audio, and audio designers have had to rely on a bag of tricks to stretch limited content over a game experience that can last from a few hours to over 100 hours in some cases. With audio technology improving at a rapid pace, we are coming close to an era where advanced audio manipulation will be available in a console game.

It should be noted that not all games suffer from the problems laid out in this document. There are fortunately many games that excel in audio and manage to avoid most, if not all, of these pitfalls. But not every game is a AAA title, and with lower budgets come smaller teams, and often more shortcuts. Most of the solutions put forward in this document are low-cost ones, requiring more of a philosophical change than real investment. Some, though, are in the realm of R&D at this point, but maybe they will inspire some of the bigger titles to push forward and come up with solutions that can revolutionize game audio for the next generation.

2 HISTORICAL PERSPECTIVE

The first generations of video games had primitive synthesized audio that provided a very limited palette for sounds, and the severe memory limitations did not allow for any variation. So the limited amount of sound effects and music that was created was repeated across the entire game. That situation was the norm, and players accepted it readily. The often abstract (or at least highly stylized) nature of the games of that era made it a lot easier to suspend expectations of realistic audio. It could also be argued that at the time, video games were mostly a kids' pastime, and that kids have a high tolerance for repetition.

As games evolved (and their clientele got older), so did the audio presentation. CD-ROMs allowed developers to put CD-quality audio in their games, one of the first and most notable being The 7th Guest (1992), which featured CD-quality music composed by George "Fatman" Sanger. Graphics also evolved, and video games slowly edged themselves towards realism. As graphics and AI struggled to bring themselves over what is called the uncanny valley1, audio managed to successfully present what could be considered totally realistic audio, or at least transparent enough to be perceived as such. This raised expectations from players to have audio that is organic and "randomly patterned", which is a simple way of saying that nature, while being chaotic, follows patterns and has a natural order.

1 http://en.wikipedia.org/wiki/Uncanny_valley

On every new generation of game consoles (and with the constant increase in PC power), more memory became available for audio. The pitfall that comes with this increase is the erroneous belief that the constraints of the previous generation are gone: since every aspect of the game also increases its memory budget, the number of objects, the size of the environments and the expectations all go up accordingly. As games increase in scope, audio designers still face the problem of repetition in sound.

An interesting development that started to appear more and more as audio increased in quality was the inclusion of mixing sliders for music and sound effects (and sometimes dialog) that allowed audio to be adjusted according to the player's preference, or even turned off entirely. The gaming industry's conundrum became that those features were seen as a solution to the inevitable repetition, while in some cases not much effort was put
in avoiding it, because players could turn it off if it annoyed them.

Audio is the only field in game development that allows itself to be turned off. Some see that as a testament to the power of audio: that audio is such an important part of the experience that if a player doesn't like it, it is better to turn it off than to ruin the game. Others, like myself, consider these options an unfortunate industry habit that belittles the powerful impact audio can have on a game experience.

3 WHAT CONSTITUTES AUDIO REPETITION IN A VIDEO GAME

One of the challenges of game audio is the non-linear aspect of the medium. Contrary to film or television, where the number of occurrences of any event is a known quantity, games are interactive, and as such, the narrative flow is different on every playthrough. If we are to avoid unwanted repetition in a game's audio, we must first identify where it usually occurs. All three major audio categories (sound effects, music and dialog) suffer from it, but in different ways that must be addressed separately.

Bungie Audio Director Martin O'Donnell talked about the importance of avoiding repetition in Halo:

"The most important feature of a soundtag is that it contains enough permutations and the proper randomization so that players do not feel like they're hearing the same thing repeated over and over. Even the greatest and most satisfying sound, dialog or music will be diminished with too much repetition. It is also important to have the ability to randomize the interval of any repetition. It might be difficult to get the sound of one crow caw to be vastly different from another, but the biggest tip off to the listener that something is artificial is when the crow always caws just after the leaf rustle and before the frog croak every thirty seconds or so. The exception to that rule are specific game play sounds that need to give the player immediate and unequivocal information, such as a low health alarm."
(O'Donnell 2002) [2]

3.1 Sound Effects

Sound effects that are non-diegetic, like those for the HUD2, menus or collectible items, are pretty much universally accepted as repeating. Since they have no real-world equivalent (e.g. collectibles) or are of a mechanical nature (e.g. the HUD), players do not have expectations of variation, and in fact expect this constant behaviour. I have at times tried to create varying sounds in these areas as an experiment, and have been faced with immediate negative reaction from the development team.

2 Heads Up Display. The game's interface that provides feedback on player and game status.

One type of sound effect that has attracted a lot of attention for being too repetitive is footstep sounds. Since most games involve a main character walking and running around an expansive world, footstep sounds are probably the most heard sound effects in a typical video game.

Fighting sounds (fist impacts, whooshes, etc.) are also often very repetitive, because many titles that feature fighting rely on this mechanic for a significant portion of the game experience. In his book Audio for Games: Planning, Process and Production, Alexander Brandon has this to say about fighting games:

"Fighting games in particular will benefit from nonrepetition. Injury sounds should number in the dozens if not hundreds to ensure a variety of effects. Soul Calibur 2 achieves this through the use of many attack, hit and reaction sounds at random." [3]

Basically, any type of sound effect that is linked to a game event that can occur frequently in a short period of time is subject to being perceived as repetitive. Using more alternate sounds is one way to counter this effect, but it is not the only alternative available.

3.2 Music

Music is another area where repetition is often noticeable. While cultural expectations have been set in films to have unique cues (although with repetition of thematic content) for the entire film, games offer an experience that lasts much longer, and it is economically unrealistic to expect unique music throughout. The problem appears mostly in games where the music is used for dramatic purposes, but even games using licensed music as musical wallpaper can be faced with this problem. Despite often having a high number of tracks available, the user will stick to a limited subset that corresponds to his or her preference. For example, most players of Grand Theft Auto will limit themselves to a few radio stations, thereby cutting themselves off from most of the provided music.

To make matters worse, most forms of music rely on repetition to establish thematic material, which further cuts down on the amount of unique material available.

3.3 Dialog

Dialog is one of the worst offenders of audio repetition. It only takes a few repetitions of the same line for it to become annoying and instantly destroy the illusion of
the game world, because people never repeat themselves exactly the same way twice.

Story dialog is usually not a problem, since story elements are traditionally not repeated. One notable exception is dialog delivered by NPCs3 that must be engaged in conversation, usually to get a clue or other information. You can go back to talk to them multiple times, but in these cases it is akin to a replay feature that acts as a memory aid for the player, so the repetition is acceptable.

3 Non Player Character

The most problematic type of dialog is what is often called barks, or punctual exclamations. We include in this category all non-story dialog, exclamations and grunts. Once again, economic considerations come into play, often limiting the number of alternate lines that are reasonably feasible in a game.

4 BENEFITS OF LOWERING AUDIO REPETITION

In his book, The Fat Man on Game Audio, George Sanger postulates that repetition in children's games is especially hurtful, since the parents (who, as the buyers of the games, are the real clients) only experience the game through its audio.

"...everything the buyer experiences, all of the motivation to buy the next product comes from the audio. The parents do not see or play the game. They hear it."
(Sanger, 2003) [4]

It is often said that when audio is done right, no one notices, but when something is out of place, everyone notices. While this is somewhat of an exaggeration, it underlines the reality of people's perception of audio: most people now expect game audio to be of high quality. Anything that takes us out of the immersion is immediately perceived as negative. While the average player will probably not pick up on audio issues like slight distortion, phasing or over-compression, he or she will definitely perceive sounds that are repeated too often.

Too much repetition in audio can also encourage the player to lower the sound or turn it off altogether, perhaps to replace it with his or her own external source of music. Being of the opinion that audio represents a significant portion of a game's experience, I think we risk greatly diminishing the effectiveness of the game's impact and immersion if we offer the player motivation to turn off audio.

5 THE PITFALLS OF CINEMA ENVY

The game industry (and its audio component in particular) has operated for quite a while under the drive of imitating movies. We, as an industry, longed for the opportunity of creating "interactive movies" and the technical capabilities to use similar development pipelines. Many past game conferences featured talks on how to get the "cinematic feel" in games, and audio, especially music, has been striving to create sounds and pieces that were of "cinema quality".

As we reached that goal, we started to realise that games were indeed their own entities, and that while we could definitely borrow from a century of movie-making expertise to improve our methods and raise quality levels, games needed to develop their own artistic sensibilities. Instead of wishing for the "cinema sound", we should work on defining a "game sound" aesthetic. Early 8-bit games had their own sound, grown out of necessity because of the technological constraints of the consoles. Now that our palette is considerably larger, we, as an industry, need to establish our own aesthetics, of which cinema sound will be one facet.

Repetition is not much of a concern for film audio because of the medium's linear nature. When applying film techniques to game audio, we must be careful to adapt them to the game's own paradigm.

6 PROFILING THE GAME

To properly tackle the problem of audio repetition in a game, it is important to understand where it is most obvious. While there are many ways to tackle the issue, not all of them are appropriate or effective in the context of every game. A video game is by its very nature repetitive; certain actions make up the core of the gameplay and are repeated throughout the entire game.

One technological feature we have implemented at A2M has helped us a lot on this issue. We built a profiler that, during gameplay, logs every "play sound" event and creates statistics on the use of every sound event. An audio designer can play the game for half an hour, then look at the report to see which sounds have been played the most often. He can then concentrate his efforts on reducing the repetition in the events that are played the most often. Unfortunately, a sound can play 10 times in 30 seconds and still not show up very high on the list over half an hour of gameplay, so one must be careful to catch these types of events too.

To address this, further refinements of this tool would allow us to see the timing of play events over time, so that it could generate a graph that shows the concentration of play events during the game session.
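As an illustrative sketch (hypothetical code, not the actual A2M tool), such a profiler could log each "play sound" event with a timestamp, report total counts, and flag short-window bursts that a session-wide count alone would hide:

```python
import time
from collections import defaultdict, deque

class AudioProfiler:
    """Logs 'play sound' events and reports repetition statistics."""

    def __init__(self, burst_window=30.0, burst_threshold=10):
        self.counts = defaultdict(int)          # total plays per event
        self.recent = defaultdict(deque)        # timestamps inside the window
        self.bursts = set()                     # events that exceeded the threshold
        self.burst_window = burst_window        # seconds
        self.burst_threshold = burst_threshold  # plays per window

    def log_play(self, event_name, timestamp=None):
        t = time.monotonic() if timestamp is None else timestamp
        self.counts[event_name] += 1
        window = self.recent[event_name]
        window.append(t)
        # Drop timestamps older than the burst window.
        while window and t - window[0] > self.burst_window:
            window.popleft()
        # A sound playing 10 times in 30 seconds gets flagged even if its
        # total count over the whole session stays modest.
        if len(window) >= self.burst_threshold:
            self.bursts.add(event_name)

    def report(self, top=10):
        """Most played events first, with a flag for bursty ones."""
        ranked = sorted(self.counts.items(), key=lambda kv: -kv[1])
        return [(name, count, name in self.bursts) for name, count in ranked[:top]]
```

Accepting explicit timestamps makes the profiler testable offline against a recorded session log, rather than only during live gameplay.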
This would let the audio designer properly evaluate which events run the risk of being perceived as repetitive. It should also be correlated against the length of the sound asset, since longer sounds will presumably be more recognizable. Ideally, these factors could be weighted by an established metric that would rank the sounds based on a "repeatability factor".

Since having a high number of variations relies on the amount of available memory, one of the first things to do is to make sure we are not wasting space. A routine that optimizes the sampling rate of a sound according to its frequency content could be added to the export pipeline to automate the work. You could even leverage the profiler reports to allow frequent sounds to use a better sampling rate than infrequent sounds by applying an offset to the computed value.

7 ART VERSUS TECHNOLOGY

Despite this being a technical paper, it would be a huge omission not to talk about artistry. Game audio designers put a lot of energy into coming up with complex systems to manage the non-linearity of an interactive medium, but sometimes it is the simple solutions that are the most effective. The art of sound design still needs to be the driving force behind all these technological innovations. Not every solution works for every situation of every game; they are part of a tool set that needs to be used at the right moment.

Our first instinct, especially on productions that have shorter development schedules and smaller teams, is to assign audio to events that are repeated often. That way, audio propagates quickly through the game and we can then address the more unique events. The flip side is that by doing this, we are tying audio early on to events or animations that could be repeated quite often.

Most modern audio engines offer extensive randomization capabilities that are often used to make the audio less repetitive. But the effectiveness of such a method is limited: varying pitch and volume can offer some relief, but only in a limited way. Varying the pitch of a sound has its limits: past a certain amount, the sound loses its natural timbre and stops resembling its original form. Randomly varying the sound's volume is also not going to save a sound that repeats too often.

Likewise, adding more variations of a sound certainly helps make things appear more organic, but since these are called randomly, they have to be rather similar. In a randomized selection, any sound that is radically different will call attention to itself if there is no apparent reason for the difference. Variation must come in an organic way; when it doesn't, the player is too well aware that the game has a limited pool of possibilities and simply cycles through them.

7.1 Spotting the game

One of the most useful approaches to non-repetitive sound design is to spot the game. This technique is commonly used by film composers to identify which sections of the movie need music. In games, we usually start by identifying repeated events we can systemize, which of course leads to repetition. Instead, we need to identify key events that need to be prioritized. These events need to be sound designed and scored according to their dramatic arc, the game's progression and their importance in the overall game experience.

For example, encountering a specific type of monster surely rates a one-line comment from the hero, but should every encounter receive the same sound event? No, the first one should be the most important, and then you could go down from there and make the hero less impressed by each occurrence. After a few, the event becomes trivial and is not commented upon. You could also inject some humour and have the hero start commenting again on how these creatures seem to be popping up everywhere. Treating events according to their importance is much more important than simply assigning audio to them arbitrarily.

7.2 Data structure

The data structure the audio engine uses has to be organized in a way that lets the audio designers be as flexible as possible. For the last few versions of our engine, we have settled on a 3-tiered structure that allows re-use (to save memory) but still allows for as many custom tweaks as possible.

Our audio data is split into three components: the sound event, the playback data and the source data. The latter two can be re-used with different parents.

[Figure: Sound Event → Sound Playback → Sound Source → Source File]

The sound event contains all identification information, type, 3D range, etc. Unique sound events are created for every game event that requires a sound. That sound event will be a parent to possibly multiple playback objects (or use nested sound events). The playback object contains pitch, pan and ADSR parameters. Each playback object contains a single source object that contains info on looping, sampling rate, streaming status, etc., and links to an actual audio file.
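As a rough illustration of this tiered structure (hypothetical type and field names; the actual engine structures are not public), a single source file can be shared by several events that each apply their own parameters:

```python
from dataclasses import dataclass, field

@dataclass
class SourceData:
    """Links to the actual audio file, plus looping/streaming info."""
    file_path: str
    looping: bool = False
    sampling_rate: int = 44100
    streamed: bool = False

@dataclass
class Playback:
    """Per-use playback parameters: pitch, pan and an ADSR envelope."""
    source: SourceData
    pitch: float = 1.0
    pan: float = 0.0
    adsr: tuple = (0.01, 0.05, 0.8, 0.2)  # attack, decay, sustain, release

@dataclass
class SoundEvent:
    """Identification info and 3D range; parent of one or more playbacks."""
    name: str
    range_3d: float = 20.0
    playbacks: list = field(default_factory=list)

# One source file re-used by two different events with different parameters:
impact_wav = SourceData("sfx/impact_01.wav")
door_slam = SoundEvent("door_slam", playbacks=[Playback(impact_wav, pitch=0.8)])
crate_drop = SoundEvent("crate_drop", playbacks=[Playback(impact_wav, pitch=1.2)])
```

Because both events reference the same SourceData, swapping in a fresh recording updates every event that uses it without touching any implementation.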
(This flexibility could also be achieved through data inheritance.)

So with this data structure, it is possible to re-use a source file on multiple events using customized parameters, creating variations in behaviour without duplicating data in memory. This allows an audio designer full flexibility in customizing sound events, and if during production a specific sound starts to get repetitive, it is also possible to simply swap out the source file component without having to redo any implementation.

8 AVOIDING REPETITION IN SOUND EFFECTS

Many types of sound effects present different challenges with regard to repetition. In this section, I will examine some of them and possible ways of addressing them.

8.1 Footsteps

Footsteps are omnipresent in a sizeable portion of the games on the market today. In this light, if the footstep sounds are too repetitive, there is already a negative impact on the player's perception. In his keynote at the 2008 Austin Game Developers Conference, Jason Page (Audio Manager, Sony Computer Entertainment Europe R&D) even went so far as to identify footstep sounds as one of the major problems for game audio.

Many games make the sound of the footsteps too upfront. In real life, our brains mostly tune out these sounds because they are unimportant and provide very minimal information. This filtering seems to be more difficult to achieve when playing games, probably because these sounds often fail to achieve a degree of realism that matches our brain's expectations. The recordings themselves are faithful, but their use is not. If they are played too loud, are too similar and/or do not react correctly to the actions of the character, they stick out as artificial.

If you listen to a person walking, on most surfaces you do not hear the majority of their steps. Sometimes a slight stumble or hesitation will cause one step to be harder than it should be, but unless the person is walking on a noisy surface like gravel, walking sounds are nothing like what the game industry depicts. Our insistence on clearly audible grass step sounds is destroying the illusion of reality.

Footstep repetition has often been addressed by providing multiple variations of every sample, creating different samples for left and right feet, and randomizing volume and pitch. Coupled with the need for multiple surface types, these add up to a large amount of sound effects that need to be managed and carried in memory. The sounds used tend to be similar, because any sample that is too different risks sticking out in a random sequence, so the payoffs of this method are not as big as we could expect.

Some audio designers will disassemble their footstep sounds into HEEL and TOE components, making it possible to randomize each component, and to mix and match from different original samples, thereby creating a quasi-unlimited number of possibilities. This yields interesting results, but it is rather cumbersome and is still disconnected from the actions of the character.

The ideal system would combine these methods, but link them to the weight and speed of the character. By varying the attack rate of the heel component according to the speed of the character, a simple stationary shuffle would have practically no audible heel component, a walk would have a soft one, and a full-on run would have a hard heel. You could even modulate the pitch according to the same parameters. Add a little randomization to pitch, volume and attack rate, and you have an organic footstep system that is never the same but is tied to the behaviour of the character. The delay between heel and toe can also be adjusted to simulate slower or faster steps. By mixing the sounds so they are not too "in your face" and linking them to the character's behaviour, you've greatly improved one of the major annoyances of game audio.

8.2 "Large scale granular synthesis"

The design concept laid out in the previous section can be described as a sort of very basic granular synthesis with very large grains. If we use this principle to drive our artistic approach, we can further minimize audio repetition. Game audio already uses this approach at a high level by providing individual sounds for each event in the game, but we should go one step further and deconstruct more complex sounds into individual layers. While this will increase the memory usage (and as such, sounds that will be created this way must be carefully chosen), the level of variation attained can make it worth the sacrifice.

For example, a game that features exploding objects as part of its core gameplay will quickly sound repetitive. We can use many alternate samples, but, since these types of sounds are usually fairly long, the amount of RAM consumed will be significant. Using this method instead, we can slice apart a few explosion samples to split them into components based on frequency and function. So we isolate the initial explosive transient (the "bang"), the deep bass rumble, the body, and the high-end crackle. By re-assembling these at run time with a randomized selection of each component, we can produce a large number of unique explosions from a handful of source samples.
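A sketch of this run-time re-assembly might look as follows (hypothetical asset names and parameter ranges; a real engine would trigger voices on its mixer rather than return a description):

```python
import random

# A few recorded variations per component, sliced from a handful of
# full explosion samples (assumed asset names for illustration).
LAYERS = {
    "bang":    ["bang_01", "bang_02", "bang_03"],
    "rumble":  ["rumble_01", "rumble_02"],
    "body":    ["body_01", "body_02", "body_03"],
    "crackle": ["crackle_01", "crackle_02"],
}

def assemble_explosion(rng=random):
    """Pick one sample per component and jitter pitch, volume and delay
    so that no two explosions are assembled exactly alike."""
    event = []
    delay = 0.0
    for component in ("bang", "rumble", "body", "crackle"):
        event.append({
            "sample": rng.choice(LAYERS[component]),
            "pitch": rng.uniform(0.95, 1.05),   # slight detune per layer
            "volume": rng.uniform(0.85, 1.0),
            "delay": delay,                      # the bang always leads
        })
        delay += rng.uniform(0.0, 0.03)          # stagger later layers slightly
    return event
```

Even with these modest pools, the discrete sample choices alone give 3 x 2 x 3 x 2 = 36 combinations before any continuous pitch or volume jitter is applied.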
"(…) the motion of the string and the interaction between them when they are in contact. The consequence of this is that if you hit the string when it is already in motion rather than at rest, you will have a different resulting sound due to the initial condition of the system, which is different each time. This is the sort of advantage provided by physical modeling over sample-based methods. The sound reflects real acoustic behavior and changes according to the control parameters received, naturally reproducing timbre changes versus loudness, vibratos, rich and changing transients, and so on. This makes the result sound more realistic and provides the performer or listener with a more lively experience." [5]

The beauty of these modeling instruments is that their size is very modest, especially when compared with sample-based libraries. The drawback is that they are still quite CPU intensive, despite being fairly optimized.

Since every element that plays a part in sound creation is modeled, you can tweak the qualities of the instrument quasi-infinitely by playing with factors like string tension, hammer density, wood type, etc.

Now let's imagine applying these techniques to impact sounds in a game. If you could model (and that is a big if, considering the R&D involved) most types of material and how they react in a collision, you could then feed the system information about their mass and velocity (and eventually point of impact) and get, in real time, a sound that is totally realistic yet completely unique and appropriate to the situation. While we wait for CPU power to catch up, it would surely be feasible to simplify the algorithms and still get acceptable results.

9 AVOIDING REPETITION IN DIALOG

Dialog presents an interesting situation in my view. Traditionally, it is the audio team's job to process, manage and implement dialog in a video game. Game reviewers also lump dialog into the SOUND category. But is dialog really in the audio team's domain? Reviews never mention the compression levels, EQ, deglitching or dialog mix that are the usual areas where the audio team impacts the quality of voice-overs.

Most comments directed towards dialog reflect three ideas: quality of the writing, quality of the acting (and the implied voice direction) and quality of implementation. I know that in many companies the first two items fall under the audio department's jurisdiction, but in my opinion this is wrong. We need people dedicated to storytelling, who use the script to move a storyline forward and develop characters, instead of just providing one-liners that react to events, or a story that is simply a clothesline upon which we pin the various action moments. In the same vein, voice direction is crucial in getting the best and most appropriate performance out of an actor. These disciplines have their own specialists who should be involved; the audio team's input should be limited to making sure the technical aspects are up to their standards.

Implementation, on the other hand, is an area where the audio team should work in close collaboration with the story team. I've seen too many games where, by working separately, the audio team ends up with a huge amount of dialog with no clear sense of its intended use. So the audio team simply plugs lines where there are holes, and hopes there are not too many unused lines at the end.

This method systematically leads to unwanted repetition in dialog. The easiest way to implement a dialog event in a game is to attach it to an event. But as we've seen so far, most game events are repeated often during the course of gameplay, and despite having multiple variations, the dialog will repeat much too often.

This is where the story specialist can help out the audio team. Instead of simply coming up with multiple versions of a bark, we should make sure the alternates can be used to highlight a progression in the character's life. Let's look at a fictional (and slightly exaggerated) example. Suppose your main character is supposed to say a line when encountering monster A. The way many games are written, the script will contain lines like these for this event:

1. "What is that?"
2. "I'm in for a fight!"
3. "You are going down!"
4. "How am I going to get out of this?"

These 4 lines would be randomly selected every time you encounter said monster. But does it make sense for the 3rd line to play the first time you have that encounter? Of course not, but it will happen to some players. Now, a better use of these four lines would be something like this:

The first time you encounter the monster, you play line 1. It doesn't make sense to use it again now that we are familiar with the beast, so we drop it. For the next 8-10 encounters, you randomly select between lines 2 and 4, but maintain a probability that nothing plays, so that out of those 10 encounters you get at most 2-3 total occurrences of the lines. Then you keep the character silent for a while, until he's progressed enough that these beasts are not as much of a challenge.
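The progression just described can be sketched as a small selector that tracks encounter count and character growth (hypothetical thresholds and probabilities, chosen only to match the example's line numbers):

```python
import random

class BarkSelector:
    """Selects a monster-encounter bark according to story progression
    rather than pure random choice (thresholds are illustrative)."""

    def __init__(self, rng=random):
        self.encounters = 0
        self.rng = rng

    def next_bark(self, character_is_powerful=False):
        self.encounters += 1
        if self.encounters == 1:
            return "What is that?"              # line 1 plays exactly once
        if character_is_powerful:
            # Late game: the occasional confident line 3.
            if self.encounters % 3 == 0:
                return "You are going down!"
            return None
        if self.encounters <= 10:
            # Mostly silence, so the two middle lines total ~2-3 plays in 10.
            if self.rng.random() < 0.25:
                return self.rng.choice(
                    ["I'm in for a fight!",
                     "How am I going to get out of this?"])
        return None                             # silent stretch in between
```

Returning None for most encounters is the point: silence is itself a valid selection, and it is what keeps the few recorded lines from wearing out.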
course be desirable) every 2-3 encounters (providing data when the relevant state is loaded. This presupposes
there are not hundreds of them) to show that the that you can set the state early enough for the data to
character has grown and is more powerful. Not only load into RAM. In the context of using this to create a
have you sidestepped the repetition problem, but you’ve progression of story, we can safely assume that states
introduced an arc into your character’s storyline. will be set at intermittent intervals, and can usually be
foreseen well in advance.
9.1 Redistributed sound sets Another approach is to create a sound event that holds
One of the nicest tricks I’ve seen on the market is the no definite audio files. As you load pieces of the
real time redistribution of voice sets. The Halo franchise environment, you load what files you want for that
this to great effect to maintain an illusion of variety in the dialog of your squad members. Bungie Audio Director Martin O’Donnell explained the process as such:

“(…) we used 6 actors to record 4 PFC's and 2 Sergeants and had the resulting soundtags attached to the corresponding marine models. We were also careful to keep track of those who lived and those who died in order to maintain the balance of the voice actors. If you start an encounter with 8 marines, two of the characters would be doubled up, let's say two of Pvt. Bisenti and two of Pvt. Mendoza. If at the end of the encounter only 4 marines remain, even if Bisenti or Mendoza never got killed, we would swap one of the duplicates for a missing Jenkins or Johnson, in order to keep the characters balanced.”
O’Donnell 2002 [6]

Such a method can only work in a chaotic environment populated by more or less cookie-cutter characters (the player must not notice that a character switches voice sets, or you will do more harm than good), but it is a great way to ensure that you use all your assets to their fullest extent. It can even be applied to sound effects, like weapons, when multiple enemies attack you at the same time.

9.2 Sound event states
To be able to select a line depending on the character’s progression, we need a sound event system that is slightly more advanced than what we use most of the time. The first consideration is whether or not the dialog files are streamed. If they are, then we can afford to carry extra lines that will not be used at this point in the game, but if they have to be loaded in RAM, then we definitely need a system that will only load what it needs at this particular moment.

The most elegant way is to have a sound event that can support states. That way, the same sound event can be called throughout the game, but will pick from a different sound list depending on its state. The game logic can then switch the states according to the desired parameters (character progression, game level, specific event, etc.). If the audio files are not streamed though, we need to add a mechanic that will only load the audio for a particular section. It is quick and easy for the audio designer (you just need to set some parameters in the level data), but you are bound to the granularity of the level sectioning, and it becomes really difficult to react to events, like character progression, that are not environment related.

9.3 Using text to speech technology
The previous section dealt with technology and best practices that are available (and used, although not often enough) in the game industry today. If we look forward, the most likely enhancement will come from some sort of text-to-speech technology or voice font.

Right now the biggest obstacle to using text-to-speech technology is the artificiality of the results. Some more sophisticated engines allow the user to input special characters that dictate intonation and accentuation (similar features have also been implemented in some music sample libraries like Voices of the Apocalypse and Quantum Leap Voices of Passion), but the amount of work required to create a realistic performance is quite high.

While I doubt we will be able to duplicate the performance of a good actor for a long time (at least under circumstances that make sense for a game developer), we can imagine using such technology for minor characters (like squad members) or radio communications (where the lower quality could obscure the artificiality). One way to keep the amount of work to a minimum would be to create a pool of lines that sound great, and then expand it by making small variations in intonation, speech speed and vocabulary. In this way, we could quickly multiply the number of available lines without spending too much time recreating performances from scratch.

Those small variations can probably provide just as much of a break from repetition as entirely different lines. After all, most people will form similar sentences to express the same thought multiple times, or at least cycle through a few familiar expressions.
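The pool-plus-variations idea above can be sketched in a few lines of Python. This is only an illustration: the `rate` and `pitch` fields are hypothetical stand-ins for whatever control tags a given text-to-speech engine actually exposes.

```python
import itertools

# Hypothetical sketch: expand a small pool of proven lines into many
# deliveries by varying intonation and speech rate. The parameter names
# are placeholders, not a real TTS engine's API.

BASE_LINES = ["Enemy spotted!", "Contact ahead!"]
RATES = [0.9, 1.0, 1.1]                 # speech speed multipliers
PITCHES = ["low", "neutral", "high"]    # intonation presets

def variants(lines=BASE_LINES, rates=RATES, pitches=PITCHES):
    """Yield every (text, rate, pitch) combination for the TTS engine."""
    for text, rate, pitch in itertools.product(lines, rates, pitches):
        yield {"text": text, "rate": rate, "pitch": pitch}

pool = list(variants())
len(pool)  # 2 lines x 3 rates x 3 pitches = 18 distinct deliveries
```

Even with tiny per-axis pools, the combinations multiply quickly, which is exactly the economy the text above is after.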
9.4 Dialog stitching
Dialog stitching is a technique that cuts up dialog and reassembles it at runtime. Sports games use it extensively to create the running commentary. For example, to announce the player showing up at the plate in a baseball game, the game would use the following elements:

“At bat,” + team + position + “number” + number + name.

The result would sound like:

“At bat, Mets catcher, number 38, Bob Smith!”

Usually, dialog stitching is used to lower the amount of dialog to record, but if we accept the premise that people tend to repeat things they say in a similar way, then dialog stitching becomes another tool to fight repetition.

A simple structure could use a prefix, line and suffix, with the first and last being optional. If you record the following lines: (…)

(…) dropped from languages that are incompatible with the structure adopted.

10 AVOIDING REPETITION IN MUSIC
Providing variation in music is probably one of the most talked about subjects in the field of videogame audio. Adaptive music (also known as interactive music) has been used for a long time, under many guises, and I do not feel it is within the scope of this paper to simply rehash some of them.

But is dynamic music all it could ever be? Most of the challenges are artistic, not technical. It is not easy to maintain coherence in a music track that will be modified or re-assembled on the fly. The more ambitious projects (like Spore) have avoided the traditional reliance on melody and thematic material to go in a more abstract direction. Others have subdivided pieces into their basic structural components (intro, loop, coda, transition, etc.) to be re-assembled on the fly during gameplay, in what amounts to dynamic arrangement rather than dynamic composition (the latter being what Spore definitely does).
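The segment-based re-assembly described above can be sketched as a simple selector that decides which segment to queue next. The segment names and the two-level intensity model are illustrative assumptions, not an actual engine API.

```python
# Hypothetical sketch of "dynamic arrangement": a cue is cut into
# intro / loop / transition / coda segments and re-assembled at runtime.

def next_segment(current, current_intensity, target_intensity, ending=False):
    """Return the next music segment to queue after `current` finishes."""
    if ending:
        return "coda"                                  # wrap the cue up
    if current == "intro":
        return f"loop_{current_intensity}"             # settle into the loop
    if current.startswith("transition"):
        return f"loop_{target_intensity}"              # land on the new loop
    if target_intensity != current_intensity:
        # Bridge between intensity levels with a short transition segment.
        return f"transition_{current_intensity}_to_{target_intensity}"
    return f"loop_{current_intensity}"                 # keep looping

# Combat starts while the calm loop is playing:
next_segment("loop_calm", "calm", "combat")
# -> "transition_calm_to_combat"
```

Because every decision happens at a segment boundary, the music always moves between points the composer authored, which is what keeps this approach coherent.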
(…) on the fly during gameplay. (And despite my examples being based on orchestral material, this principle can be applied to any genre of music. Replace leitmotif by riff and orchestration by arrangement, and you have the recipe for a rock score.)

The challenges of procedural thematic music are twofold: how do you make sure you have enough material to cover all the desired situations, and how can you manipulate that data to adapt it to a new context?

The first solution to this challenge is sheer manpower: have your composer(s) write endless variations of the material to play in any given situation. While this is certainly feasible, the costs involved are very high, ensuring that only the highest profile games can do this.

If we turn again to technology to solve the artistic issues of dynamic music, there are a few methods that can be appropriate.

The first is MIDI technology. It was used to great effect by LucasArts’s iMuse engine in the 90’s, but as pre-rendered audio supplanted MIDI, the technique fell out of use. The great advantage of MIDI is the flexibility to manipulate the data, but traditionally the biggest drawback was the low quality of the available samples. Sampling technology has greatly improved in the last decade, but the amount of RAM and storage necessary to create a cutting edge rendering is still way beyond what modern game consoles can provide. Can we imagine a future generation holding enough power to store gigabytes of samples? Of course, but the amount used by the music would have to be a negligible portion of the whole: increases in power and RAM are almost always given to graphics first.

Software like Band-in-a-Box, with its recent RealTracks enhancement, closely resembles the type of realtime orchestrator a game would need. Using predefined “styles” (basically a set of rules that define how to react to chord progressions), the software plays back pre-recorded performances at the correct tempo and key. Since the only user input is the chord progression, the resulting song can be easily and quickly manipulated. Also, by having alternate rules, every rendering of the song is a little different from the last one. If one were to apply such technology to a video game, the chord progression itself could be manipulated by the game engine, and styles swapped in and out to dynamically create a piece that could retain much thematic content, as well as modulate, change tempo and re-organize its own structure.

Band-in-a-Box also offers means to dynamically create melodies, once again using pre-defined rules. My few experiments with it were mildly interesting but rather generic. I am sure that the process can be improved upon and that it might become a viable method of generating melodic content on the fly, but for now it would be difficult to consider using such a method to create stirring themes.

Beyond that, software like the upcoming version 2 of Melodyne, with Direct Note Access, offers much promise for dynamic scoring. Promising the ability to time and pitch shift individual notes in a polyphonic recording, such technology could allow further manipulation of music data at runtime. We could re-harmonize cues, modify timing to create smooth tempo changes or even change the articulation of lines. Chords could become slightly arpeggiated or switch from minor to major without affecting the sound quality.

Of course, we are certainly quite some time away from having hardware that could do this in real time, but with the exponential increases in computing power, maybe it will be feasible sooner rather than later.

10.5 The human factor
It is difficult to explore dynamic music systems without at some point pondering the impact of the human factor in music. We explore ways of automatically generating music, but is it the human touch that makes music so compelling for so many people? Can we encode that factor into an algorithm?

To me, dynamic music has two major goals: to lessen repetition and to closely react to game events. It is perhaps slightly ironic to try to lessen repetition, considering that most western music relies on such repetition to establish a hook. But maybe in this lies the philosophy that makes a good dynamic music system: it is not about creating new material, but about re-using existing material in an organic way. We do not always need to make big changes to the music to avoid the feeling of repetition; we just need to use it efficiently.

Can an automated system make music that is as good as music made by a human being? After all, most forms of music are highly systemized, if not by intent, at least by the expectations of the style. We could program a machine to come up with a pretty good variation of the 12-bar blues, but would it have THE blues? Our reaction to music is still a mystery; if it were not, we could predict what the hits will be next week. (The recording industry would love that.)

Does that mean that we should not be concerned by repetition in game music? No, we should be very concerned, but we must be careful not to lose sight of what makes great game music: the raw emotional impact.
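To close this section, the chord-progression-driven approach discussed above can be made concrete with a toy sketch. The style names, the chord-to-phrase mapping and the file names are all hypothetical; a real system would also handle tempo and key matching.

```python
# Toy sketch in the spirit of the Band-in-a-Box idea described above:
# a "style" maps each chord to a pre-recorded phrase, and the game
# engine supplies (and may vary) the chord progression at runtime.

STYLES = {
    "calm":   {"C": "calm_C.wav",   "F": "calm_F.wav",   "G": "calm_G.wav"},
    "combat": {"C": "combat_C.wav", "F": "combat_F.wav", "G": "combat_G.wav"},
}

def render(progression, style):
    """Pick one pre-recorded phrase per chord in the progression."""
    phrases = STYLES[style]
    return [phrases[chord] for chord in progression]

# The engine can vary the progression, or swap styles at any bar:
render(["C", "F", "G", "C"], "calm")
# -> ["calm_C.wav", "calm_F.wav", "calm_G.wav", "calm_C.wav"]
```

Swapping `"calm"` for `"combat"` re-scores the same progression instantly, which is the property that makes this model attractive for games.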
11 CONCLUSIONS
Repetition in game audio is a problem that has been
present from the moment a game first made a sound.
The reasons for this are mostly technical (memory
issues, sameness of gameplay) or financial (budget does
not allow for the creation and implementation of enough
raw assets), but much of the solution lies in the
creativity of the audio designers and how they use the
tools they have available.
12 REFERENCES
[1] G.A. Sanger, The Fat Man on Game Audio: Tasty Morsels of Sonic Goodness, New Riders Publishing (2003), pp. 213-221.
[2] http://www.gamasutra.com/resource_guide/20020520/odonnell_01.htm
[5] http://www.applied-acoustics.com/techtalk1.htm
[6] http://www.gamasutra.com/resource_guide/20020520/odonnell_02.htm
[7] http://www.trell.org/wagner/motifs.html#leitmotifs